Preface


After nearly half a century of developments in numer- 
ical methods, the field of computational mechanics has 
become sufficiently mature to collect the achievements and 
summarize the state-of-the-art in a comprehensive, author- 
itative major reference work. This idea, first conceived in 
1999, has resulted in the Encyclopedia of Computational 
Mechanics. It has been the intention of the editors and 
the publisher to provide the community with a systematic, 
well-organized survey of established as well as recently 
developed computational methods, covering applied and 
computational mathematics, computer science, the various 
branches of solid and fluid mechanics, and all the avail- 
able discretization methods. Attention has also been paid 
to many engineering and other applications. 

We have invited first-class scientists and engineers to join 
us in this challenging endeavor. It is our pleasure to thank 
all our contributors warmly for their excellent chapters and 
for completing them in a tight time frame. 

A major reference work like the Encyclopedia of Com- 
putational Mechanics should provide trustworthy facts and 
information. We hope that the reviewing process to which 
each chapter has been submitted will guarantee the high 
quality we have aimed for. We would like to acknowledge 
the careful and constructive work of all of our reviewers. 

The Encyclopedia of Computational Mechanics is orga- 
nized in three volumes with the subjects ‘Fundamentals’ 
(Volume 1), ‘Solids and Structures’ (Volume 2), and ‘Flu- 
ids’ (Volume 3), and is published both in print and on line. 

Volume 1, Fundamentals, contains contributions related
to mathematics, mechanics, and computer science, and is
structured as discretization methods (fourteen chapters),
treating approximations with finite differences, discrete
variational forms, boundary integral equations, and further
problem-oriented techniques; the generation and
visualization of geometry, meshes, and results (three chapters);
various direct and iterative solvers (five chapters); and
time-dependent problems (three chapters).

Volume 2, Solids and Structures, is organized into five
different parts, namely, structural behavior (four chapters);
constitutive theories and their discretization via finite
element or boundary element methods (seven chapters);
materials and processing (five chapters); interaction
problems (five chapters); and identification, stochastics, and
optimization (two chapters).

Volume 3, Fluids, builds on the fundamentals described 
in Volume 1. The chapters in Volume 3 fall within four 
main groupings. The first (four chapters) includes chapters 
describing additional basic methodologies used in compu- 
tational fluid dynamics. The second (seven chapters) com- 
prises chapters on various aspects of incompressible viscous 
flows. The third (four chapters) focuses on compressible 
fluid dynamics. The fourth (two chapters) pertains to prob- 
lems involving moving domains and free surfaces. 

Returning to the question ‘Why the Encyclopedia of 
Computational Mechanics now?’, we believe that the field 
of computational mechanics has now reached a high degree 
of maturity and satisfies high standards of reliability and 
efficiency. After about three periods of development, start- 
ing from engineering-oriented methods through mathemat- 
ical foundations with error analysis and various generaliza- 
tions to adaptive multiscale and multiphysics simulations of 
complex micro- and macroprocesses, including for example 
models for climate and tectonic movements, computational 
mechanics nowadays is a basic and important subject for 
teaching and research, and has a multitude of applications. 

Last, but definitely not least, the editors would like to 
thank the team at John Wiley & Sons for their enthusiastic 
belief in this project, their professionalism, and for creating 
a friendly atmosphere at the several editorial meetings. 

The editors sincerely hope that this encyclopedia will be 
well accepted by the community and hope that the on-line 
version with its special features will be used extensively. Of 
course, we shall be grateful for corrections and proposals 
for improvements. 


Erwin Stein, René de Borst and Thomas J. R. Hughes 
Hannover, Delft, Austin 
September, 2004 
Fundamentals: Introduction and Survey


Erwin Stein 


University of Hannover, Hannover, Germany 


1 Motivation and Scope
2 Stages of Development and Features of Computational Mechanics
3 Survey of the Chapters of Volume 1
4 What We Do Expect


1 MOTIVATION AND SCOPE 


In the ‘Encyclopedia of Computational Mechanics’ (ECM), 
Volume 1 ‘Fundamentals’ includes 26 chapters. It con- 
tains the basic methodological, analytical, algorithmic, and 
implementation topics of computational mechanics. 

The main goals of the ECM are to provide first-class
up-to-date representations of all major computer-oriented
numerical methods and related special features for mechanical
problems in space and time, their a priori and a posteriori
error analysis, and various convergent and efficient
self-controlling adaptive discretization strategies, and further
to provide the wide range of robust and efficient direct and
iterative solvers as well as challenging applications in all
relevant technological areas. Geometrical representations of
technical objects, mesh generation, and mesh adaptivity as
well as the visualization of input and output data are also
important topics.

The now already ‘classical’ discretization methods using
finite differences, finite elements, finite volumes, and
boundary elements have been generalized with new conceptual
ideas in various directions over the last decade, such as
meshfree methods, spectral and wavelet techniques, as well
as discrete finite element algorithms, which are presented
here. Error analysis and adaptivity are essential features
in general.


2 STAGES OF DEVELOPMENT AND 
FEATURES OF COMPUTATIONAL 
MECHANICS 


One can say that we are now in about the third period 
of development of computer-based numerical methods, 
especially based on weighted residuals, such as the finite 
element method (FEM) and its generalizations as well as 
the boundary integral equation method (BIEM or simply 
BEM) and various couplings of both. The finite difference 
method (FDM) further plays an important role, especially 
for time integrations. 

The first period was from about 1960 to 1975, with 
a lot of separate engineering approximations for specific 
mathematical models, especially FEMs for linear elastic 
static systems, like beams and plates with plane stress 
states and bending, as well as eigenvalue analysis of sta- 
bility and vibration problems with applications to struc- 
tural mechanics. Parallel to this, FEMs for aero- and 
hydrostatic problems and also for hydrodynamic processes 
were developed. 

The second period, from about 1976 to 1990, was characterized
by rigorous mathematical analysis of Ritz-Galerkin-type
discretization methods with trial and test functions in
Sobolev spaces within finite subdomains, in order to analyze
elliptic boundary-value problems, together with the a
priori and a posteriori error analysis of the FEM and the
BEM in their various forms for large classes of problems,
such as boundary-value problems of symmetric, positive
definite elliptic operators of second order, and also for
parabolic and hyperbolic operators with operator splits in
space and time, solving systems of ordinary differential
equations in time by a finite difference method. Parallel
to this, sophisticated engineering developments took place
toward complicated linear and nonlinear problems of the
classical partial differential equations (PDEs) in mathematical
physics with large dimensions of the algebraic equations,
motivated and driven by the fast growth of computer
power and memory as well as the availability of efficient
software systems and, of course, by technological needs and
motivations. Numerous chapters in Volumes 2 and 3 show
the wide range of challenging applications of FEM, BEM,
and various other problem-oriented discretization methods.
These new integral methods of weighted residuals are
characterized by two important properties:


(a) Methodical width: This is the intended simple log- 
ical and algorithmic structure (e.g. with symme- 
try properties and well-posedness in regular cases) 
and the possibility of extensions and generalizations 
within a large class of similar problems, including 
higher dimensions, and thus forming the frame for a 
box-type program structure. This is mainly achieved 
by operations within the finite element subdomains 
only, that is, without needing neighborhood infor- 
mation on element level, and thus allowing unified 
assembling and solution procedures of the global 
systems of algebraic equations. The methods yield 
relatively small condition numbers of the algebraic 
equation systems and thus provide robust solutions in 
regular cases. 

(b) Methodical depth: This means the rather simple
extension of methods, algorithms, and computer programs
to more complicated problems, especially geometrically
and physically nonlinear ones, and to physically
coupled problems. This also holds for the implementation
of sensitivity analysis within the solution of
optimization problems.


These two properties (a) and (b) are the reasons for the 
tremendous development in and the flexible availability of 
related program systems for applications in science and 
technology. 

In the third period, from 1991 until now, new tasks,
challenges, and research directions can be observed in
computational mechanics (more generally, in applied physics) and
in computational mathematics that can be summarized as
follows:


- Meshfree and particle methods, finite elements with
discontinuities for damage and fracture.

- Error-controlled adaptive modeling and approximation
of physical events near to nature, also scale-bridging
modeling on different space and timescales, including
homogenizations between them.

- Adaptive micromechanical modeling and computation
in material science and engineering, including damage,
phase changes, and various failure processes.

- New types of generalized FEM and BEM with hierarchical,
spectral, and wavelet-based interpolations.

- Modeling and simulation of multiphysics phenomena
in science and engineering.

- Complex models and simulations in biomechanics and
human medicine.

- New generalized methods for geometrical modeling,
mesh generation, and mesh adaptivity.

- New direct and iterative solvers with multilevel and
domain decomposition methods.

- Advanced visualization of objects, processes, and
numerical results, 3D-animation of virtual reality.


With the advanced current hardware and software tools 
on hand, about 10 million unknowns of a complex prob- 
lem can be computed today in reasonable time, using 
problem-oriented iterative algebraic solvers and precon- 
ditioners with advanced data management for high-end 
machines with parallel or serial scalar and vector processors.
Personal computers also enable us to solve problems with hundreds
of thousands of unknowns together with error estimation
and adaptive mesh refinements. With these tools, it has
become possible to realize the verification and even the
restricted validation of engineering systems and processes,
taking into account disturbed input data and deterministic
or statistical imperfections of structures and materials.
This leads us to new paradigms in computational mechan- 
ics, namely, guaranteeing reliability, safety, and efficiency 
of results very near to the physical reality of the investi- 
gated objects. And because of this progress, computational 
mechanics helps to simulate virtual products and processes 
without the necessity of many physical experiments and 
thus reduces costs and development time of new products 
considerably. 


3 SURVEY OF THE CHAPTERS OF 
VOLUME 1 


Volume 1 can be classified into four groups: discretization
methods (Chapter 2 to Chapter 15 of this Volume,
14 chapters); geometrical modeling, mesh generation, and
visualization (Chapter 16 to Chapter 18 of this Volume,
3 chapters); solvers (Chapter 19 to Chapter 23 of
this Volume, 5 chapters); and time-dependent problems
(Chapter 24 to Chapter 26 of this Volume, 3 chapters).




The first group, discretization methods, begins with 
Finite difference methods by Owe Axelsson in which 
elliptic, parabolic, and hyperbolic problems of second and 
fourth order as well as convection-diffusion problems are
treated in a systematic way, including error analysis and 
adaptivity, emphasizing computational issues. 

Next, FEMs are presented in six chapters, beginning 
with Interpolation in finite element spaces by Thomas 
Apel, with a survey of different types of test and trial 
functions, investigating the interpolation error as a basis for 
a priori and a posteriori error estimates of finite element 
methods. For a priori estimates, nodal interpolants are 
used as well as the maximum available regularity of the 
solution to get optimal error bounds. A posteriori error
estimates of the residual type need local interpolation
error representations for functions from the Sobolev space
$W^{1,2}(\Omega)$. Different interpolation operators and related error
estimates are presented for the h-version of the usually used
2D- and 3D-finite elements.

The following chapter, Finite element methods, by
Susanne Brenner and Carsten Carstensen, treats the
displacement method (primal finite element method) for
boundary-value problems of second-order elliptic PDEs
as well as a priori and a posteriori error estimates of
the weak solutions and related h-adaptivity, including
nonconforming elements and algorithmic aspects. This basic
chapter is followed by The p-version of the finite element
method for elliptic problems, by Barna Szabó, Alexander
Düster, and Ernst Rank, in which hierarchical shape
functions of order p are used as test and trial interpolations
of the finite elements instead of nodal basis functions.
Exponential convergence rates in conjunction with sufficient
h-refinement in subdomains with large gradients of
the solution are an advantage over the h-version. Boundary
layers of dimensionally reduced models (by appropriate
kinematic hypotheses), which need the solutions of the
expanded mathematical model, can be represented in a
consistent way by using adequate p-orders, see also
Chapter 8, this Volume, by Monique Dauge et al. The arising
problems are: (i) the fast integration of fully populated
element stiffness matrices, (ii) relatively large algebraic
systems with strongly populated global stiffness matrices,
(iii) the problem of geometric boundary representations
without producing artificial singularities, (iv) hp-adaptivity
for 3D-systems as well as (v) the efficient implementation
of anisotropic p-extensions that are suited to geometrically
and physically anisotropic problems like thin plates
and shells, for example, with anisotropic layers of composites.
All these problems have been tackled successfully
such that the p-type finite element method, in connection
with some available computer programs, has reached the
necessary maturity for engineering practice.

In Chapter 6 to Chapter 8 of this Volume, problem-oriented
effective test and trial spaces are introduced for
boundary-value problems (BVPs) of PDEs.

Chapter 6, this Volume, by Claudio Canuto and Alfio 
Quarteroni, is devoted to the high-order trigonometric and 
orthogonal Jacobi polynomial expansions to be applied to 
generalized Galerkin methods for periodic and nonperiodic 
problems, with numerical integration via Gaussian integra- 
tion points in order to achieve high rates of convergence 
in total. 

Chapter 7, this Volume, by Albert Cohen, Wolfgang
Dahmen, and Ronald DeVore, presents matrix compression
methods for the BIEM based on wavelet coordinates
with application to time-dependent and stationary problems.
Wavelets also yield sparsity for the conserved variables of
problems with hyperbolic conservation laws. In addition, a
new adaptive algorithm is derived for sparse functions and
operators of linear and nonlinear problems.

Chapter 8, this Volume, by Monique Dauge, Erwan 
Faou, and Zohar Yosibash, treats known and new meth- 
ods for consistent reductions of the 3-D theory of elasticity 
to 2-D theories of thin-walled plates and shells by expan- 
sions with respect to small parameters, without applying 
the traditional kinematic and static hypotheses. A poly- 
nomial representation of the displacements is presumed, 
depending on the thickness direction, generating singularly 
perturbed boundary layers in the zero thickness limit. This 
favors (hierarchical) p-extensions in the thickness direction, 
yielding hierarchical plate and shell models. Finite element
computations show the convergence properties and the
efficiency of this approach to the important problem of
boundary layer analysis of plates and shells.

Chapter 9 to Chapter 11 of this Volume treat Generalized
finite element methods. Chapter 9, this Volume,
by Ferdinando Auricchio, Franco Brezzi, and Carlo Lovadina,
gives a systematic survey and some new results on
the stability of saddle-point problems in finite dimensions
for some classical mechanical problems, like thermal
diffusion, the Stokes equations, and the Lamé equations.
Mixed methods yield the mathematical basis for problems
with locking and other numerical instability phenomena,
like nearly incompressible elastic materials, the Reissner
and Mindlin plate equations, and the Helmholtz equation.
From an engineering point of view, reduced integration
schemes and stabilization techniques get a sound foundation
by the problem-dependent inf-sup condition.
Chapter 10, this Volume, by Antonio Huerta, Ted Belytschko,
Sonia Fernández-Méndez, and Timon Rabczuk, provides
an advanced and systematic presentation of different versions
and alternatives of the so-called meshfree and particle
methods, known as moving least squares, partition of unity
FEM, corrected gradient methods, particle-in-cell methods,
and so on. These methods were originally invented for moving
singularities and discontinuities, like crack propagation in
solids, in order to avoid frequent complicated and costly
remeshings. They are based on Ritz-Galerkin- and
Petrov-Galerkin-type weighted residual or collocation
concepts and generalizations, such as Lagrange multiplier
and penalty methods. Radial basis functions are a good tool
(without having a compact support) as well as hierarchical
enrichments of particles.

The error-controlled approximation of the essential 
boundary conditions of a boundary-value problem and, of 
course, the related a priori and a posteriori error analy- 
sis as well as the relatively large condition number of the 
algebraic systems combined with big computational effort 
are crucial points. 

Generally speaking, meshfree methods are now superior
or at least equivalent to classical FEMs for some of the
addressed specific types of problems.

The last of the three chapters dealing with generalized
FEMs is Chapter 11, this Volume, by Nenad
Bićanić. Instead of constructing a convergent and stable
numerical method for the approximated solution of, for 
example, a boundary-value problem for a continuous dif- 
ferential operator, a direct computational simulation of an a 
priori discrete system with embedded discontinuous defor- 
mations, cracks, fragmentations, and so on is treated here. 
This also includes assemblies of particles of different shapes 
with their various contact problems, compactions, and other 
scenarios of real processes. This is a rather new area of 
computational mechanics, so far mostly treated on an engi- 
neering level, that is, without mathematical analysis. The 
question arises how ‘convergence’ and ‘numerical stabil- 
ity’ can be defined and analyzed herein. But there is no 
doubt that this type of direct computational simulation 
of technological problems will play an important role in 
the future. 

The two chapters that follow are devoted to Boundary 
element methods and their coupling with finite element 
methods. Chapter 12, this Volume, by George C. Hsiao 
and Wolfgang L. Wendland, presents variationally based
Galerkin-BEMs for elliptic boundary-value problems of
second order in a mathematically rigorous way, classified
by the Sobolev index. Various boundary integral equations
can be derived, introducing fundamental solutions, Green’s
representation formula, Cauchy data, and four boundary
integral operators. Out of this reservoir, several numerical
methods and algorithms for boundary elements are presented
and discussed. The main features such as stability,
consistency, and convergence as well as adequate solvers,
condition numbers, and efficiency aspects are well treated.
Of course, error analysis and adaptivity play an important
role. BEM has advantages over FEM in the case of
complicated boundaries, for example, mechanical problems
with edge notches and regular inner domains, and with
respect to dimensional reduction by one. Efficient recursive
integration formulas and solvers for the fully populated
system matrices are available.

Chapter 13, this Volume, by Ernst Stephan, treats the 
obvious variant of combining the different strengths of 
both methods by symmetric couplings, which, of course, 
need considerable algorithmic efforts and adequate solvers 
with problem-dependent preconditioners. Special features 
are Signorini-type contact problems using both primal and 
dual-mixed finite element approximations. Recent features 
are adaptive hp-methods. There seems to be a lack of
available software for 2D problems and even more for 3D
problems.

Chapter 14, this Volume, by J. Donea et al., is to be
seen separate from the previous presentations of different
variational discretization methods, as it treats various
coupled processes (for example, fluid-solid interaction)
by suitable coordinates and metrics for each of the
constituents (for example, Lagrangian coordinates for solids
and Eulerian coordinates for fluids), using the well-known
tangential push-forward and pull-back mappings between
the two descriptions via the deformation gradient. The
gain in computational efficiency and robustness can be
significant, for example, for the rolling contact of a tire
on a street. The arbitrary Lagrangian-Eulerian (ALE) concept,
its analysis, the algorithms, and important applications for
linear and especially nonlinear static/dynamic problems in
solid and fluid mechanics are systematically presented and
illustrated by adequate examples. Also, smoothing and
adaptive techniques for the finite element meshes are
discussed. It is remarkable how quickly the ALE concept was
implemented in commercial programs.

Chapter 15, this Volume, by Timothy Barth and Mario
Ohlberger, also stands separate from the scheme of finite
domain and boundary element approximations. Finite volume
elements were invented in fluid mechanics and are also
applied now in other branches like biology, and in solid
mechanics, too. The advantage of finite volume approximations
in comparison with the usual finite element
discretizations in finite subdomains is the intrinsic fulfillment
of local conservation properties, like mass conservation or
entropy growth. Finite volume elements usually also yield
robust algebraic systems for unstructured meshes; they are
especially favorable for nonlinear hyperbolic conservation
systems in which the gradients of the solution functions
can blow up in time. Integral conservation laws and discrete
volume methods are applied using various meshing
techniques of cell- and vertex-centered control volumes. A
priori and a posteriori error estimates are presented and
applied for adaptivity. Also, solvers in space and time
are discussed.
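
The local conservation property mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible setting, which is not taken from the chapter itself: the linear advection equation u_t + a u_x = 0 with a > 0 on a periodic 1D grid, discretized with first-order upwind fluxes. The function name `upwind_step` and all parameter values are hypothetical. Because each face flux leaves one cell and enters its neighbor, the flux differences cancel in the sum over all cells and the total mass is preserved.

```python
def upwind_step(u, a, dx, dt):
    """One explicit finite volume step for u_t + a u_x = 0 (a > 0), periodic grid."""
    n = len(u)
    # left_flux[i] is the upwind flux through the left face of cell i;
    # u[i - 1] wraps around for i = 0, which gives the periodic boundary.
    left_flux = [a * u[i - 1] for i in range(n)]
    # The right face of cell i is the left face of cell i + 1.
    return [u[i] - dt / dx * (left_flux[(i + 1) % n] - left_flux[i])
            for i in range(n)]


n = 50
dx = 1.0 / n
a = 1.0
dt = 0.5 * dx / a                                      # CFL number 0.5
u = [1.0 if 10 <= i < 20 else 0.0 for i in range(n)]   # square pulse

mass_before = sum(u) * dx
for _ in range(50):
    u = upwind_step(u, a, dx, dt)
mass_after = sum(u) * dx

print(mass_before, mass_after)
```

The pulse is smeared by numerical diffusion, but the sum of the cell averages does not change. A nodal finite difference formula written in non-conservative form would not guarantee this in general.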

Chapter 16 to Chapter 18 of this Volume are devoted
to the Computer representation and visualization of
topology, geometry, meshes, and computed data.

Chapter 16, this Volume, by Franz-Erich Wolter, Niklas 
Peinecke, and Martin Reuter, treats a subject growing in 
importance in computational mechanics as technical objects 
and their physical substructures become more and more 
complicated. The presented methods and realizations can 
be classified as computational methods for topology and 
geometry with volume- and boundary-based (direct and 
indirect) representations of objects and also with a rather 
new type of modeling, using medial axes and surfaces for 
describing objects that are mainly one or two dimensional 
in their appearance. This medial modeling allows a natural 
transition to finite element meshes. Of course, additional 
attributes can be included in the geometry, like photometric 
or toughness properties. 

In Chapter 17, this Volume, by P.L. George et al., the 
techniques of planar, surface, and volume meshing as well 
as adaptive global and local remeshing for discretization 
methods are outlined, aiming at automatic self-controlled 
algorithms for various types of mesh generations and 
their visualizations. Hard problems arise with large 3D- 
meshes, requiring spatial decomposition, as well as related 
automatic remeshing and local mesh adaptivity. Moving 
boundaries and the meshing errors of specific elements 
with respect to the given analytic or free-form geometry 
are crucial problems. 

The main methods for structured and unstructured mesh
generation are presented, where structured meshes are
constructed in a purely algebraic way or are based on
appropriate PDE-solutions. Hierarchical spatial decompositions
are used for arbitrarily shaped domains with unstructured
meshes, like the quadtree and octree method, advancing
front strategies, and Delaunay-type methods.

Chapter 18, this Volume, by William J. Schroeder and 
Mark S. Shephard, also treats a crucial subject in modern 
science and technology. Selected interactive visualization
of real and virtual objects and processes conveys an
intuitive grasp of the essentials that is hardly possible
with data files only. Visualization concerns geometry with
attribute data, meshes, and results that may need scalar, 
vector, and tensor graphics with special features like pro- 
ducing streams of particles or physical quantities through 
a total system or through a control volume. The presented 
visualization algorithms give information about what is pos- 
sible today. 

Chapter 19 to Chapter 23 of this Volume treat the
crucial problem of stable, robust, and efficient solvers for
the various discrete algebraic systems introduced in the
first 14 chapters. Regarding the mostly high dimensions
of algebraic equation systems, which are needed today
in science and engineering and which can be solved
now in reasonable execution time with the teraflop
generation of computers, a variety of sophisticated and
problem-oriented types of direct and iterative solvers
with adapted preconditioners were developed and are
presented here.

Chapter 19, this Volume, by Henk A. van der Vorst,
presents direct elimination and iterative solution methods,
where the latter usually need effective preconditioning
operators for efficient solutions. All important iterative
solvers based on Krylov projections are treated,
like Conjugate Gradients, MINRES, QMR, Bi-CGSTAB,
and GMRES.
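
To make the idea of a Krylov iteration concrete, the following is a minimal sketch of the unpreconditioned conjugate gradient method for a symmetric positive definite system, written in plain Python. It is not the formulation of the chapter; the helper names (`matvec`, `dot`, `conjugate_gradient`) and the small tridiagonal test matrix (the 1D finite difference Laplacian) are assumptions chosen for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for symmetric positive definite A, start vector 0."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for x = 0
    p = list(r)                 # first search direction
    rs_old = dot(r, r)
    if rs_old ** 0.5 < tol:
        return x, 0
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)            # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:
            return x, k
        beta = rs_new / rs_old                 # makes the new p A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# Test problem (assumed): tridiagonal matrix with 2 on the diagonal, -1 beside it.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, iters = conjugate_gradient(A, b)
residual = max(abs(bi - ai) for bi, ai in zip(b, matvec(A, x)))
print(iters, residual)
```

Each iteration needs only one matrix-vector product and a few vector operations, which is why such methods suit large sparse systems. In practice a preconditioner would be applied to the residual before the search direction is updated.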

Special eigenvalue problems of a Hermitian matrix are 
analyzed mainly with iterative QR-methods, emphasizing 
tridiagonal and (upper) Hessenberg matrices. The Krylov 
subspace approach is an efficient strategy that is applied 
with four different versions. Several preconditioners are 
presented and discussed. 

In Chapter 20, this Volume, by Wolfgang Hackbusch, 
fast iterative solvers for discretized linear and nonlinear 
elliptic problems are treated, yielding (optimal) linear 
complexity of the computational effort in regular cases, 
which, of course, is of dominant importance to systems 
with millions of unknowns for which multiprocessor and 
massively parallel computers are also efficient. 

Owing to the smoothing property of the Jacobi or 
Gauss-Seidel iteration for elliptic PDEs on fine grids, 
only data of coarse grid points are prolongated after 
some smoothing steps, and this is repeated through several 
grids, for example, in a W-cycle with four to five grid 
levels, such that the solution takes place only with a 
small equation system. The backward computation with 
restriction operators and further smoothing steps finishes 
an iteration cycle. It is efficient to produce a hierarchy of
discretizations, which is easy for regular and nested grids. Such
a hierarchy may also be a side-product of adaptive mesh
refinements.
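
The cycle structure can be sketched in a few lines. The example below is an illustration under stated assumptions, not the algorithm of the chapter: it uses a V-cycle (simpler than the W-cycle mentioned above) for the 1D model problem -u'' = f with zero boundary values, a weighted Jacobi smoother, full-weighting transfer of the residual to the coarser grid, and linear interpolation of the correction back to the finer grid. This follows the common convention in which residuals are restricted and corrections are prolongated. All function names are hypothetical.

```python
import math


def apply_A(u, h):
    """Apply the finite difference operator for -u'' with zero boundary values."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps, which damp the high-frequency error components."""
    for _ in range(sweeps):
        Au = apply_A(u, h)
        u = [ui + omega * (h * h / 2.0) * (fi - aui)
             for ui, fi, aui in zip(u, f, Au)]
    return u


def restrict(r):
    """Full weighting: fine-grid residual to the next coarser grid."""
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range((len(r) - 1) // 2)]


def prolong(e, n_fine):
    """Linear interpolation: coarse-grid correction to the finer grid."""
    fine = [0.0] * n_fine
    for j, ej in enumerate(e):
        fine[2 * j + 1] += ej
        fine[2 * j] += 0.5 * ej
        fine[2 * j + 2] += 0.5 * ej
    return fine


def v_cycle(u, f, h, pre=2, post=2):
    n = len(u)
    if n == 1:
        return [f[0] * h * h / 2.0]        # exact solve on the coarsest grid
    u = smooth(u, f, h, pre)
    r = [fi - aui for fi, aui in zip(f, apply_A(u, h))]
    e = v_cycle([0.0] * ((n - 1) // 2), restrict(r), 2.0 * h, pre, post)
    u = [ui + ci for ui, ci in zip(u, prolong(e, n))]
    return smooth(u, f, h, post)


n = 63                                      # 2**6 - 1 interior points
h = 1.0 / (n + 1)
xs = [(i + 1) * h for i in range(n)]
f = [math.pi ** 2 * math.sin(math.pi * x) for x in xs]   # exact u = sin(pi x)
u = [0.0] * n
res0 = max(abs(v) for v in f)
for _ in range(10):
    u = v_cycle(u, f, h)
res = max(abs(fi - aui) for fi, aui in zip(f, apply_A(u, h)))
err = max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, xs))
print(res / res0, err)
```

The work per cycle is proportional to the number of unknowns, since each coarser grid has about half as many points. The remaining error after convergence of the cycles is the discretization error of the finite difference scheme.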

Major issues of the chapter are the complete algorithmic 
boxes as well as the analysis and error analysis of multigrid 
methods for FEM and BEM. 

Chapter 21, this Volume, by Wolfgang Hackbusch,
presents efficient solvers for fully populated matrices as
they arise in the boundary integral equation method.
The goal is the reduction of $O(n^2)$ arithmetic operations
for standard matrix-vector multiplications to nearly
$O(n)$ operations. The essential parts of the algorithm are
the far-field expansion and the panel cluster tree. A generalized
variant is the construction of hierarchical matrices
(H-matrices) with different matrix-vector and matrix-matrix
operations only. The computation of inverse stiffness
matrices in FEM (e.g. for multiload cases) can be carried out
efficiently by cluster techniques.

Chapter 22, this Volume, by V.G. Korneev and Ulrich
Langer, treats effective iterative solvers for large algebraic
systems by the alternating Schwarz method and advanced
substructuring techniques, emphasizing efficient
problem-dependent preconditioners. Nonoverlapping Schwarz
methods are favored over overlapping methods, which need
more effort for computer implementation and the control of
the solution process. Of course, multiprocessor computers
are most suitable for parallel solution within the
decomposed domains.

In Chapter 23, this Volume, by Werner C. Rheinboldt,
strategies for efficient solvers of highly nonlinear algebraic
systems, especially with physical instabilities like bifurcation
and turning points, are investigated. These solvers are
based on Newton’s iterative method, its variants, and
inexact Newton methods.

The problem of solution instabilities, which depend on
one scalar parameter, is analyzed by homotopy methods
and continuation methods. Also, the bifurcation behavior
of parameterized systems is investigated.
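
As a minimal sketch of the continuation idea, the following example traces the solution of a scalar equation F(x, lambda) = 0 while the parameter lambda is increased in small steps, using Newton's method as corrector and the previous solution as starting value. The test equation x^3 + x - lambda = 0 and all names are assumptions made for illustration. This natural-parameter variant is only valid on regular solution branches; it breaks down at turning points, where the derivative with respect to x vanishes and arclength-type parametrizations are needed.

```python
def newton(f, dfdx, x0, lam, tol=1e-12, max_iter=50):
    """Newton's method for f(x, lam) = 0 with fixed parameter lam."""
    x = x0
    for _ in range(max_iter):
        fx = f(x, lam)
        if abs(fx) < tol:
            return x
        x -= fx / dfdx(x, lam)
    raise RuntimeError("Newton iteration did not converge")


def continuation(f, dfdx, x_start, lam_values):
    """Natural-parameter continuation: follow x(lam) along lam_values."""
    path = []
    x = x_start
    for lam in lam_values:
        x = newton(f, dfdx, x, lam)   # previous solution is the initial guess
        path.append((lam, x))
    return path


def residual(x, lam):
    return x ** 3 + x - lam           # assumed test problem, no turning point


def d_residual(x, lam):
    return 3.0 * x ** 2 + 1.0


lams = [0.1 * k for k in range(21)]   # lambda from 0 to 2
path = continuation(residual, d_residual, 0.0, lams)
print(path[-1])
```

Small parameter steps keep each starting value inside the convergence region of Newton's method, which is the practical reason for following the path instead of solving directly at the final parameter value.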

The last three chapters of Volume 1, Chapter 24, Chap- 
ter 25, and Chapter 26 of this Volume, are devoted to the 
fundamentals of numerical methods for Time-dependent 
problems. 

Chapter 24, this Volume, by Kenneth Eriksson, Claes 
Johnson, and Anders Logg, is based on duality techniques, 
by solving associated linearized dual problems for the a 
posteriori error analysis and adaptivity, using the residuals 
of Galerkin approximations with shape functions, continu- 
ous in space and discontinuous in time, yielding the crucial 
stability factors and discretization error estimates in ade- 
quate norms. 

Parabolic initial boundary value problems are usually 
stiff, and the main problem is the control of error accu- 
mulation in time. This is treated successfully with implicit 
and explicit time-stepping methods for some classical 
parabolic equations, like the instationary heat equation and 
the reaction-diffusion problem. 

In Chapter 25, this Volume, by Martin Costabel, parabolic
and hyperbolic initial boundary value problems with transient
solutions in time are considered, like heat conduction,
diffusion, acoustic scattering, and elastic waves.
The following three approaches are critically compared:
space-time integral equations, Laplace-transform methods,
and time-stepping methods. Many advanced mathematical
tools are necessary for the analysis, especially the error
analysis, which is treated here in a systematic way and
illustrated by examples.

Chapter 26, this Volume, by Leszek Demkowicz,
treats finite element approximations of the time-harmonic
Maxwell equations. With the stabilized variational formulations
and Nedelec’s three fundamental elements, hp-discretizations
and hp-adaptivity are presented. Tetrahedral
elements of the first and second type, hexahedral elements
of the first type, prismatic elements as well as parametric
elements are treated.

The three Nedelec elements deal with the exact sequence
of gradient, curl, and divergence operators, yielding the
null-space, in addition to projection-based interpolations of
finite elements such that the element shape functions are
defined as a dual basis to the d.o.f.-functionals, aiming at
locality, global continuity, and optimality, and lastly the de
Rham commuting diagram property of the analytical and the
finite dimensional solution spaces, applying the gradient,
curl, and divergence operators.


4 WHAT WE DO EXPECT 


We are convinced that the above sketched 26 chapters of
Volume 1 are a sound basis for today’s and tomorrow’s
computational mechanics, integrating mathematics, computer
science, and physics, especially mechanics, as well as
challenging industrial applications that need high computer
power. Computer implementations will only be competitive
in engineering practice if they are robust, stable, and efficient,
also concerning the requested logical clearness, as well as
the width and depth of the algorithms, as explained above.

An important benefit of this encyclopedia is seen in the 
combination of Volume 1 with Volumes 2 and 3, which 
are devoted to computational solid and fluid mechanics 
such that users interested in theoretical issues and those 
interested in practical issues can both get the information 
they want, together with any secondary or background 
knowledge.


1 INTRODUCTION


Although, in general, more restricted in their applicabil- 
ity, finite difference methods provide a simple and readily 
formulated approximation framework for various types of 
partial differential equations. They can, hence, often com- 
pete with other methods such as finite element methods, 
which are based on certain variational formulations of the 
differential equations, and where the preparation work to 
construct the corresponding systems of algebraic equations 
is more involved. 

When constructing the approximation scheme, it is
important to know the type of the given differential
equation. Partial differential equations of second order
are classified according to the type of their principal
part $\sum_{i,j=1}^{n} a_{ij}(x)\,\partial^2 u/\partial x_i \partial x_j$.
Without loss of generality, we can assume that the coefficient
matrix $A(x) = [a_{ij}(x)]_{i,j=1}^{n}$ is symmetric.

As is well known, the differential equation is elliptic in
x if all eigenvalues of A have the same sign, hyperbolic
if one eigenvalue has an opposite sign to the others, and
parabolic if one eigenvalue is zero and the remaining have
the same sign.

We do not consider other cases here. Note that a differential
equation can be of different types in different
parts of the domain ($\Omega$) of definition. For these
classes of differential equations, we need different types
of boundary conditions in order for the problem to be well
posed.


Definition 1. A boundary value problem is well posed if 


(i) Existence: There exists a solution that satisfies the 
equation and each given boundary condition (assum- 
ing they are sufficiently smooth). 

(ii) Uniqueness: It has at most one solution. 

(iii) Stability: Its solution is a continuous function of the 
given data, that is, small changes in the given data 
entail small changes in the solution. 


Although it is of interest in practice to consider also 
certain ill-posed problems, these will not be dealt with here. 

Solutions of partial differential equations of different
types have different properties. For instance, the solution to
elliptic and parabolic problems at any given point depends
on the solution at all other points in $\Omega$. However, for
hyperbolic problems, the domain of dependence is a subset
of $\Omega$. Although some problems may belong to the class of
elliptic problems, they may exhibit a dominating hyperbolic
nature in most parts of $\Omega$. To illustrate the aforesaid,
consider the following examples.

The problem

$$\mathcal{L}u \equiv -\varepsilon \Delta u + \mathbf{v} \cdot \nabla u = 0 \ \text{in } \Omega, \quad u = g \ \text{on } \partial\Omega \qquad (1)$$

where $|\mathbf{v}| \gg \varepsilon$, $\varepsilon > 0$, has a dominating hyperbolic nature in
the interior domain, except near the outflow boundary part,
where $\mathbf{v} \cdot \mathbf{n} > 0$, and where the diffusion part dominates and
the solution has a thin layer. Here $\mathbf{n}$ is the outward-pointing
normal vector to $\partial\Omega$.

For the Poisson problem

$$-\Delta u = \rho(x), \quad x \in \mathbb{R}^3$$

where we ignore the influence of boundary data, the solution is

$$u(x) = \frac{1}{4\pi} \int_{\mathbb{R}^3} |x - \xi|^{-1} \rho(\xi)\, d\xi \qquad (2)$$

called the gravitational or electric potential, where $\rho$ is the
density of mass or charge. Hence, the solution at any point in
$\Omega$ depends on all data. On the other hand, the first-order
hyperbolic problem

$$a u_x + b u_y = 0, \quad x > 0,\ y > 0 \qquad (3)$$

where $a$, $b$ are positive constants, and with boundary conditions
$u(0, y) = g(y)$, $y > 0$ and $u(x, 0) = h(x)$, $x > 0$, has the
solution

$$u(x, y) = \begin{cases} g(y - (b/a)x), & bx \le ay \\ h(x - (a/b)y), & bx > ay \end{cases}$$

Therefore, the solution is constant along the lines $bx - ay = \text{const}$,
called characteristic lines, so the solution at
any point depends on just a single boundary data value.
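
As a small illustration of the role of the characteristic lines, the following sketch evaluates the solution of (3) by tracing the line $bx - ay = \text{const}$ through a point back to the boundary. The function name `solve_transport` and the sample boundary data are assumptions made only for this example.

```python
def solve_transport(a, b, g, h_data, x, y):
    """Solution of a*u_x + b*u_y = 0 in x > 0, y > 0 for a, b > 0.

    g gives u on the boundary x = 0, h_data gives u on the boundary y = 0.
    """
    c = b * x - a * y          # constant along each characteristic line
    if c <= 0.0:
        return g(-c / a)       # the line meets the boundary x = 0
    return h_data(c / b)       # the line meets the boundary y = 0


# Hypothetical boundary data, chosen so that g(0) = h_data(0).
g = lambda y: y * y
h_data = lambda x: x

# (1, 3) and (3, 4) lie on the same characteristic b*x - a*y = -5.
value_a = solve_transport(2.0, 1.0, g, h_data, 1.0, 3.0)
value_b = solve_transport(2.0, 1.0, g, h_data, 3.0, 4.0)
# (4, 1) lies on a characteristic that starts on y = 0.
value_c = solve_transport(2.0, 1.0, g, h_data, 4.0, 1.0)
```

Two points on the same characteristic line return the same value, which is taken from a single boundary point.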

Depending on its type, there exist various methods to
prove uniqueness and stability of a boundary value problem.
For elliptic and parabolic problems, one can use a
maximum principle, which shows pointwise stability. As
an example, consider $-\Delta u = f$ in $\Omega$, $u = g$ on $\partial\Omega$, where
$f < 0$. Then, assuming first the contrary, it is readily proven
that the maximum of $u$ is taken on the boundary $\partial\Omega$. This
also shows that any perturbation of boundary data by some
amount $\varepsilon$ cannot result in any larger sized perturbation of
the solution in the interior of the domain. An alternative
way of proving the maximum principle is via a Green’s
function (fundamental solution) representation of the solution
such as in (2).

For hyperbolic problems, one uses the energy integral
method. Let $u_1$ and $u_2$ be two solutions of $u_t + a u_x = 0$,
$a > 0$, $0 < x < 1$, $t > 0$, where $u(0, t) = 0$, which correspond
to different initial data. Then, $v = u_1 - u_2$ satisfies
the corresponding homogeneous equation. By multiplying
the equation by $v$ and integrating, we get

$$\int_0^1 v_t v\, dx + \int_0^1 a v_x v\, dx = 0$$

or, by letting $E(t) = \int_0^1 v^2\, dx$ (called the energy integral),
we find

$$\frac{1}{2}\frac{d}{dt}E(t) + \frac{a}{2} v^2(1, t) = 0, \quad t > 0$$

that is, $E'(t) \le 0$. Hence, $E(t) \le E(0)$, $t > 0$, where
$E(0) = \int_0^1 (u_1(x, 0) - u_2(x, 0))^2\, dx$, which shows stability
and uniqueness.

Clearly, the energy method is also applicable for more 
general problems, where the analytical solution is not 
known. For the discretized problems, one can use a discrete 
maximum principle, enabling error estimates in maximum 
norm. For parabolic and hyperbolic problems, we can also 
use stability estimates based on eigenvalues of the corre- 
sponding difference matrix or, more generally, based on 
energy type estimates. Finally, for constant coefficient prob- 
lems, Fourier methods can be used. 

Let the domain of definition of the continuous problem 
be discretized by a rectangular mesh with a mesh size h. In 
order to compute a numerical solution of a partial differen- 
tial equation like (1), we replace the partial derivatives in 
the differential equation by finite differences. In this way, 
we obtain a discrete operator equation 


$$L_h u_h = f_h$$


We will be concerned with the well-posedness of the
discrete equation, using a discrete maximum principle, for
instance, as well as with estimates of the discretization
error $e_h = u - u_h$, where $u$ is the solution of the original
continuous partial differential equation. This estimation will
be based on the truncation error $\tau_h = L_h u - f_h$.

The following finite difference approximations will be 
used throughout this text: 


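$$u'(x) \approx D_x^+ u(x) := \frac{1}{h}[u(x + h) - u(x)], \quad \text{the forward difference} \qquad (4)$$

$$u'(x) \approx D_x^- u(x) := \frac{1}{h}[u(x) - u(x - h)], \quad \text{the backward difference} \qquad (5)$$

$$u'(x) \approx D_x^0 u(x) := \frac{1}{2h}[u(x + h) - u(x - h)], \quad \text{the first-order central difference} \qquad (6)$$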


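Note that $D_x^0 u = \frac{1}{2}(D_x^+ + D_x^-)u$. More generally,
one can use the so-called “θ-method”

$$u'(x) \approx [\theta D_x^+ + (1 - \theta)D_x^-]u(x) \qquad (7)$$

where $\theta$ is a method parameter. Thus,

$$u'(x) = [\theta D_x^+ + (1 - \theta)D_x^-]u(x) + \frac{h}{2}(1 - 2\theta)u''(x) - \frac{h^2}{6}\left[(1 - \theta)u^{(3)}(x - \eta_1 h) + \theta u^{(3)}(x + \eta_2 h)\right]$$

where $0 < \eta_i < 1$, $i = 1, 2$, so

$$u'(x) = [\theta D_x^+ + (1 - \theta)D_x^-]u(x) + \begin{cases} O(h^2) & \text{if } |1 - 2\theta| = O(h) \\ O(h) & \text{otherwise} \end{cases}$$

If $0 \le \theta \le 1$, we get

$$u'(x) = \frac{1}{h}[\theta u(x + h) + (1 - 2\theta)u(x) - (1 - \theta)u(x - h)] + \frac{h}{2}(1 - 2\theta)u''(x) - \frac{h^2}{6}u^{(3)}(x + \eta_3 h), \quad -1 < \eta_3 < 1$$

Note that for $\theta = 1/2$, we get (6), for $\theta = 1$, we get (4),
and for $\theta = 0$, we get (5). Hence, the θ-method generalizes
the previous methods.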
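
The orders of accuracy stated above can be checked numerically. The following is a minimal sketch, assuming the test function $u = \sin x$ and the mesh sizes shown (both chosen only for illustration); it estimates the observed order of the θ-method from the errors at $h$ and $h/2$.

```python
import math


def theta_difference(u, x, h, theta):
    """Theta-method quotient: theta * D+ u(x) + (1 - theta) * D- u(x)."""
    forward = (u(x + h) - u(x)) / h
    backward = (u(x) - u(x - h)) / h
    return theta * forward + (1.0 - theta) * backward


def observed_order(theta, x=1.0):
    """Estimate the order of accuracy from the errors at h and h/2."""
    errors = []
    for h in (1e-2, 5e-3):
        approx = theta_difference(math.sin, x, h, theta)
        errors.append(abs(approx - math.cos(x)))
    return math.log(errors[0] / errors[1], 2)


order_forward = observed_order(1.0)    # forward difference (4)
order_backward = observed_order(0.0)   # backward difference (5)
order_central = observed_order(0.5)    # central difference (6)
```

Under these assumptions, the one-sided differences show an observed order close to 1 and the central difference an order close to 2.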
An approximation for the second derivative is obtained
as

$$u''(x) \approx D_x^+ D_x^- u(x) =: D_{xx}^0 u(x), \quad \text{the central difference of second order} \qquad (8)$$

We have then $D_{xx}^0 u(x) = D_x^+ D_x^- u(x) = h^{-2}[u(x + h) - 2u(x) + u(x - h)]$, thus

$$u''(x) = D_{xx}^0 u(x) - \frac{h^2}{12}u^{(4)}(x + \eta h), \quad -1 < \eta < 1$$

or

$$u''(x) = D_{xx}^0 u(x) + O(h^2), \quad h \to 0$$

Similar expressions hold for $D_y^+$, $D_y^-$, $D_y^0$, and so on. In
particular, if $u_x^{(4)}, u_y^{(4)} \in C$,

$$D_x^+ D_x^- u(x, y) + D_y^+ D_y^- u(x, y) = h_x^{-2}[u(x + h_x, y) + u(x - h_x, y) - 2u(x, y)] + h_y^{-2}[u(x, y + h_y) + u(x, y - h_y) - 2u(x, y)]$$

$$= u_{xx}(x, y) + u_{yy}(x, y) + O(h_x^2) + O(h_y^2), \quad h_x, h_y \to 0 \qquad (9)$$

For $h_x = h_y = h$, we have

$$\Delta_h^{(5)} u := [D_x^+ D_x^- + D_y^+ D_y^-]u(x, y) = h^{-2}[u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h) - 4u(x, y)]$$

$\Delta_h^{(5)}$ is called the 5-point difference operator.
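
To make the 5-point difference operator concrete, here is a small sketch that applies it to a smooth function and observes the $O(h^2)$ behavior. The harmonic test function $u = \sin x\, e^y$ and the evaluation point are assumptions chosen for illustration; since $u_{xx} + u_{yy} = 0$ for this function, the value returned by the operator is the truncation error itself.

```python
import math


def five_point_laplacian(u, x, y, h):
    """Apply the 5-point difference operator to a function u at (x, y)."""
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
            - 4.0 * u(x, y)) / h**2


def u(x, y):
    return math.sin(x) * math.exp(y)   # harmonic: u_xx + u_yy = 0


err_h = abs(five_point_laplacian(u, 0.3, 0.2, 0.1))
err_half = abs(five_point_laplacian(u, 0.3, 0.2, 0.05))
ratio = err_h / err_half   # close to 4 for an O(h^2) truncation error
```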

Various difference methods adjusted to the type of the 
problem, with discretization error estimates based on trun- 
cation errors, are presented. 

When deriving truncation error estimates for difference
approximations, one uses Taylor expansion (assuming sufficient
regularity of $u$)

$$u(x_i + h) = u(x_i) + hu'(x_i) + \frac{1}{2}h^2 u''(x_i) + \cdots + \frac{1}{k!}h^k u^{(k)}(x_i) + R(x_i, h, k) \qquad (10)$$

where the remainder term $R(x_i, h, k)$ can be written as
$R(x_i, h, k) = \frac{1}{(k+1)!}h^{k+1}u^{(k+1)}(\xi)$, $\xi \in (x_i, x_i + h)$, or in the
alternative form $R(x_i, h, k) = \int_{x_i}^{x_i + h} \frac{1}{k!}(x_i + h - s)^k u^{(k+1)}(s)\, ds$.


2 TWO-POINT BOUNDARY VALUE 
PROBLEMS 


Boundary value problems for partial differential equations are
among the most common problems of applied mathematics that
arise in physics, engineering, and so on.
As an introduction to difference methods for such problems,
we consider here the corresponding problem in one dimension,
the two-point linear differential equation problem:


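Find $u \in C^2[a, b]$ such that

$$\mathcal{L}u \equiv -(k(x)u')' + p(x)u = f(x), \quad a < x < b \qquad (11)$$

with boundary conditions

$$r_0[u] \equiv \gamma_0 u(a) - \delta_0 k(a)u'(a) = \alpha, \quad r_1[u] \equiv \gamma_1 u(b) + \delta_1 k(b)u'(b) = \beta \qquad (12)$$

Here, $u' = du/dx$, $k(x) \ge k_0 > 0$, $a \le x \le b$ and $k \in C^1(a, b)$,
and $p, f \in C(a, b)$ are given real-valued functions
and $\gamma_0, \delta_0, \gamma_1, \delta_1, \alpha, \beta$ are given real numbers. The
operator $\mathcal{L}$ is self-adjoint, that is, $\int_a^b \mathcal{L}u\, v\, dx = \int_a^b \mathcal{L}v\, u\, dx$.
The solution $u$ will then be a twice continuously differentiable
function.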

Such problems arise, for instance, if we let $u$ be the
displacement of a (thin) elastic string subjected to forces
with distribution defined by $f$. In the simplest model, $k$
is constant, $p(x) \equiv 0$, and the string is fixed at $(a, \alpha)$,
$(b, \beta)$, see Figure 1. This equation also arises if we let $u$ be
the temperature in a heat-conducting rod that is perfectly
insulated except at the endpoints, where the temperatures
$\alpha$, $\beta$ are given. The material coefficients $k$ and $p$ are often
assumed to be constants.

Figure 1. An elastic string subjected to forces.


Existence and uniqueness 

For boundary value problems, there are intrinsic questions 
regarding existence and uniqueness of solutions. There 
holds the following theorem. 


Theorem 1 (Fredholm’s alternative) Equation (11) has
exactly one solution if and only if the homogeneous problem
(with $f \equiv 0$ and homogeneous boundary conditions
$\alpha = \beta = 0$) only has the trivial solution.


We now give conditions under which the boundary value 
problem (11) has exactly one solution. 


Theorem 2. If $k(x) > 0$, $p(x) \ge 0$, $\gamma_i^2 + \delta_i^2 > 0$, $\delta_i \ge 0$,
$\gamma_i \ge 0$, $i = 0, 1$ and if any of the following conditions holds,
(i) $\gamma_0 \ne 0$, (ii) $\gamma_1 \ne 0$, (iii) $p(x) > 0$ for some $x \in (a, b)$,
then the boundary value problem (11) has exactly one
solution.


That the homogeneous problem $-(ku')' + p(x)u = 0$,
$a < x < b$, $r_0[u] = r_1[u] = 0$, where the boundary operators
$r_0$, $r_1$ are defined in (12), has only the trivial
solution, can be shown by partial integration of
$\mathcal{L}u = 0$.


2.1 Difference approximations 


The boundary value problem (11) cannot, in general, 
be solved analytically. We shall now present a sim- 
ple but efficient numerical method. In order to simplify 
the presentation, we shall mostly consider the following 


problem: 


Find $u \in C^2[a, b]$ such that

$$\mathcal{L}u \equiv -u'' + p(x)u = f(x), \quad a < x < b \qquad (13)$$

$$u(a) = \alpha, \quad u(b) = \beta$$

where $p(x) \ge 0$, $a < x < b$, and $\max(\alpha, \beta) \ge 0$.

Let $x_0 = a$, $x_i = x_{i-1} + h$, $i = 1, 2, \ldots, N$, $x_{N+1} = b$,
where $h = (b - a)/(N + 1)$, be a uniform partitioning
$\Omega_h$ of the interval $[a, b]$. In order to find an approximate
solution at the points $x_i$, we approximate $u''$ by finite
differences

$$-u''(x_i) \approx h^{-2}[-u(x_{i-1}) + 2u(x_i) - u(x_{i+1})]$$


Letting $u_h$ be the resulting approximation of $u$ at $x_i$,
$i = 1, 2, \ldots, N$, we get

$$L_h u_h(x_i) \equiv h^{-2}[-u_h(x_{i-1}) + 2u_h(x_i) - u_h(x_{i+1})] + p(x_i)u_h(x_i) = f(x_i),$$

$$i = 1, 2, \ldots, N, \quad u_h(a) = \alpha, \quad u_h(b) = \beta \qquad (14)$$

This is a linear system of N equations in the N unknowns
$u_h(x_i)$, $i = 1, 2, \ldots, N$. That the corresponding matrix is
nonsingular follows from a discrete maximum principle.
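
The linear system (14) is tridiagonal, so it can be solved by Gaussian elimination in $O(N)$ operations. The following is a minimal sketch in plain Python; the function name `solve_bvp`, the elimination without pivoting, and the test problem $u = \sin \pi x$ with $p \equiv 1$ on $[0, 1]$ are assumptions made for illustration only.

```python
import math


def solve_bvp(p, f, a, b, alpha, beta, n):
    """Solve -u'' + p(x) u = f(x), u(a) = alpha, u(b) = beta by scheme (14).

    Returns the mesh x_0..x_{n+1} and the values u_h(x_i), n interior points.
    """
    h = (b - a) / (n + 1)
    x = [a + i * h for i in range(n + 2)]
    # Tridiagonal system: diagonal 2/h^2 + p(x_i), off-diagonals -1/h^2.
    diag = [2.0 / h**2 + p(x[i]) for i in range(1, n + 1)]
    off = -1.0 / h**2
    rhs = [f(x[i]) for i in range(1, n + 1)]
    rhs[0] -= off * alpha      # move the known boundary values to the right
    rhs[-1] -= off * beta
    # Forward elimination (no pivoting; the matrix is diagonally dominant).
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return x, [alpha] + u + [beta]


# Hypothetical test problem with exact solution u(x) = sin(pi x), h = 0.05.
x, uh = solve_bvp(lambda t: 1.0,
                  lambda t: (math.pi**2 + 1.0) * math.sin(math.pi * t),
                  0.0, 1.0, 0.0, 0.0, 19)
max_err = max(abs(uh[i] - math.sin(math.pi * x[i])) for i in range(len(x)))
```

Halving $h$ in this sketch should reduce `max_err` by roughly a factor of four, in line with second-order accuracy.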


Lemma 1 (Discrete maximum principle) Let $L_h$ be
defined by (14) and $v_h$ be any function with
$\max\{v_h(x_0), v_h(x_{N+1})\} \ge 0$ defined on $x_i$ for which
$L_h v_h(x_i) \le 0$, $i = 1, 2, \ldots, N$. Then,

$$\max_{0 \le i \le N+1} v_h(x_i) = \max_{i = 0, N+1} v_h(x_i)$$

that is, the largest value of $v_h$ is taken at one of the
endpoints.


Next, we show the corresponding error estimates for $\tau_h$
and $e_h$. We now define the truncation error of the difference
approximation as the error (or defect) we get when we
replace the difference approximation $u_h$ in (14) by the
exact solution $u$ of (13).


Definition 2. The function $\tau_{h,i} = (L_h u)_i - f_i$, $i = 1, \ldots, N$
is referred to as the truncation error of the difference
approximation (14) at $x_i$.


Definition 3. The function $e_h(x_i) = u(x_i) - u_h(x_i)$,
$i = 0, 1, \ldots, N + 1$ is referred to as the discretization error of
the difference approximation (14) at point $x_i$.

Note that the truncation error and the discretization error
are related by

$$L_h e_h = \tau_h \qquad (15)$$

It turns out that $\tau_h$ is easily estimated via Taylor expansions.
We see from (15) that in order to estimate $e_h$, we need a
bound of the inverse of the discrete operator $L_h$ (if it exists).
Such a bound is provided by the next barrier function
lemma, which can be proven by the discrete maximum
principle.

In order to estimate $e_h$, it is convenient to use the
following notations:

$$|v_h|_{\Omega_h} = \max_{x_i \in \Omega_h} |v_h(x_i)|, \quad |v_h|_{\partial\Omega_h} = \max_{i = 0, N+1} |v_h(x_i)| \qquad (16)$$


Lemma 2 (Barrier function lemma) Assume that there
exists a function $w_h$ (called barrier function), defined on
$x_i$, $i = 0, 1, \ldots, N + 1$, which satisfies $L_h w_h > 0$, $w_h \ge 0$
and $w_h(x_0) = 0$. Then, any function $v_h$ defined on $x_i$,
$i = 0, 1, \ldots, N + 1$, for which $v_h(x_0) = 0$ satisfies

$$|v_h|_{\Omega_h \cup \partial\Omega_h} \le |v_h|_{\partial\Omega_h} + \frac{\max_i w_h(x_i)}{\min_{\Omega_h} L_h w_h}\, |L_h v_h|_{\Omega_h} \qquad (17)$$


Lemma 3. (a) $\tau_{h,i} = h^{-2}[-u(x_{i-1}) + 2u(x_i) - u(x_{i+1})] + u''(x_i)$.
(b) If $u \in C^4(a, b)$, then

$$\tau_{h,i} = -\frac{1}{12}h^2 u^{(4)}(\xi_i), \quad x_{i-1} < \xi_i < x_{i+1}$$


Proof. Since $f_i = f(x_i) = p(x_i)u(x_i) - u''(x_i)$ and
$(L_h u)_i = h^{-2}[-u(x_{i-1}) + 2u(x_i) - u(x_{i+1})] + p(x_i)u(x_i)$, (a) follows.
(b) follows now by Taylor expansion around $x_i$.
Hence, if $u \in C^4(a, b)$, the truncation error is $O(h^2)$,
$h \to 0$. □


In order to prove that the discretization error is also
$O(h^2)$, we use Lemma 2. As a barrier function, we use

$$w_h(x_i) = \frac{4(x_i - a)(b - x_i)}{(b - a)^2}, \quad i = 0, 1, \ldots, N + 1 \qquad (18)$$

Note that $w_h(a) = w_h(b) = 0$.


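Theorem 3. The discretization error of the difference
approximation (14) satisfies

$$|u - u_h|_{\Omega_h \cup \partial\Omega_h} \le \frac{1}{96}(b - a)^2 h^2 \max_{a \le x \le b} |u^{(4)}(x)| \qquad (19)$$


Proof. Note that $w_h$, defined by (18), satisfies $1 \ge w_h \ge 0$
on $x_i$ and, as is easily seen, $L_h w_h = 8/(b - a)^2 + p_i w_h(x_i) \ge 8/(b - a)^2 > 0$.
By Lemmas 2 and 3(b), we now get

$$|u - u_h|_{\Omega_h \cup \partial\Omega_h} \le |u - u_h|_{\partial\Omega_h} + \frac{(b - a)^2}{8}|L_h u - L_h u_h|_{\Omega_h} = \frac{(b - a)^2}{8}|L_h u - f_h|_{\Omega_h} = \frac{(b - a)^2}{8}|\tau_h|_{\Omega_h} \le \frac{(b - a)^2}{96}h^2 \max_{a \le x \le b} |u^{(4)}(x)|$$

□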


2.2 Richardson extrapolation 


For a uniform mesh and sufficiently smooth solutions,
the order of accuracy of the approximate solution can be
easily improved using the classical trick of Richardson
extrapolation and nested meshes involving approximations
at the same meshpoints for two different values of h.


Theorem 4. Let $u \in C^6(a, b)$ and let $u_h$ be the solution of
(14). Then,

$$u(x_i) - u_h(x_i) = h^2 \varphi(x_i) + O(h^4), \quad h \to 0, \quad i = 1, 2, \ldots, N$$

where $\varphi$ is a function that is independent of h.


Proof. Letting $\varphi$ be the solution of the auxiliary problem

$$-\varphi'' + p\varphi = -\frac{1}{12}u^{(4)}, \quad a < x < b, \quad \varphi(a) = \varphi(b) = 0$$

and assuming that $\varphi \in C^4(a, b)$, which holds if $u \in C^6(a, b)$,
it follows that $L_h(e - h^2\varphi)_i = O(h^4)$, $h \to 0$, where
$e = u - u_h$. Applying the Barrier Lemma completes the
proof. □


This argument can be repeated so that if u is smooth
enough, one can prove an $h^2$ expansion of the error
$u(x_i) - u_h(x_i)$. Hence, repeated Richardson extrapolation may also
be applied. Such error expansions do not, however, always
exist. For instance, if one of the boundary points is not a
meshpoint in the uniform mesh, the distance $\delta h$, $0 < \delta < 1$,
of this boundary point to the nearest meshpoint is not a
smooth function of h.

The extrapolation procedure has advantages over conventional
higher-order methods. Thus, the basic difference
formula can be very simple, which makes repetition with a
new mesh size easy and the method automatically provides
error estimates.

For extensions of such asymptotic error expansions and
Richardson extrapolation for linear finite elements, see
Blum, Lin and Rannacher (1986).
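
The extrapolation step can be sketched as follows: solve on a mesh with size $h$ and on the nested mesh with size $h/2$, then combine the two values at the common meshpoints as $(4u_{h/2} - u_h)/3$, which cancels the $h^2$ term of the error expansion. The compact solver `solve`, the interval $[0, 1]$ with homogeneous boundary values, and the test problem with exact solution $\sin \pi x$ are assumptions made for this example only.

```python
import math


def solve(n, p, f):
    """Scheme (14) on [0, 1] with u(0) = u(1) = 0 and n interior points."""
    h = 1.0 / (n + 1)
    diag = [2.0 / h**2 + p((i + 1) * h) for i in range(n)]
    rhs = [f((i + 1) * h) for i in range(n)]
    off = -1.0 / h**2
    for i in range(1, n):                     # forward elimination
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):            # back substitution
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return u


p = lambda x: 1.0
f = lambda x: (math.pi**2 + 1.0) * math.sin(math.pi * x)
exact = lambda x: math.sin(math.pi * x)

n = 9                                   # coarse mesh, h = 1/10
coarse = solve(n, p, f)
fine = solve(2 * n + 1, p, f)           # nested mesh, h/2 = 1/20
# Fine-mesh index 2*i + 1 is the same point as coarse-mesh index i.
extrap = [(4.0 * fine[2 * i + 1] - coarse[i]) / 3.0 for i in range(n)]
err_coarse = max(abs(coarse[i] - exact((i + 1) / (n + 1))) for i in range(n))
err_extrap = max(abs(extrap[i] - exact((i + 1) / (n + 1))) for i in range(n))
```

The difference between the coarse and fine values at a common meshpoint also serves as a computable error estimate, which is the practical benefit mentioned above.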


2.3 Computing high-order approximations of the 
derivatives of the solution 


We now show that not only the solution but also its 
derivatives can be computed to high accuracy, using the 
already computed approximate values of the solution. 


Theorem 5. Let $u \in C^6(a, b)$ be the solution of $-u'' + p(x)u = f$,
$a < x < b$, $u(a) = \alpha$, $u(b) = \beta$ and let $u_h$ be
the solution of the discrete problem, discretized using central
differences on a uniform mesh. Then,

$$u'(x_i) = \frac{u_h(x_{i+1}) - u_h(x_{i-1})}{2h} + O(h^2)$$


Proof. Theorem 4 shows that $u_h(x_i) = u(x_i) - h^2\varphi(x_i) + O(h^4)$,
where $\varphi$ does not depend on h and $\varphi \in C^2(a, b)$.
Hence, using Taylor series expansions,

$$\frac{u_h(x_{i+1}) - u_h(x_{i-1})}{2h} = \frac{u(x_{i+1}) - u(x_{i-1})}{2h} - h^2\frac{\varphi(x_{i+1}) - \varphi(x_{i-1})}{2h} + O(h^3) = u'(x_i) + \frac{h^2}{6}u^{(3)}(\eta_i) - h^2\varphi'(x_i) + O(h^3)$$

where $\eta_i \in (x_{i-1}, x_{i+1})$. □


In a similar way, assuming a correspondingly higher
order of regularity of the solution, even higher-order derivatives
can be computed with error $O(h^2)$. This result is not
obvious, because to compute an approximation of $u'$, we
make use of approximations of u, divided by h or a higher
power of h. That we do not lose one or more power(s) of
h in the order of approximation is due to the existence of
an h-expansion of the errors.


3 FINITE DIFFERENCE METHODS 
FOR ELLIPTIC PROBLEMS 


We present in this section various difference methods for
the numerical solution of partial differential equations of
elliptic type. Discretization errors are derived for operators
of positive type. The derivations are done for problems
in two space dimensions but most results can be readily
extended to problems in three space dimensions. Further
results on difference methods for elliptic problems can be
found in Section 6.


3.1 Difference approximations 


Consider first the Poisson problem $\Delta u = f$ with given
Dirichlet boundary conditions on a unit square domain,
aligned with the coordinate system, and a uniform rectangular
mesh $\Omega_h$ with mesh sizes $h_1 = p_1 h$ and $h_2 = p_2 h$
in the x- and y-direction, respectively. Here, $p_1$ and $p_2$
are given positive numbers, chosen such that $1/(p_i h)$ is an
integer, and h is a small positive parameter, which we
intend to make arbitrarily small to get a sufficiently accurate
numerical solution. In each interior meshpoint in $\Omega_h$ (see
Figure 2a), we use the five-point difference approximation,
$\Delta_h^{(5)} u := (D_x^+ D_x^- + D_y^+ D_y^-)u(x, y)$, and on the boundary
points, we use the given boundary values.

Figure 2. Local difference meshes (a) rectangular domain, (b)
curved boundary.

Let $h_i = 1/(N_i + 1)$, $i = 1, 2$ and, hence, let $p_2/p_1 =
(N_1 + 1)/(N_2 + 1)$, where $N_i$, $i = 1, 2$ are the numbers
of interior meshpoints in the two coordinate directions. Let $u_h$
denote the corresponding approximate solution. We then get
a system of $N_1 N_2$ linear algebraic equations of the form

$$L_h u_h \equiv h_1^{-2}[u_h(x - h_1, y) - 2u_h(x, y) + u_h(x + h_1, y)] + h_2^{-2}[u_h(x, y - h_2) - 2u_h(x, y) + u_h(x, y + h_2)] = f(x, y), \quad (x, y) \in \Omega_h \qquad (20)$$


If we order the meshpoints in lexicographic (horizontal)
order, the corresponding matrix takes the form

$$A_h = \text{block tridiag}(I_i, A^{(i)}, I_i)$$

where $A^{(i)} = h_1^{-2}\,\text{tridiag}(1, -2, 1) - 2h_2^{-2} I$ and $I_i = h_2^{-2} I$,
$i = 1, 2, \ldots, N_2$. Here, $A^{(i)}$ and the identity matrix $I$
have order $N_1 \times N_1$ and there are $N_2$ block-rows in $A_h$.
Systems with matrix $A_h$ can be solved efficiently using various
methods such as fast direct methods or multigrid and
algebraic multilevel iteration methods; see, for example,
Axelsson and Barker (1984) and Hackbusch (1985).
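
The block tridiagonal structure can be made explicit by assembling the matrix of scheme (20) directly. The sketch below builds a dense matrix as nested lists purely for illustration (a practical code would use a sparse format); the function name `assemble_five_point` and the small mesh are assumptions.

```python
def assemble_five_point(n1, n2):
    """Dense matrix A_h of scheme (20) on the unit square.

    Unknowns are ordered lexicographically (x fastest), with n1 and n2
    interior meshpoints in the x- and y-directions.
    """
    h1, h2 = 1.0 / (n1 + 1), 1.0 / (n2 + 1)
    size = n1 * n2
    A = [[0.0] * size for _ in range(size)]
    for j in range(n2):               # block-row index (y-direction)
        for i in range(n1):           # position inside the block (x-direction)
            row = j * n1 + i
            A[row][row] = -2.0 / h1**2 - 2.0 / h2**2
            if i > 0:
                A[row][row - 1] = 1.0 / h1**2      # West neighbor
            if i < n1 - 1:
                A[row][row + 1] = 1.0 / h1**2      # East neighbor
            if j > 0:
                A[row][row - n1] = 1.0 / h2**2     # South neighbor
            if j < n2 - 1:
                A[row][row + n1] = 1.0 / h2**2     # North neighbor
    return A


A = assemble_five_point(3, 2)   # 2 block-rows, each block of order 3
```

Note that the last unknown of one block-row is not coupled to the first unknown of the next, since these meshpoints lie on opposite sides of the domain.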

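Assuming that $u \in C^4(\Omega)$, the local truncation error
$L_h u - f$ of the difference approximation is readily found
to be

$$L_h u - f = \frac{1}{12}h_1^2 u_x^{(4)}(x + \xi_1, y) + \frac{1}{12}h_2^2 u_y^{(4)}(x, y + \xi_2) = O(h^2), \quad -h_i < \xi_i < h_i,\ i = 1, 2$$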


Curved boundaries 

For a more general domain $\Omega$, for instance with a curved
boundary such as illustrated in Figure 2b, we must modify
the approximation scheme at interior points in $\Omega_h$ next to
the boundary. There are two such efficient schemes. The
first uses a generalized five-point difference approximation
with, in general, nonequidistant meshlengths (see Shortley
and Weller, 1938):

$$L_h u_h \equiv \frac{2}{h_E + h_W}\left\{\frac{1}{h_E}[u_h(E) - u_h(P)] - \frac{1}{h_W}[u_h(P) - u_h(W)]\right\} + \frac{2}{h_N + h_S}\left\{\frac{1}{h_N}[u_h(N) - u_h(P)] - \frac{1}{h_S}[u_h(P) - u_h(S)]\right\} = f(P), \quad P \in \Omega_h \qquad (21)$$


where $h_E$, $h_W$, $h_N$, $h_S$ denote the distances from P to the
surrounding points in the East, West, North, and South
directions, see Figure 2. Unless $h_E = h_W$ and $h_N = h_S$, the
local truncation error is $O(h)$ here. The coefficient matrix is
in general not symmetric. The second method uses a linear
combination of weighted linear interpolations in the x- and
y-directions,


Lpp = {hpu (W) + hwu, (EZ) — (hg + hy)u,(P)} 


2 
hg 


} = {hyu (S) + hg, (N) — (hy + hs)u,(P)} 
N 


1 hg+h hg +h 
=; [iw zt w +hsSt"s] f(P), PE, 

(22) 
Here, the coefficient matrix is symmetric. 


Remark 1. To provide an alternative to the Shortley-Weller
approximation for treating curved boundaries,
much work has been devoted to numerical grid generation
during the past 20 to 30 years. For instance, curvilinear
boundary-fitted finite difference methods became popular
and were applied extensively in numerical fluid dynamics
problems (see Thompson, Warsi and Mastin, 1985). More
recently, much effort has been devoted to variational grid
generation methods, which can provide more robust methods
applicable also for very complicated geometries; see,
for example, Garanzha (2004) and the references therein.


3.2 Higher-order schemes 


3.2.1 The nine-point difference scheme 


Let $\Delta u = f$, and consider first the cross-directed five-point
difference scheme on a local equidistant square submesh,

$$\Delta_h^{(\times)} u_h := \frac{1}{2h^2}[u_h(x - h, y - h) + u_h(x + h, y - h) + u_h(x - h, y + h) + u_h(x + h, y + h) - 4u_h(x, y)], \quad (x, y) \in \Omega_h \qquad (23)$$

It is readily seen that, for a sufficiently smooth function u,

$$\Delta_h^{(5)} u = \Delta u + \frac{2}{4!}h^2\left(u_x^{(4)} + u_y^{(4)}\right) + \frac{2}{6!}h^4\left(u_x^{(6)} + u_y^{(6)}\right) + O(h^6)$$

$$\Delta_h^{(\times)} u = \Delta u + \frac{2}{4!}h^2\left(u_x^{(4)} + 6u_{xxyy} + u_y^{(4)}\right) + \frac{2}{6!}h^4\left(u_x^{(6)} + 15u_{xxxxyy} + 15u_{xxyyyy} + u_y^{(6)}\right) + O(h^6)$$

Let $\Delta_h^{(9)}$ be the nine-point difference scheme defined by

$$\Delta_h^{(9)} = \frac{2}{3}\Delta_h^{(5)} + \frac{1}{3}\Delta_h^{(\times)}$$

The coefficients in this stencil (apart from the common factor
$h^{-2}$) equal 1/6 for the corner
vertex points in the square with edges 2h, equal 2/3 for the
midedge points, and equal −10/3 for the center point.

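A computation shows that for a uniform rectangular
mesh,

$$\Delta_h^{(9)} u = f + \frac{1}{12}h^2 \Delta f + \frac{1}{360}h^4\left(\Delta^2 f + 2f_{xxyy}\right) + O(h^6)$$

where $\Delta^2 f = \Delta(\Delta f)$. Using a modified right-hand side
in the difference formula, it follows that the difference
approximation

$$\Delta_h^{(9)} u_h = f + \frac{h^2}{12}\Delta_h^{(5)} f, \quad (x, y) \in \Omega_h \qquad (24)$$

has truncation error $O(h^4)$.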
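
The effect of the modified right-hand side can be observed numerically. The following sketch applies the nine-point operator, with weights 2/3, 1/6, and −10/3 as described above, to a smooth test function and compares the result with $f + (h^2/12)\Delta f$. The test function $u = \sin x \sin 2y$, for which $f = \Delta u = -5u$ and $\Delta f = 25u$ can be computed by hand, is an assumption chosen for illustration, and $\Delta f$ is evaluated analytically here rather than by differences.

```python
import math


def nine_point(u, x, y, h):
    """Nine-point operator (2/3)*five-point + (1/3)*cross-directed scheme."""
    edges = u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
    corners = (u(x + h, y + h) + u(x - h, y + h)
               + u(x + h, y - h) + u(x - h, y - h))
    return (2.0 / 3.0 * edges + 1.0 / 6.0 * corners
            - 10.0 / 3.0 * u(x, y)) / h**2


def u(x, y):
    return math.sin(x) * math.sin(2.0 * y)


def f(x, y):
    return -5.0 * u(x, y)          # f = u_xx + u_yy


def laplace_f(x, y):
    return 25.0 * u(x, y)          # Laplacian of f, computed by hand


def truncation(h, x=0.4, y=0.3):
    """Defect of the exact solution in the modified nine-point scheme."""
    rhs = f(x, y) + h**2 / 12.0 * laplace_f(x, y)
    return abs(nine_point(u, x, y, h) - rhs)


ratio = truncation(0.1) / truncation(0.05)   # about 2**4 for O(h^4)
```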

Further, it follows from the expansion of $\Delta_h^{(5)}$ above that for a
sufficiently smooth function f,
$\Delta f = \Delta_h^{(5)} f - \frac{1}{12}h^2\left(\Delta^2 f - 2f_{xxyy}\right) + O(h^4)$. A
computation shows that $h^2 f_{xxyy} = 2[\Delta_h^{(\times)} f - \Delta_h^{(5)} f] + O(h^4)$
and, therefore, the nine-point stencil with the next
modified right-hand side

$$\Delta_h^{(9)} u_h = f + \frac{1}{12}h^2 \Delta_h^{(5)} f - \frac{1}{240}h^4 \Delta^2 f + \frac{7}{360}h^4 f_{xxyy}, \quad (x, y) \in \Omega_h$$

has a truncation error $O(h^6)$.

The implementation of this scheme is simplified if f
is given analytically so that $\Delta f$ and so on can be computed
explicitly. If $f \equiv 0$, then $\Delta_h^{(9)} u_h = 0$ has an order
of approximation $O(h^6)$. Hence, this scheme provides very
accurate approximation, for instance, for far-field equations,
where frequently $\Delta u = 0$.

The above is an example of a compact difference scheme 
(for further references on such schemes, see Houstis and 
Rice, 1982). 


3.2.2 Difference methods for anisotropic problems 
and problems with a mixed derivative 


Consider first the anisotropic differential equation
$a u_{xx} + b u_{yy} = f(x, y)$, $(x, y) \in \Omega$, where $u = g(x, y)$, $(x, y) \in \partial\Omega$
and f and g are given, sufficiently smooth functions.
Let $a > 0$ and $b > 0$.

Here, the nine-point difference approximation has a stencil,
as shown in (25) with $c = 0$. If we modify the right-hand
side to be $f + \frac{1}{12}h^2 \Delta f$, it can be seen that in this
case the local truncation error becomes $O(h^4)$.

Consider next the differential equation with a mixed
derivative

$$a u_{xx} + 2c u_{xy} + b u_{yy} = f(x, y), \quad (x, y) \in \Omega$$

with given boundary conditions. We assume that $a > 0$,
$b > 0$, and $c^2 < ab$, which are the conditions for ellipticity
of the operator. For the mixed derivative, we use the central
difference approximation $u_{xy} \approx D_x^0 D_y^0 u$, that is,

$$u_{xy} \approx \frac{1}{4h^2}[u_h(x - h, y - h) - u_h(x + h, y - h) - u_h(x - h, y + h) + u_h(x + h, y + h)]$$

Combined with the nine-point difference stencil, it becomes

$$\frac{1}{6h^2}\begin{bmatrix} \frac{a+b}{2} - 3c & 5b - a & \frac{a+b}{2} + 3c \\ 5a - b & -10(a + b) & 5a - b \\ \frac{a+b}{2} + 3c & 5b - a & \frac{a+b}{2} - 3c \end{bmatrix} \qquad (25)$$

3.2.3 Difference schemes for other regular 
tessellations 


Finite differences can be extended to nonrectangular 
meshes. 

For a regular (isosceles) triangular mesh, one can form
the obvious seven-point difference stencil. For a hexagonal
(‘honeycomb’) mesh, one finds a four-point stencil.

The symmetrically located nodepoints in the seven-point
scheme allow one to readily approximate second-order
cross derivatives. Similarly, in 3D problems, a cuboctahedral
stencil involves 13 nodepoints. If a Cartesian grid
is used, approximations of the second-order cross derivatives
require at least nine points in 2D and 19 points in
3D, that is, two and six more than for the hexagonal and
cuboctahedral stencils.

The biharmonic problem $\Delta^2 u = f$, $(x, y) \in \Omega$ with
boundary conditions such as $u = g(x, y)$, $\partial u/\partial n = q(x, y)$
on $\partial\Omega$ gives rise to a 13-point stencil for a regular equidistant
mesh,

$$h^{-4}\begin{bmatrix} & & 1 & & \\ & 2 & -8 & 2 & \\ 1 & -8 & 20 & -8 & 1 \\ & 2 & -8 & 2 & \\ & & 1 & & \end{bmatrix} \qquad (26)$$

which has truncation error $O(h^2)$. The biharmonic problem
can, however, more efficiently be solved as a coupled
problem

$$\xi - \Delta u = 0$$
$$\Delta \xi = f$$

using a variational formulation thereof, see, for example,
Axelsson and Gustafsson (1979) and Ciarlet and Raviart
(1974).


3.3 Approximating boundary conditions 


So far, we have only considered Dirichlet type boundary
conditions $u = g$ on $\partial\Omega$. For boundary conditions of
Neumann ($\partial u/\partial n := \nabla u \cdot \mathbf{n} = g$) or the more general
Robin ($\partial u/\partial n + \sigma u = g$) type (cf. Gustafson and Abe, 1995),
one must use special approximation methods. The regular
difference mesh is then extended around the boundary line.
If the normal to the boundary goes through meshpoints,
we can use a central difference approximation for $\partial u/\partial n$,
using an interior meshpoint and its symmetrically reflected
point in the extended domain (see Figure 3(a)). In other
cases, we can still use central difference approximations
for $\partial u/\partial n$ (at U, R in Figure 3(b)) but we must interpolate
the function value in the symmetrically located points
in the interior. This can be done using linear interpolation
from the surrounding points (P, N, NW, W) in Figure 3(b)
to find the value at T, or using biquadratic interpolation,
involving some additional points. The local truncation error
becomes $O(h)$ or $O(h^2)$, respectively. It can be seen that
one can always get a positive-type scheme in the first case,
but not in the second case.

For discretization errors for problems with curved bound- 
aries, see Section 6. 


3.4 Monotonicity and discretization error 
estimates 


Monotone operators provide a basis for pointwise bounds 
of the solution and the discretization errors corresponding 
to various difference approximations. 

The general form of a linear difference operator $L_h$
depending on some discretization parameter h is

$$L_h u_h(P_i) \equiv \sum_{j=1}^{n} a_{ij} u_h(P_j) = F(P_i), \quad i = 1, 2, \ldots, n$$

Here, the function F includes the given source function and
the given boundary data.

The operator is said to be monotone if $L_h v \ge 0$ implies
$v \ge 0$, where v is a function defined on $\Omega_h$. Note that
a monotone operator is nonsingular because if $L_h$ is
monotone and $L_h v \le 0$, then $L_h(-v) \ge 0$, so $-v \ge 0$, that
is, $v \le 0$. Hence, if $L_h v = 0$, then both $v \ge 0$ and $v \le 0$
hold, so $v = 0$.

Figure 3. Neumann boundary conditions.

It is also readily seen that a nonsingular matrix A is
monotone if and only if $A^{-1} \ge 0$, where the inequality
holds componentwise. Further, for any monotone operator
A, there exists a positive vector $v > 0$ such that $Av > 0$.
Take, for example, $v = A^{-1}e$, where $e = [1, 1, \ldots, 1]^T$. If
some component $v_i = 0$, it would follow that all entries of
the ith row of $A^{-1}$ were zero, which is clearly not possible.

Consider now a monotone difference operator $L_h$ and
a normalized function w with $\max_i w(x_i) = 1$, called a
barrier function, such that $L_h w(x) \ge \alpha$ for all $x \in \Omega_h$ and
for some positive constant $\alpha$. Then,

$$\|L_h^{-1}\|_\infty \le \frac{1}{\alpha}$$

where $\|L_h\|_\infty = \sup_{v \ne 0} \|L_h v\|_\infty / \|v\|_\infty$ and $\|v\|_\infty =
\max |v(x_i)|$, $x_i \in \Omega_h$, is the supremum norm.

As is readily seen, this yields a discretization error
estimate,

$$\|e_h\|_\infty = \|u - u_h\|_\infty \le \frac{1}{\alpha}\|L_h u - f_h\|_\infty$$

where u is the exact solution to a given differential
equation $\mathcal{L}u = f$, $L_h$ is a discrete approximation to $\mathcal{L}$,
$u_h$ is the solution to the corresponding discrete equation
$L_h u_h = f_h$, and $L_h u - f_h = L_h u - L_h u_h$ is the truncation
error.

The barrier function thus leads to an easily computable 
discretization error estimate if the discrete operator is mono- 
tone. In addition, as pointed out by Collatz (1986), the 
monotonicity property is of great practical value because 
it can readily be used to get lower and upper bounds for 
the solution. 

Monotone operators arise naturally for positive-type difference schemes. The difference operator $L_h$ is of positive type if the coefficients satisfy the sign pattern $a_{ii} > 0$ and $a_{ij} \le 0$, $j \ne i$.

If, in addition, $a_{ii} \ge \sum_{j \ne i} |a_{ij}|$, $i = 1, 2, \ldots, n$, with strict inequality for at least one index $i$, then the operator is monotone. This is equivalent to the matrix $A = [a_{ij}]_{i,j=1}^{n}$ being an M-matrix.

Stieltjes proved that a positive definite operator of posi- 
tive type is monotone. 

However, even if the operator is not of positive type, it might nevertheless be monotone. For instance, a familiar result states that if $A = M - N$ is a weak regular splitting (i.e. $M$ is monotone and $M^{-1}N \ge 0$), then $A$ is monotone if and only if the splitting is convergent, that is, $\rho(M^{-1}N) < 1$, where $\rho$ denotes the spectral radius.

Hence, monotonicity of a given matrix A, respectively, 
a linear discrete operator L,, can be proven by construct- 
ing convergent weak regular splittings A = M — N. As is 
shown below, this result can be extended to matrix splittings 
of a more general form. 
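As a small illustration of proving monotonicity through a splitting, the sketch below tests the Jacobi-type splitting $A = M - N$ with $M = \mathrm{diag}(A)$ for the tridiagonal (-1, 2, -1) matrix. The matrix and the choice of splitting are assumptions made for the example only.

```python
import numpy as np

n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

M = np.diag(np.diag(A))      # Jacobi choice: M is the diagonal of A
N = M - A                    # so that A = M - N
M_inv = np.linalg.inv(M)
G = M_inv @ N                # iteration matrix M^{-1} N

# Weak regular splitting: M monotone (M^{-1} >= 0) and M^{-1} N >= 0.
is_weak_regular = bool(np.all(M_inv >= 0) and np.all(G >= 0))
rho = float(np.max(np.abs(np.linalg.eigvals(G))))   # spectral radius
A_inv_nonneg = bool(np.all(np.linalg.inv(A) >= -1e-12))
```

Here the splitting is weak regular and convergent (`rho < 1`), in agreement with the inverse of `A` being nonnegative.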


3.4.1 Bounding inverses of monotone matrices 


To bound the supremum norm of the inverse of a nonsingu- 
lar monotone matrix A, we consider the following Barrier 
Lemma both in its general and sharp forms. 


Lemma 4 (The Barrier Lemma). Let $A$ be a monotone matrix of order $n$ and let $v$ be a normalized vector, $\|v\|_\infty = 1$, such that $\min_i (Av)_i \ge \alpha$ for some positive scalar $\alpha$. Then,

(a) $\|A^{-1}\|_\infty \le 1/\alpha$;

(b) $\|A^{-1}\|_\infty = 1/\max\{\min_i (Av)_i,\ v \in V_A\}$, where $V_A = \{v \in \mathbb{R}^n,\ \|v\|_\infty = 1,\ Av > 0\}$;

(c) $\|A^{-1}\|_\infty = \|x\|_\infty$, where $x$ is the solution of $Ax = e$.


Proof. For a proof, see Axelsson and Kolotilina (1990). □
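Parts (a) and (c) of the lemma can be illustrated on the standard three-point approximation of $-u''$ on $(0,1)$ with homogeneous Dirichlet conditions. This is a sketch under assumptions: the model problem, the mesh size, and the barrier $w(x) = 4x(1-x)$ are chosen here for illustration and do not come from the text.

```python
import numpy as np

n = 15                       # interior points; h = 1/16, so x = 1/2 is a mesh point
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

w = 4.0 * x * (1.0 - x)      # barrier vector with maximum norm 1 (attained at x = 1/2)
alpha = float(np.min(A @ w)) # w is quadratic, so A w = 8 at every mesh point

norm_inv = float(np.linalg.norm(np.linalg.inv(A), np.inf))
x_sol = np.linalg.solve(A, np.ones(n))   # part (c): solve A x = e
```

For this example the bound of part (a) gives $\|A^{-1}\|_\infty \le 1/8$, and the value computed through part (c) shows that the bound is attained.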


For later use, note that for a strictly diagonally dominant matrix $A = [a_{ij}]$, the following inequality holds:

\[ \|A^{-1}\|_\infty \le \frac{1}{\min_i \bigl\{ |a_{ii}| - \sum_{j \ne i} |a_{ij}| \bigr\}} \]


We now extend the barrier lemma to the case where the positive vector $v$ satisfies the weaker condition $Av \ge 0$. This result can be particularly useful if Dirichlet boundary conditions hold at some part of the domain $\Omega$.


Lemma 5. Let A be monotone of the form A= 
Ay —Ap 
Ay 22 
are nonnegative and Ay) has no zero rows. Further, let 
v= [v; vy]? be a positive vector such that |\|v||,, = 1 and 
Al[v,; v2} = [p:q]", p > 0, q > 0. Then there holds 


f lAl Azul 
ani 1a eee. | tg ei 
Allo S ( + eats, | ( 5 aey) 


1 ~~ 
x max ~ — (27, 
min(A,,V;); min(q + Ay Ay P); j 


ı where A,, is monotone, Ay, and Ap, 


A proof can be found in Axelsson and Kolotilina (1990). 


3.4.2 Proving matrix monotonicity 


Here, we summarize some classical results on weak regular 
splittings, convergent splittings, and Schur complements of 
matrices partitioned into a two-by-two block form, which 
can be used to ascertain that a given matrix is monotone. 
Let $M, N \in \mathbb{R}^{n \times n}$; then, $A = M - N$ is called a weak regular splitting if $M$ is monotone and $M^{-1}N$ is nonnegative. The splitting is convergent if $\rho(M^{-1}N) < 1$.


Theorem 6. A weak regular splitting PA=M—N, 
where P is nonsingular and nonnegative, is convergent if 
and only if A is monotone. 


Proof. See Axelsson and Kolotilina (1990). □


In practical applications, it can be more convenient 
to use, instead of Theorem 6, the following sufficient 
conditions. 


Theorem 7. Let $PAQ = M - N$ be a weak regular splitting with $P$ and $Q$ nonsingular and nonnegative. Then, $A$ is monotone if there exists a positive vector $v$ such that either $M^{-1}PAQv > 0$ or $v^T M^{-1}PAQ > 0$.


Proof. Since by assumption $M - N$ is a weak regular splitting, it follows by Theorem 6 that $PAQ$ is nonsingular and monotone if $\rho(M^{-1}N) < 1$. But $M^{-1}PAQv = (I - M^{-1}N)v$ or, since $M^{-1}N$ is nonnegative, $0 \le M^{-1}Nv = (I - M^{-1}PAQ)v < v$ if $M^{-1}PAQv > 0$. Hence, with $D = \mathrm{diag}(v_1, v_2, \ldots, v_n)$, that is, $De = v$, we have $0 \le D^{-1}M^{-1}NDe < e$ or $\|D^{-1}M^{-1}ND\|_\infty < 1$, so $\rho(M^{-1}N) < 1$.

In a similar way, if $v^T M^{-1}PAQ > 0$, it follows that $\|DM^{-1}ND^{-1}\|_1 < 1$, where $\|\cdot\|_1$ is the $\ell_1$-norm. In either case $PAQ$ is monotone, and then $A^{-1} = Q(PAQ)^{-1}P \ge 0$, so $A$ is monotone. □


Remark 2. Theorem 7 can be particularly useful when 
we have scaled A with diagonal matrices P and Q. 


From Theorem 7, one can deduce the following important 
monotonicity comparison condition. 


Corollary 1. Let $B_1 \le A \le B_2$, where $B_1$ and $B_2$ are monotone matrices. Then, $A$ is monotone and $B_2^{-1} \le A^{-1} \le B_1^{-1}$.


Proof. See Axelsson and Kolotilina (1990). □
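The comparison condition can be tried out on three tridiagonal matrices that are ordered componentwise. The sketch below is illustrative only: the helper `tridiag` and the particular coefficients are assumptions chosen so that the two bounding matrices are diagonally dominant M-matrices.

```python
import numpy as np

def tridiag(n, diag, off):
    """Symmetric tridiagonal matrix with constant diagonal and off-diagonal."""
    return diag * np.eye(n) + off * (np.eye(n, k=1) + np.eye(n, k=-1))

n = 6
B1 = tridiag(n, 2.0, -1.0)   # lower bound (monotone)
A = tridiag(n, 2.5, -0.8)    # matrix whose monotonicity is in question
B2 = tridiag(n, 3.0, -0.5)   # upper bound (monotone)

ordered = bool(np.all(B1 <= A) and np.all(A <= B2))
B1_inv, A_inv, B2_inv = (np.linalg.inv(X) for X in (B1, A, B2))
```

The inverses then satisfy the reversed ordering, `B2_inv <= A_inv <= B1_inv` componentwise, and `A_inv` is nonnegative.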


Theorem 8. Let $A = [A_{ij}]$ be an $m \times m$ block matrix satisfying the following properties:


(i) $A_{ii}$ are nonsingular and $A_{ii}^{-1} > 0$, $i = 1, 2, \ldots, m$.
(ii) For $i \ne j$, there exist matrices $P_{ij} \ge 0$ such that $A_{ij} \le -P_{ij}A_{jj}$.
(iii) There exists a positive vector $v$ such that either the block components $(A^T v)_i$ are nonzero and nonnegative for $i = 1, 2, \ldots, m$, or $Au > 0$, where $u_i = A_{ii}^{-1}v_i$, $i = 1, 2, \ldots, m$.


Then $A$ is monotone.


Proof. Let $D = \mathrm{blockdiag}(A_{11}, \ldots, A_{mm})$. By (i), $A_{ii}^{-1} > 0$ and by (ii), $A_{ij}A_{jj}^{-1} \le 0$, $i \ne j$, implying that $AD^{-1} \le I$. Now Theorem 7 with $P = M = I$, $Q = D^{-1}$ can be applied to prove monotonicity of $A$ if $v^T AD^{-1} > 0$ or $AD^{-1}v > 0$, which, however, holds by (iii). (Note that we have assumed strict inequality, $A_{ii}^{-1} > 0$.) □


The following theorem shows that monotonicity of two- 
by-two block matrices holds if and only if its Schur com- 
plement is monotone. 


Theorem 9. Let $A$ be a two-by-two block matrix

\[ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \]

where $A_{11}$ and $A_{22}$ are square submatrices and $A_{11}$ is nonsingular.


(a) If $A_{11}$ is monotone and $A_{11}^{-1}A_{12}$ and $A_{21}A_{11}^{-1}$ are nonpositive, then $A$ is monotone if $S = A_{22} - A_{21}A_{11}^{-1}A_{12}$ is monotone.

(b) Conversely, assume that $A$ is monotone. Then $S$ is monotone.


Proof. To prove the existence of the inverse in part (a), use the block matrix factorization of $A$, which shows that $A$ is invertible if and only if $A_{11}$ and $S$ are invertible.

The monotonicity statements in both parts follow from the explicit form of the inverse of $A$,

\[ A^{-1} = \begin{bmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}S^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}S^{-1} \\ -S^{-1}A_{21}A_{11}^{-1} & S^{-1} \end{bmatrix} \]

□
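The block form of the inverse used in the proof can be verified numerically. In the sketch below, the $5 \times 5$ tridiagonal (-1, 2, -1) matrix and its 3 + 2 partitioning are assumptions made for illustration; the off-diagonal blocks are such that every block of the explicit inverse is nonnegative.

```python
import numpy as np

T5 = 2.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
A11, A12 = T5[:3, :3], T5[:3, 3:]
A21, A22 = T5[3:, :3], T5[3:, 3:]

A11_inv = np.linalg.inv(A11)
S = A22 - A21 @ A11_inv @ A12          # Schur complement of A11
S_inv = np.linalg.inv(S)

# Assemble the inverse from its four blocks.
A_inv_blocks = np.block([
    [A11_inv + A11_inv @ A12 @ S_inv @ A21 @ A11_inv, -A11_inv @ A12 @ S_inv],
    [-S_inv @ A21 @ A11_inv, S_inv],
])
```

The assembled matrix agrees with `np.linalg.inv(T5)`, and both `S_inv` and the full inverse are componentwise nonnegative.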


Example 1. Consider the mixed derivative elliptic problem

\[ -a u_{xx} - 2c u_{xy} - b u_{yy} = f \quad \text{in } \Omega = [0,1]^2 \qquad (28) \]

$u = 0$ on $\partial\Omega$, with variable coefficients $a(x,y), b(x,y) > 0$, and $c(x,y) \ge 0$. After elimination of boundary conditions, the standard nine-point second-order accurate finite difference approximation of this problem on a uniform mesh yields the block tridiagonal $n^2 \times n^2$ matrix


1 rey Ce op EE, 
A= qalok tridiag [( 7 bi, +), 


T(—a,, 2a, + b;), —a,), r( = z, “he -£)] 


where T(a;,b;,c;) stands for a tridiagonal matrix with 
diagonal coefficients b; and off-diagonal a; and c;. Let 


B= -ablock sridiag [z(e -b;, +), 
T (a, 2(a; + b;), —4;), (4. -b,, 0)| 


Clearly, $A \le B$ and by Theorem 7 with $P = Q = I$, $M = B$, monotonicity of $A$ follows if the inequality $B^{-1}Ae > 0$ and monotonicity of $B$ hold.

The diagonal blocks of $B$ are clearly monotone, and since the block components $(Be)_i$, $i = 1, 2, \ldots, m$ are nonzero and nonnegative, applying Corollary 1 to $B^T$, we conclude that $B$ is monotone if


abiga dibi 
b »o S b 
aii + bpi ai- + b; 


By a symmetry argument, B is also monotone if 


ibi 4,410; 
os +o 
ai + bi iyi + bigi 


To prove the monotonicity of $A$, it remains to be shown that $B^{-1}Ae > 0$. Clearly, $(Ae)_i$ is nonzero and nonnegative for $i = 1, 2, \ldots, m$, and since the diagonal blocks of $B^{-1}$ are positive and $B^{-1} \ge 0$, the required inequality follows.

Note that in the constant coefficient case, the conditions above take the form

\[ c \le \frac{ab}{a+b} \qquad (29) \]

which is stronger than the ellipticity condition $c < \sqrt{ab}$. Note also that in the matrix in (25), where a nine-point stencil has been used for the approximation of $au_{xx} + bu_{yy}$, the difference stencil is of positive type only if $|c| < (a+b)/6$.


4 FINITE DIFFERENCE METHODS FOR 
PARABOLIC PROBLEMS 


In this section, we discuss the numerical solution of parabolic problems of the form

\[ \frac{\partial u}{\partial t} = \mathcal{L}u + f(x,t), \qquad x \in \Omega,\ t > 0 \qquad (30) \]

with initial condition $u(x,0) = u_0(x)$ and given boundary conditions on $\partial\Omega$, valid for all $t > 0$. Here, $\mathcal{L}$ is an elliptic operator and $f$ is a given, sufficiently smooth function. The boundary conditions can be of general, Robin type, $(\partial u/\partial n) + \sigma[u - g(x,t)] = 0$, where $\sigma \ge 0$, $g$ is given and $n$ is the outer unit vector normal to $\partial\Omega$. Here, $\sigma = \infty$ corresponds to Dirichlet boundary conditions. Frequently, in applications we have $\mathcal{L} = \Delta$, the Laplace operator. As is well known, this equation is a model for the temperature distribution in the body $\Omega$, for instance. The equation is called the heat conduction or diffusion equation.

Stability and uniqueness of the solution of problem (30) 
can be shown using a maximum principle, which holds for 
nonpositive f, or decay of energy for the homogeneous 
problem. Such and other properties of the solution can 
be important for the evaluation of the numerical solution 
methods. 


The equation can be solved by a semidiscretization method, also called the method of lines. In such a method, one usually begins with discretization of the space derivatives in the equation, which leaves us with a system of ordinary differential equations in the variable $t$, that is, an initial value problem. The system is stiff, and to enable choosing the time-steps solely based on approximation properties one uses stable implicit methods. In particular, a simple method called the $\theta$-method can be used.

Alternatively, we may begin with discretization in time using, for instance, the $\theta$-method. This results in an elliptic boundary value problem to be solved at each time-step, which can be done using the methods presented in Section 3.

Usually, the order in which we perform the discretiza- 
tions, first in space and then in time, or vice versa, is 
irrelevant in the respect that the same algebraic equations, 
and hence the same numerical solution, result at each time- 
step. However, the analysis of the methods may differ. Also, 
if we intend to use a variable (adaptive) mesh in space, it is 
more natural to begin with the discretization in time. At var- 
ious time-steps, we can then use different approximations 
in space. 


4.1 Properties of the solution 


4.1.1 A maximum principle 


For ease of exposition, we describe the maximum principle 
for an equation of the form (30) in one space variable. On 
the other hand, we allow for a domain whose boundary 
may depend on time. (Such problems arise in so-called 
free boundary value problems, where the boundary between 
two matters, such as ice and water, may vary with time. 
Frequently, the temperature of ice is assumed to be constant 
and it remains to compute the temperature distribution in 
the water. Such a problem also arises in connection with 
permafrost.) 

Hence, let the domain $D$ be defined by the indicated parts of the boundary lines

\[ L_0 = \{(x, 0) \mid \phi_1(0) \le x \le \phi_2(0)\} \]
\[ L_1 = \{(x, T) \mid \phi_1(T) \le x \le \phi_2(T)\}, \qquad T > 0 \]

and the curves

\[ K_1 = \{(\phi_1(t), t) \mid 0 \le t \le T\} \]
\[ K_2 = \{(\phi_2(t), t) \mid 0 \le t \le T\} \]

where $\phi_1(t) < \phi_2(t)$ are continuous functions. Let $\Gamma_0 = L_0 \cup K_1 \cup K_2$ and $\Gamma = \Gamma_0 \cup L_1$ (see Figure 4).

Figure 4. Domain of definition for a parabolic problem.

Theorem 10. If $u$ is the solution of

\[ \frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + f(x,t), \qquad (x,t) \in D \qquad (31) \]

where $f \le 0$, and if $u$ is sufficiently smooth in $D$, then $\max_{(x,t) \in \bar{D}} u(x,t) \le \max_{(x,t) \in \Gamma_0} u(x,t)$, that is, $u$ takes its maximum value on $\Gamma_0$.


Proof. By contradiction. □


The same maximum principle can be proven for prob- 
lem (30). From this maximum principle, uniqueness and 
stability of solutions of (30) follow. 


Theorem 11. Let $D = \Omega \times [0, T]$, $T > 0$.


(a) Problem (30) has at most one solution in $D$, which takes prescribed boundary and initial conditions on $\Gamma_0$.

(b) If the boundary and initial conditions are perturbed by an amount of at most $\varepsilon$, then the solution is perturbed by at most this amount.

(c) If $f$ in (30) is perturbed by a nonpositive function (but the boundary and initial conditions are unchanged), the solution $u$ is also perturbed by a nonpositive function.


As an application of the maximum principle, we consider 
the derivation of two-sided bounds of the solution. 


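Corollary 2. Let $Ku \equiv u'(t) + \mathcal{L}u = f(t)$, $t > 0$, and let $\underline{u}$ and $\bar{u}$ be two sufficiently smooth functions that satisfy $K\underline{u} \le f \le K\bar{u}$ in $D$, $\underline{u} \le u \le \bar{u}$ on $\Gamma_0$. Then $\underline{u} \le u \le \bar{u}$ in $D$.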


4.1.2 Exponential decay of energy 


Consider a heat equation with a reaction term

\[ u_t = u_{xx} + au, \qquad 0 < x < 1,\ t > 0 \qquad (32) \]

where $u(0,t) = u(1,t) = 0$ and $u(x,0) = u_0(x)$ is a given sufficiently smooth function. We assume that $a$ is a constant satisfying $a \le K - c$, where $K = \|u_x\|^2/\|u\|^2$, $c$ is a positive constant and $\|u\|^2 = \int_0^1 u^2\,dx$. Letting $E(t) = (1/2)\int_0^1 u^2(\cdot,t)\,dx$ (the squared $L_2$ norm, a measure of energy) and using (32), we find that $\int_0^1 u_t u\,dx = -\int_0^1 u_x^2\,dx + a\int_0^1 u^2\,dx$, that is, $E'(t) = \|u\|^2(a - K) \le -c\|u\|^2 = -2cE(t)$, or

\[ E(t) \le e^{-2ct}E(0), \qquad t > 0 \]

with $E(0) = (1/2)\int_0^1 u_0^2\,dx$. Hence, $E(t) \to 0$ exponentially, as $t \to \infty$.

The constant $K$ can take arbitrarily large values. For example, for $u(x,t) = \sin(k\pi x)\,g(t)$ and $g \ne 0$, $K = (k\pi)^2$, and here $k$ can be arbitrarily large. By the classical Sobolev inequality, there holds $K \ge \pi^2$. Hence, if $a \le 0$, then we can take any $c \le K$, or $c = \pi^2$, and $E(t) \le e^{-2\pi^2 t}E(0)$, $t > 0$.

A similar result holds also for more general parabolic problems $u_t + \mathcal{L}u = au$, where the operator $\mathcal{L}$ is coercive, for example, $\int_\Omega (\mathcal{L}u)\,u\,dx \ge \rho \int_\Omega u^2\,dx$ for some positive constant $\rho$.


4.1.3 Exponential decay of the solution 


The solution of a parabolic problem depends on initial data on the whole space $\Omega$ at all times. However, an exponential decay holds not only for the energy but also for the solution, away from the support of the initial function, assuming it has a bounded support. This observation can be based on the explicit form of the solution of the pure initial value problem (referred to as the Cauchy problem),

\[ u_t = u_{xx} \quad \text{in } -\infty < x < \infty,\ t > 0 \]
\[ u(x,t) = \frac{1}{\sqrt{4\pi t}} \int_{-\infty}^{\infty} e^{-(x-y)^2/(4t)}\, u_0(y)\,dy \qquad (33) \]

where $u(x,0) = u_0(x)$, $u_0 = 0$ outside a bounded domain $\Omega$. From (33), it readily follows that

\[ |u(x,t)| = O\bigl(e^{-x^2/(4t)}\bigr) \quad \text{as } |x| \to \infty \]

which shows a rapid decay away from the region of compact support of the initial function. Formula (33) also shows that the solution $u(x,t)$ is an infinitely differentiable function of $x$ and $t$ for any positive $t$ and that a similar decay holds for its derivatives.


Remark 3. Since, by a partition of unity, any initial data can be partitioned into packets of initial data with compact support, it can be efficient for linear problems to compute the solution for each such data packet, even in parallel, at least for the smallest values of $t$. Since the domain of influence widens as $O(\sqrt{t})$ as $t$ increases, we must, however, add an increasing number of contributions to the solution from several such subdomains to get the solution at each point in the solution domain.


4.2 Finite difference schemes: the method of 
lines 


For the numerical solution of parabolic problems, one commonly uses a semidiscretization method.

Consider first a discretization of space derivatives in (30), for instance by use of central difference approximations. Then, for each nodepoint in the mesh $\Omega_h \subset \Omega$, including the boundary mesh points where the solution $u$ is not prescribed, we get an approximation $U(t)$, which is a vector function of $t$, and whose $i$th component approximates $u$ at $x_i \in \Omega_h$ at time $t$. The vector function $U$ satisfies the system of ordinary differential equations we get when the operator $\mathcal{L}$ in (30) is replaced by a matrix $A_h$, corresponding to a difference approximation of $\mathcal{L}$. Hence,

\[ \frac{dU(t)}{dt} = A_h U(t) + b(t), \qquad t > 0 \qquad (34) \]

where $b(t)$ contains the values of $f(x,t)$ at $x = x_i$ and any nonhomogeneous boundary condition. For $t = 0$, the $i$th component of $U(0)$ satisfies $U_i(0) = u_0(x_i)$ at the meshpoints $x_i$ of $\Omega_h$.

In general, for elliptic operators, $A_h$ has a complete eigenvector space with eigenvalues with negative real parts (see Section 3), and (34) is stable if $b$ is bounded. This method has been called the method of lines because we approximate $u$ along each line perpendicular to $\Omega$ in the time-space domain and beginning in $x_i$.
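A minimal sketch of the semidiscrete system (34) for the one-dimensional heat equation may help fix ideas. Everything specific here is an assumption for illustration: the model problem $u_t = u_{xx}$ on $(0,1)$ with homogeneous Dirichlet data, the mesh size, the initial function, and the helper name `semidiscrete_solution`. Because $A_h$ is symmetric in this case, the exact solution of the ODE system can be formed from its eigendecomposition.

```python
import numpy as np

n = 31                                   # interior mesh points
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)

# Central difference approximation of d^2/dx^2 (b = 0: no source, zero boundary data).
A_h = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

lam, Q = np.linalg.eigh(A_h)             # real eigenvalues, orthonormal eigenvectors

def semidiscrete_solution(U0, t):
    """Exact solution exp(t A_h) U0 of dU/dt = A_h U."""
    return Q @ (np.exp(lam * t) * (Q.T @ U0))

U0 = np.sin(np.pi * x)                   # u_0 sampled along the mesh
U = semidiscrete_solution(U0, 0.1)
```

All eigenvalues of `A_h` are negative, and for this initial function `U` stays close to the exact solution $e^{-\pi^2 t}\sin(\pi x)$ of the heat equation.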


4.2.1 Stability of the method of lines 


We comment first on the stability of linear systems of ordinary differential equations,

\[ \frac{du}{dt} = Au + b(t), \qquad t > 0,\ u(0) = u_0 \qquad (35) \]

where $A$ is an $n \times n$ matrix. Its solution is

\[ u(t) = \exp(tA)u_0 + \int_0^t \exp[(t-s)A]\,b(s)\,ds, \qquad t > 0 \]

The analysis of stability of such systems can be based on the Jordan canonical form of $A$; see, for example, Iserles (1996). Without going into full details, we consider only the case where the homogeneous system is asymptotically stable, that is,

\[ \|\exp(tA)\| \to 0, \qquad t \to \infty \]


Let $\alpha(A) = \max_i \operatorname{Re}\lambda_i(A)$ be the so-called spectral abscissa of $A$. Analysis shows that the system is asymptotically stable for any perturbation of initial data if and only if $\alpha(A) < 0$. Similarly, for the solution of (35), there holds $|u(t)| \to 0$ if $\alpha(A) < 0$ and $|b(t)| \to 0$, and $|u(t)|$ is bounded if $\alpha(A) < 0$ and $|b(t)|$ is bounded.

Similar to the energy estimates, an alternative analysis can be based on the spectral abscissa $\beta(A)$ of the symmetric part of $A$, that is, $\beta(A) = \max_i \lambda_i[(1/2)(A + A^T)]$. We assume that $\beta(A) < 0$.

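Then, let $E(t) = (1/2)\|u(t)\|^2$. It follows from (35) that

\[ E'(t) = (Au, u) + (b(t), u) = \tfrac{1}{2}[(A + A^T)u, u] + (b(t), u) \]

where $(u, v) = u^T v$. Hence, bounding $(b(t), u) \le \tfrac{1}{4}|\beta(A)|\,\|u\|^2 + |\beta(A)|^{-1}\|b(t)\|^2$,

\[ E'(t) \le 2\beta(A)E(t) + \tfrac{1}{2}|\beta(A)|E(t) + |\beta(A)|^{-1}\|b(t)\|^2 \]

that is,

\[ E'(t) \le \tfrac{3}{2}\beta(A)E(t) + |\beta(A)|^{-1}\|b(t)\|^2 \]

so that

\[ E(t) \le e^{\frac{3}{2}\beta(A)t}E(0) + \int_0^t |\beta(A)|^{-1} e^{\frac{3}{2}\beta(A)(t-s)}\|b(s)\|^2\,ds \]

and, since $\beta(A) < 0$, if $\|b(t)\| \to 0$, $t \to \infty$, then exponential decay of energy follows.
The following relation between $\alpha(A)$ and $\beta(A)$ holds.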


Lemma 6 (Dahlquist (1959), Lozinskij (1958)). $-\beta(-A) \le \alpha(A) \le \beta(A)$.


Proof. If $Ax = \lambda x$, where $\operatorname{Re}(\lambda) = \alpha(A)$ and $\|x\| = 1$, then $x^*A^* = \bar{\lambda}x^*$, and hence $(1/2)x^*(A^* + A)x = (1/2)(\bar{\lambda} + \lambda) = \alpha(A)$. But $\beta(A) = \max_{\|x\|=1}(1/2)x^*(A^* + A)x$, by the Rayleigh quotient theory. Hence $\alpha(A) \le \beta(A)$. Similarly, $\alpha(-A) \le \beta(-A)$. But $\alpha(A) \ge -\alpha(-A)$, as an elementary consideration shows. Hence $\alpha(A) \ge -\beta(-A)$. □


Unlike the scalar case, $\sup_{t \ge 0}\|\exp(tA)\|$ may be strictly greater than 1 even though $\alpha(A)$ is negative. The next relation illustrates further how $\|\exp(tA)\|$ may grow and shows exactly when $\sup_{t \ge 0}\|\exp(tA)\| = 1$.


Theorem 12.


(a) $e^{\alpha(A)t} \le \|\exp(tA)\| \le e^{\beta(A)t}$, $t \ge 0$.
(b) $\sup_{t \ge 0}\|\exp(tA)\| = 1 \Leftrightarrow \beta(A) \le 0$.


Proof. Let $v$ be an arbitrary unit vector, $\|v\| = 1$, and let $\phi(t) = v^*\exp(tA^*)\exp(tA)v$. Then,

\[ \phi'(t) = (\exp(tA)v)^*(A^* + A)\exp(tA)v \qquad (36) \]

that is, by Rayleigh quotient theory, $\phi'(t) \le 2\beta(A)\phi(t)$. Hence, $\phi(t) \le e^{2\beta(A)t}$ and for every $t$,

\[ \sup_{\|v\|=1} \phi(t) = \|\exp(tA)\|^2 \le e^{2\beta(A)t} \]

This implies the rightmost inequality of (a). To prove the leftmost inequality, note that for any matrix $B$, $\|B\| \ge \max_i |\lambda_i(B)|$.

Hence,

\[ \|\exp(tA)\| \ge \max_i |e^{\lambda_i t}| = \max_i e^{\operatorname{Re}\lambda_i t} = e^{\alpha(A)t} \]

which proves (a). By (a), the sufficiency part of (b) is already proven. The converse follows by

\[ \sup_{t \ge 0}\|\exp(tA)\| = 1 \ \Rightarrow\ \phi(t) \le 1,\ t \ge 0 \quad \forall v \in \mathbb{C}^n,\ \|v\| = 1 \]

Hence, since $\phi(0) = 1$, we have $\phi'(0) \le 0$ $\forall v$, $\|v\| = 1$, or by (36), $v^*(A^* + A)v \le 0$ $\forall v$, $\|v\| = 1$. Hence, $\beta(A) \le 0$. □
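The two-sided bound of part (a), and the possible transient growth of $\|\exp(tA)\|$ when $\alpha(A) < 0 < \beta(A)$, can be observed on a small nonnormal matrix. The sketch below is an illustration under assumptions: the $2 \times 2$ matrix is chosen here, and the matrix exponential is formed from the eigendecomposition, which is valid only because this matrix is diagonalizable.

```python
import numpy as np

A = np.array([[-1.0, 10.0],
              [0.0, -2.0]])      # illustrative nonnormal matrix, eigenvalues -1 and -2

alpha = float(np.max(np.linalg.eigvals(A).real))             # spectral abscissa of A
beta = float(np.max(np.linalg.eigvalsh(0.5 * (A + A.T))))    # abscissa of symmetric part
beta_minus = float(np.max(np.linalg.eigvalsh(-0.5 * (A + A.T))))  # beta(-A)

lam, V = np.linalg.eig(A)
V_inv = np.linalg.inv(V)

def expm_diag(t):
    """exp(tA) = V diag(exp(lam t)) V^{-1} for diagonalizable A."""
    return (V * np.exp(lam * t)) @ V_inv

ts = np.linspace(0.0, 3.0, 61)
norms = np.array([np.linalg.norm(expm_diag(t), 2) for t in ts])
lower = np.exp(alpha * ts)
upper = np.exp(beta * ts)
```

For this matrix `alpha` is negative while `beta` is positive, the computed norms lie between `lower` and `upper`, and their maximum exceeds 1 before the eventual decay sets in.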


Corollary 3. If $A$ is a normal matrix (i.e. $A^*A = AA^*$, that is, $A$ is unitarily diagonalizable), then $\beta(A) = \alpha(A)$ and the inequalities in Theorem 12 (a) are sharp: $\|\exp(tA)\| = e^{\alpha(A)t}$.


4.3 The θ-method


For the numerical solution of the system of ordinary differential equations arising from the method of lines, there exists a plethora of methods. We consider here in more detail only one such method, the so-called $\theta$-method. It is presented for linear problems but it is also readily applicable for certain nonlinear problems.

For (34), the $\theta$-method takes the following form

\[ [I - (1-\theta)kA]\,V(t+k) = (I + \theta kA)\,V(t) + k[(1-\theta)b(t+k) + \theta b(t)] \qquad (37) \]

$t = 0, k, 2k, \ldots$, where $V(0) = U_0$, $I$ is the identity operator and $V(t)$ is the corresponding approximation of $U(t)$. Here, $\theta$ is a method parameter.

The $\theta$-method takes familiar forms for particular values of $\theta$. For example, $\theta = 1$ yields the familiar Euler forward method

\[ V(t+k) = V(t) + k[AV(t) + b(t)] \]

$\theta = 0$ yields the backward Euler (or Laasonen) method

\[ V(t+k) = V(t) + k[AV(t+k) + b(t+k)] \]

while $\theta = 1/2$ determines the trapezoidal (or Crank-Nicolson) rule

\[ V(t+k) = V(t) + \frac{k}{2}\bigl\{A[V(t) + V(t+k)] + b(t) + b(t+k)\bigr\} \]


We see from (37) that the $\theta$-method is explicit only for $\theta = 1$; for all other values of $\theta$, the method is implicit, requiring the solution of a linear algebraic system of equations at each step. The extra computational labor caused by implicit methods has to be accepted, however, for reasons of stability.

Observe further that $I - (1-\theta)kA$ is nonsingular if $k$ is small enough or if $\operatorname{Re}\lambda_i \le 0$ $\forall i$ and $\theta \le 1$, where $\lambda_i$ is an eigenvalue of $A$.
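One step of (37) amounts to a matrix-vector product followed by one linear solve. The sketch below follows the convention of this section ($\theta = 1$ forward Euler, $\theta = 0$ backward Euler, $\theta = 1/2$ trapezoidal rule). The function name `theta_step`, the dense solver, and the heat-equation test data are assumptions made for illustration; a practical code would use a sparse or tridiagonal solver.

```python
import numpy as np

def theta_step(V, t, k, A, b, theta):
    """Advance V(t) to V(t + k) with the theta-method (37); b is a callable b(t)."""
    I = np.eye(A.shape[0])
    rhs = (I + theta * k * A) @ V + k * ((1.0 - theta) * b(t + k) + theta * b(t))
    return np.linalg.solve(I - (1.0 - theta) * k * A, rhs)

# Illustrative data: semidiscrete 1D heat equation with zero source and boundary data.
n = 31
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
A_h = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
zero_source = lambda t: np.zeros(n)

k = 0.01
V = np.sin(np.pi * x)
for step in range(10):                      # integrate to t = 0.1 with theta = 1/2
    V = theta_step(V, step * k, k, A_h, zero_source, 0.5)
```

With $\theta = 1$ the solve is trivial (the system matrix is the identity), which is the sense in which only that choice is explicit.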

For the stability analysis of time-stepping methods, one can use the Jordan canonical form of the corresponding system matrix. For a general linear system of first-order difference equations

\[ V(t+k) = BV(t) + c(t), \qquad t = 0, k, 2k, \ldots, \quad V(0) = U_0 \qquad (38) \]

this analysis shows that the homogeneous system is asymptotically stable, that is, $\|B^r\| \to 0$, $r \to \infty$, if and only if $\rho(B) < 1$, where $\rho(B) = \max_i |\lambda_i(B)|$. Further, the solutions of the inhomogeneous system (38) are bounded if $\rho(B) < 1$ and $|c(rk)|$ is bounded for $r \to \infty$, and satisfy $\|V(rk)\| \to 0$, $r \to \infty$, if $\rho(B) < 1$ and $|c(rk)| \to 0$, $r \to \infty$.

However, even if $\rho(B) < 1$, for nonnormal matrices it may happen that $\|B^r\|$ takes huge values for finite values of $r$. Hence, in practice, we may have to require that $\|B\| < 1$.

An alternative approach is to use the numerical radius $r(B)$ of $B$, where $r(B) = \max\{|x^*Bx|;\ x \in \mathbb{C}^n,\ (x,x) = 1\}$. It is seen that $r(B) \le \|B\|$, and it can be shown (see e.g. Axelsson, 1994) that $\|B\| \le 2r(B)$. It can further be shown that if $r(B) \le 1$, then $r(B^k) \le r(B)^k \le 1$, $k = 1, 2, \ldots$, for any square matrix $B$. Hence, in general, the stronger stability condition $\|B\| \le 1$ can be replaced by $r(B) \le 1$. Unfortunately, the computation of the numerical radius is, in general, complicated. In the case when $B$ is nonnegative, it can be shown that $r(B) = \rho[(1/2)(B + B^T)]$, which can be used to compute $r(B)$.


4.3.1 Preservation of stability 


We show now the conditions under which the $\theta$-method preserves the stability of the system (35), that is, for which it holds $\alpha(A) < 0 \Rightarrow \rho(B) < 1$. We first put (37) in the form of (38). Thus, (38) holds with

\[ B = [I - (1-\theta)kA]^{-1}[I + \theta kA] \]
\[ c(t) = k[I - (1-\theta)kA]^{-1}[(1-\theta)b(t+k) + \theta b(t)] \]

Let $(\lambda_j, v_j)$, $j = 1, 2, \ldots, n$, denote the eigensolutions of $A$ with the ordering

\[ \operatorname{Re}(\lambda_1) \ge \operatorname{Re}(\lambda_2) \ge \cdots \ge \operatorname{Re}(\lambda_n) \]

Then, the eigensolutions of $B$ are $(\mu_j, v_j)$, $j = 1, 2, \ldots, n$, where

\[ \mu_j = \frac{1 + \theta k\lambda_j}{1 - (1-\theta)k\lambda_j} \qquad (39) \]

Theorem 13. Assume that (35) is asymptotically stable ($\operatorname{Re}\lambda_1 < 0$) and that $b(t)$ is bounded for all $t > 0$. Then

(a) (38), with $0 \le \theta \le 1/2$, is asymptotically stable $\forall k > 0$. (Unconditional stability.)
(b) (38), with $1/2 < \theta \le 1$, is asymptotically stable if and only if

\[ k < -\frac{2\operatorname{Re}(\lambda_j)}{(2\theta - 1)|\lambda_j|^2}, \qquad j = 1, 2, \ldots, n \quad \text{(Conditional stability)} \qquad (40) \]


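Proof. An easy calculation establishes that

\[ |\mu_j|^2 = \mu_j\bar{\mu}_j = \frac{1 + 2\theta k\operatorname{Re}(\lambda_j) + \theta^2k^2|\lambda_j|^2}{1 - 2(1-\theta)k\operatorname{Re}(\lambda_j) + (1-\theta)^2k^2|\lambda_j|^2} \]

where $\mu_j$ is given by (39). Hence,

\[ |\mu_j|^2 < 1 \ \Leftrightarrow\ k(2\theta - 1)|\lambda_j|^2 < -2\operatorname{Re}(\lambda_j) \qquad (41) \]

The assumption that (35) is asymptotically stable implies that $\operatorname{Re}(\lambda_j) < 0$ for $j = 1, 2, \ldots, n$. It is easy to see that the inequality on the right side of (41) is satisfied for $j = 1, 2, \ldots, n$, precisely under the conditions on $k$ presented above. □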
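The step size restriction (40) can be checked directly from the eigenvalue mapping (39). In the sketch below, the function names `mu` and `max_stable_step` and the sample spectrum are assumptions for illustration; the spectrum is simply a set of numbers with negative real parts.

```python
import numpy as np

def mu(lam, k, theta):
    """Eigenvalue of B corresponding to an eigenvalue lam of A, see (39)."""
    return (1.0 + theta * k * lam) / (1.0 - (1.0 - theta) * k * lam)

def max_stable_step(lams, theta):
    """Upper bound on k from (40); infinite for theta <= 1/2 (unconditional stability)."""
    if theta <= 0.5:
        return np.inf
    lams = np.asarray(lams, dtype=complex)
    return float(np.min(-2.0 * lams.real / ((2.0 * theta - 1.0) * np.abs(lams) ** 2)))

lams = np.array([-1.0, -100.0 + 50.0j, -100.0 - 50.0j])   # hypothetical spectrum
k_max = max_stable_step(lams, theta=1.0)                   # forward Euler
rho_below = float(np.max(np.abs(mu(lams, 0.9 * k_max, 1.0))))
rho_above = float(np.max(np.abs(mu(lams, 1.1 * k_max, 1.0))))
```

For this spectrum the bound is set by the complex pair, not by the eigenvalue closest to the origin; just below `k_max` all $|\mu_j| < 1$, and just above it at least one $|\mu_j|$ exceeds 1.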


The solution of the homogeneous system $u'(t) = Au(t)$, $t > 0$, $u(0) = u_0$, where $A$ has a complete eigenvector space, can be written in the form

\[ u(t) = \sum_{j=1}^{n} c_j e^{\lambda_j t} v_j, \qquad t > 0 \]

where $c_1, \ldots, c_n$ are determined by the expansion $u_0 = \sum_{j=1}^{n} c_j v_j$ and $\{\lambda_j, v_j\}$ are the eigenpairs of $A$. The corresponding solution of the difference equation is

\[ V(t) = \sum_{j=1}^{n} c_j \mu_j^{t/k} v_j, \qquad t = 0, k, 2k, \ldots \]

where $\mu_j = \mu(k\lambda_j; \theta)$ and where we have introduced the function

\[ \mu(\beta; \theta) = \frac{1 + \theta\beta}{1 - (1-\theta)\beta}, \qquad -\infty < \beta < 0 \]

Hence, $\mu_j$ is the damping factor corresponding to $e^{k\lambda_j}$ (note $\operatorname{Re}\lambda_j < 0$), and it is important that each factor is sufficiently small, even for the large eigenvalues. There holds

\[ \lim_{\beta \to -\infty} \mu(\beta; \theta) = \begin{cases} -\dfrac{\theta}{1-\theta}, & 0 \le \theta < 1 \\ -\infty, & \theta = 1 \end{cases} \]

Hence, the limit value of $|\mu|$ is less than one only if $\theta < 1/2$. In particular, for $\theta = 1/2$, it holds that $\mu \to -1$ as $\beta \to -\infty$. Therefore, the Crank-Nicolson method can have insufficient damping, especially in the initial, transient phase (small $t$), where all eigenvector components may have a significant value. It is then better to choose $\theta < 1/2$, for instance, $\theta = (1/2) - \zeta k$ for some $\zeta > 0$. (The latter choice will preserve the second order of discretization error, as is shown below.) Otherwise, for a fixed value of $\theta < 1/2$, the time-discretization error is only $O(k)$.


4.3.2 Discretization error estimates for the θ-method


The discretization error estimates for the $\theta$-method can be readily derived. The fully discretized scheme takes the form

\[ [I - k(1-\theta)L_h]\,v(x, t+k) = [I + k\theta L_h]\,v(x,t) + k(1-\theta)f(x, t+k) + k\theta f(x,t), \qquad t = 0, k, 2k, \ldots \qquad (42) \]

We have $f = u_t - \mathcal{L}u$ and

\[ u(x, t+k) - u(x,t) = \int_t^{t+k} u_t(x,s)\,ds \]

Substituting the differential equation solution in (42), we then obtain that

\[ [I - k(1-\theta)L_h]\,u(x, t+k) = [I + k\theta L_h]\,u(x,t) + k(1-\theta)f(x, t+k) + k\theta f(x,t) \]
\[ \quad + \Bigl\{\int_t^{t+k} u_t(x,s)\,ds - k(1-\theta)L_h u(x, t+k) - k\theta L_h u(x,t)\Bigr\} \]
\[ \quad - k(1-\theta)[u_t(x, t+k) - \mathcal{L}u(x, t+k)] - k\theta[u_t(x,t) - \mathcal{L}u(x,t)] \]

Letting $e(x,t) = u(x,t) - v(x,t)$, we then find

\[ [I - k(1-\theta)L_h]\,e(x, t+k) = [I + k\theta L_h]\,e(x,t) + \tau(x, t; h, k) \]

where $\tau$ is the truncation error

\[ \tau(x, t; h, k) = \int_t^{t+k} u_t(x,s)\,ds - [k(1-\theta)u_t(x, t+k) + k\theta u_t(x,t)] \]
\[ \quad + k(1-\theta)(\mathcal{L} - L_h)u(x, t+k) + k\theta(\mathcal{L} - L_h)u(x,t) \]


Note that the truncation error consists of two parts: (1) the time-discretization error and (2) the space-discretization error. The stability analysis shows that they can be estimated independently of each other and the discretization error in the $\ell_2$ norm in space becomes

\[ \|u(\cdot,t) - v(\cdot,t)\| \le \bigl|\theta - \tfrac{1}{2}\bigr|\,O(k)\sup_{t>0}\|u_{tt}\| + O(k^2)\sup_{t>0}\|u_{ttt}\| + O(h^2)\bigl(\|u_x^{(4)}\| + \|u_y^{(4)}\|\bigr), \qquad h, k \to 0 \qquad (43) \]

If we choose $\theta = (1/2) - \zeta k$ for some positive constant $\zeta$, then the total discretization error becomes $O(k^2) + O(h^2)$. Full details of such an analysis and applications for nonlinear problems of parabolic type can be found in Axelsson (1984).


Remark 4. Note that we were able to derive this estimate without the use of the monotonicity of $(-A_h)$ (cf. Section 3). Hence, (43) is valid even for nonmonotone approximations and for more general operators of second order. In particular, it is valid for central difference approximations $L_h$ of the convection-diffusion operators (6, 8), as long as $h$ is small enough so that the real part of the eigenvalues of $L_h$ is negative. This is due to the property of the $\theta$-method for $0 \le \theta \le 1/2$ that it is stable for operators with arbitrary spectrum in the left half of the complex plane. Methods with such a property are called A-stable methods; see Dahlquist (1963). It is known that multistep time-discretization methods that are A-stable can be at most of second order of approximation. On the other hand, certain implicit Runge-Kutta methods can have arbitrarily high order and still be A-stable; see Axelsson (1964).


Remark 5. A simple method to estimate the discretization errors numerically is provided by Richardson extrapolation. In applying this method to a nonstationary partial differential equation problem, we first keep $k$ constant and run the problem over a time-step with a mesh in space with the following mesh sizes: say $h$ and $(1/2)h$ (independently of each other). Then, as usual, $(1/3)[4v_{h/2}(x_i, t) - v_h(x_i, t)]$ provides an improved approximation, and its difference from the computed values an estimate of the space-discretization error, provided that this error is $O(h^2)$, $h \to 0$.

Similarly, we can let $h$ be fixed and run the numerical method with stepsize $k$ and two steps with $k/2$. In the same manner as above, we can then estimate the time-discretization error.

If the solution is smooth, these estimates are frequently accurate. However, in an initial transient phase, for instance, when the solution is less smooth, the estimates are less accurate.
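The time-step variant of this procedure can be sketched on a scalar test problem. The example below is an assumption-laden illustration: it uses the equation $u' = \lambda u$ with $\lambda = -2$ (not taken from the text), the trapezoidal rule as the second-order method, and the hypothetical function name `crank_nicolson`.

```python
import math

def crank_nicolson(lam, u0, T, steps):
    """Trapezoidal rule for u' = lam * u on [0, T] with a fixed number of steps."""
    k = T / steps
    growth = (1.0 + 0.5 * k * lam) / (1.0 - 0.5 * k * lam)
    return u0 * growth**steps

lam, u0, T = -2.0, 1.0, 1.0
v_k = crank_nicolson(lam, u0, T, 10)       # stepsize k
v_half = crank_nicolson(lam, u0, T, 20)    # stepsize k/2

extrapolated = (4.0 * v_half - v_k) / 3.0  # Richardson extrapolation for an O(k^2) error
error_estimate = extrapolated - v_half     # estimated error of the k/2 result
true_error = math.exp(lam * T) * u0 - v_half
```

For this smooth problem the estimate agrees with the true error of the finer computation to within a few percent; as the remark points out, less smooth solutions give less reliable estimates.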


4.4 Nonlinear parabolic problems 


We consider nonlinear evolution equations of parabolic 
type, that is, for which an asymptotic stability prop- 
erty holds. 

The stability and discretization error for infinite time for the $\theta$-method can be analyzed for nonlinear problems

\[ \frac{du}{dt} + F(t,u) = 0, \qquad t > 0, \quad u(0) = u_0 \in V \qquad (44) \]

where $V$ is a reflexive Banach space and $F: V \to V'$, where $V'$ denotes the space dual to $V$. We have $V \hookrightarrow H \hookrightarrow V'$ for some Hilbert space $H$, where $\hookrightarrow$ denotes continuous injections. Let $(\cdot,\cdot)$ be the scalar product in $H$ and $\|\cdot\|$ the associated norm. We assume that $F(t,\cdot)$ is strongly monotone (or dissipative), that is,

\[ (F(t,u) - F(t,v), u - v) \ge \rho(t)\|u - v\|^2 \qquad (45) \]

for all $u, v \in V$, $t > 0$, where $\rho(t) \ge \rho_0 > 0$. It follows then that

\[ \frac{1}{2}\frac{d}{dt}\bigl(\|u - v\|^2\bigr) = -(F(t,u) - F(t,v), u - v) \le -\rho_0\|u - v\|^2, \qquad t > 0 \]

or

\[ \|u(t) - v(t)\| \le \exp(-\rho_0 t)\,\|u(0) - v(0)\| \]

For such nonlinear problems of parabolic type, the stability and discretization error for the $\theta$-method can be analyzed (cf. Axelsson, 1984). Here, only a one-sided Lipschitz constant enters in the analysis. The accuracy of the $\theta$-method, however, is limited to second order, at most. Here, we consider an alternative technique, which allows an arbitrarily high order of approximation. It is based on a boundary value technique.

The method is presented in a finite dimensional space $V = \mathbb{R}^n$ and on a bounded time interval. Given a system of differential equations

\[ \frac{du}{dt} + F(t,u) = 0, \qquad 0 < t \le T, \quad u(0) = u_0 \text{ prescribed} \]

we first make a transformation of the equations to a more suitable form. In many problems, there may exist positive parameters $\varepsilon_i$, $0 < \varepsilon_i \ll 1$, such that parts of $F$ and of the corresponding parts of the Jacobian matrix $\partial F/\partial u$ are unbounded as $O(\varepsilon_i^{-1})$, $\varepsilon_i \to 0$. We then multiply the corresponding equation by this parameter to get

\[ \varepsilon\frac{du}{dt} + F(t,u) = 0, \qquad 0 < t \le T \qquad (46) \]

where $\varepsilon$ is a diagonal matrix with entries $\varepsilon_i$, and now it is assumed that $F$ and $\partial F/\partial u$ are bounded with respect to $\varepsilon$.

From the monotonicity of $F$ for $t \ge t_0 > 0$ follows

\[ \frac{1}{2}\frac{d}{dt}(\varepsilon(u - v), u - v) = -(F(t,u) - F(t,v), u - v) \le -\rho(t)\|u - v\|^2 \le -\rho(t)(\varepsilon(u - v), u - v), \qquad t \ge t_0 \]

so

\[ \|u(t) - v(t)\|_\varepsilon^2 \le \exp\Bigl(\int_{t_0}^{t} -2\rho(s)\,ds\Bigr)\|u(t_0) - v(t_0)\|_\varepsilon^2 \le \|u(t_0) - v(t_0)\|_\varepsilon^2, \qquad t_0 \le t \le T \]

where $\|v\|_\varepsilon = (\varepsilon v, v)^{1/2}$. This means that the system is contractive for $t \ge t_0$. In the initial phase $t \in (0, t_0)$, the system does not have to be contractive, that is, the eigenvalues of the Jacobian may have positive real parts there. In this interval, we may choose a step-by-step method with sufficiently small step sizes, in order to preserve stability and to follow the transients of the solution.


4.4.1 Boundary value techniques for initial value 
problems 


The traditional numerical integration methods to solve initial value problems

\[ \frac{du}{dt} = f(t, u(t)), \qquad t > 0, \quad u(0) = u_0 \text{ given} \qquad (47) \]

are step-by-step methods. Some such familiar methods are Runge-Kutta and linear multistep methods. In step-by-step methods, the error at the current time-step depends on the local error at this step and also on the errors at all previous time-steps. In this way, the errors accumulate and the total error is typically larger by a factor $O(k^{-1})$ than the local errors. A direct global error control cannot be justified since the global error depends on the stability of the problem and the errors at the previous time-points.

In boundary value methods, on the other hand, all approx- 
imations at all time-points are computed simultaneously 
and such methods may be coined global integration meth- 
ods. By its very nature, a boundary value method is better 
adapted for global error control. 

For problems where one envisions that the solution sud- 
denly may become unsmooth, one can implement boundary 
value methods as time-stepping methods but with much 
larger time-steps than for standard step-by-step methods. 

A boundary value method can be composed of a sequence of forward step-by-step methods followed by a stabilizing backward-step method. As a simple example, let $u^0$ be given and

$$u^{n+1} - u^{n-1} - 2k f(t_n, u^n) = 0, \quad n = 1,2,\ldots,N-1$$
$$u^{N} - u^{N-1} - k f(t_N, u^N) = 0$$

whose solution components $u^1, u^2, \ldots, u^N$ must be computed simultaneously. Such a method was analyzed in Axelsson and Verwer (1984). For a more recent presentation of boundary value and other methods, see Brugnano and Trigiante (1998).
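
As a rough illustration of what "computed simultaneously" means, the following Python sketch treats the linear test problem u' = λu, so that f(t, u) = λu. It assumes that the forward steps are explicit midpoint steps, u^{n+1} - u^{n-1} - 2k f(t_n, u^n) = 0, closed by one backward Euler step, and assembles all N equations into a single linear system. The function names, the small dense elimination helper, and the test problem are illustrative assumptions, not taken from Axelsson and Verwer (1984).

```python
import math

def solve_dense(a, b):
    """Gaussian elimination with partial pivoting (modifies a and b in place)."""
    n = len(b)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            m = a[r][col] / a[col][col]
            if m != 0.0:
                for c in range(col, n):
                    a[r][c] -= m * a[col][c]
                b[r] -= m * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / a[r][r]
    return x

def boundary_value_method(lam, u0, T, N):
    """Approximate u' = lam*u, u(0) = u0 at t_n = n*T/N, n = 1..N (N >= 2).

    The unknown u^n is stored at index n-1; all N unknowns are solved at once.
    """
    k = T / N
    a = [[0.0] * N for _ in range(N)]
    b = [0.0] * N
    for n in range(1, N):
        row = n - 1
        a[row][n] = 1.0                 # coefficient of u^{n+1}
        a[row][n - 1] = -2.0 * k * lam  # coefficient of u^n (midpoint step)
        if n >= 2:
            a[row][n - 2] = -1.0        # coefficient of u^{n-1}
        else:
            b[row] = u0                 # u^0 is known data
    # closing backward Euler equation: u^N - u^{N-1} - k*lam*u^N = 0
    a[N - 1][N - 1] = 1.0 - k * lam
    a[N - 1][N - 2] = -1.0
    return solve_dense(a, b)

if __name__ == "__main__":
    u = boundary_value_method(-1.0, 1.0, 1.0, 100)
    print(u[-1], math.exp(-1.0))
```

For a dissipative problem (λ < 0), the backward Euler equation at the final time is what keeps the growing parasitic component of the midpoint recursion small; a forward-only midpoint recursion would eventually amplify it.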

The Galerkin method. For the interval $(t_0, T)$, the following global Galerkin method can be used. The interval is divided into a number of subintervals $(t_{i-1}, t_i)$, $i = 1,2,\ldots,N$, where $t_N = T$. The length of the intervals, $t_i - t_{i-1}$, may vary smoothly with some function $h(t_i)$, but for ease of presentation, we assume that the intervals have the same length $t_i - t_{i-1} = h$, $i = 1,2,\ldots,N$. Consider each interval as an element on which we place some extra nodal points, $t_{i,j}$, $j = 0,1,\ldots,p$, such that $t_{i,j} = t_i + \xi_j h$, where $\xi_j$ are the Lobatto quadrature points satisfying $0 = \xi_0 < \xi_1 < \cdots < \xi_p = 1$ and $\xi_j + \xi_{p-j} = 1$. Hence, the endpoints of the interval are always nodal points and (if $p > 1$) we also choose $p - 1$ disjoint nodal points in the interior of each element.

To each nodal point, we associate a basis function $\phi_{i,j}$. The basis functions may be exponential or trigonometric functions, and may also be discontinuous, but here we consider the case that they are continuous and polynomial over each element. Basis functions corresponding to interior nodes have support only in the element to which they belong, and those corresponding to endpoints have a support over two adjacent elements (except those at $t_0$ and $t_N$). The number of nodal points in each closed interval then equals the degree of the polynomial $p$ plus one.


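Let $\mathring{S}_h$ be the subspace of test functions that are zero at $t_0$, that is,

$$\mathring{S}_h = \operatorname{span}\{\phi_{i,j},\ i = 0,1,\ldots,N-1,\ j = 1,2,\ldots,p\}$$

Let

$$a(U;V) = \int_{t_0}^{T}\Bigl(\varepsilon\frac{dU}{dt} + F(t,U),\ V\Bigr)\,dt, \qquad U, V \in [H^1(t_0,T)]^m$$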


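where $H^1(t_0, T)$ is the first-order Sobolev space of functions with square-integrable derivatives. To get an approximation $\tilde U$ of the solution of (46), we take a vectorial test function $V = \boldsymbol{\phi}_{i,j}$, multiply the equation, and after integration, we obtain

$$a(\tilde U;\boldsymbol{\phi}_{i,j}) = \int_{t_i}^{t_{i+1}}\Bigl(\varepsilon\frac{d\tilde U}{dt} + F(t,\tilde U),\ \boldsymbol{\phi}_{i,j}\Bigr)\,dt = 0, \quad j = 1,2,\ldots,p-1 \tag{48}$$

$$a(\tilde U;\boldsymbol{\phi}_{i,p}) = \int_{t_i}^{t_{i+2}}\Bigl(\varepsilon\frac{d\tilde U}{dt} + F(t,\tilde U),\ \boldsymbol{\phi}_{i,p}\Bigr)\,dt = 0, \quad i = 0,1,\ldots,N-2 \tag{49}$$

and at $t_N = T$, we get

$$a(\tilde U;\boldsymbol{\phi}_{N-1,p}) = \int_{t_{N-1}}^{t_N}\Bigl(\varepsilon\frac{d\tilde U}{dt} + F(t,\tilde U),\ \boldsymbol{\phi}_{N-1,p}\Bigr)\,dt = 0 \tag{50}$$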

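We choose in turn $\boldsymbol{\phi}_{i,j} = \phi_{i,j}e_r$, where $e_r$ is the $r$th coordinate vector. This defines the Galerkin approximation $\tilde U$ corresponding to $\mathring{S}_h$, where

$$\tilde U = U(t_0)\phi_{0,0} + \sum_{i=0}^{N-1}\sum_{j=1}^{p} d_{i,j}\,\phi_{i,j}, \qquad d_{i,j} \in \mathbb{R}^m$$

that is, we have imposed the essential boundary conditions at $t_0$. Clearly,

$$a(U;V) = 0 \quad \forall\, V \in [H^1(t_0,T)]^m$$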


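From (48), we obtain

$$a(U;V) - a(\tilde U;V) = \int_{t_i}^{t_{i+1}}\Bigl[\Bigl(\varepsilon\Bigl(\frac{dU}{dt} - \frac{d\tilde U}{dt}\Bigr),\ V\Bigr) + \bigl(F(t,U) - F(t,\tilde U),\ V\bigr)\Bigr]\,dt = 0, \quad r = 1,2,\ldots,m \tag{51}$$

with $V = \phi_{i,j}e_r$, and similarly for (49) and (50).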

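To estimate the Galerkin discretization error $U - \tilde U$, we let $U_I \in \mathring{S}_h$ be the interpolant to $U$ on $\{t_{i,j}\}$, $j = 0,1,\ldots,p$, $i = 0,1,\ldots,N-1$. From the representation $U - \tilde U = (U - U_I) + (U_I - \tilde U) = \eta - \theta$, we see that $\theta = \tilde U - U_I \in \mathring{S}_h$. Assuming that the solution $U$ is sufficiently smooth, from the interpolation error expansion in integral form, we obtain the usual Sobolev norm estimates for the interpolation error:

$$\int_{t_0}^{T}\|U - U_I\|^2\,dt \le C_0 h^{2(p+1)}\|U\|_{p+1}^2, \qquad \int_{t_0}^{T}\Bigl\|\frac{dU}{dt} - \frac{dU_I}{dt}\Bigr\|^2\,dt \le C_1 h^{2p}\|U\|_{p+1}^2 \tag{52}$$

Here, $\|U\|_{p+1}^2 = \int_{t_0}^{T}\sum_{k=0}^{p+1}\bigl(\partial^kU/\partial t^k,\ \partial^kU/\partial t^k\bigr)\,dt$ is the norm in the Sobolev space $H^{p+1}(t_0, T)$.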


Theorem 14. Let $U$ be the solution of (46), and let the conditions

$$\bigl(F(t,U) - F(t,V),\ U - V\bigr) \ge \rho(t)\,\|U - V\|^2$$
$$\|F(t,U) - F(t,V)\| \le C\,\|U - V\|, \qquad t > 0$$

be satisfied. Then the Galerkin solution $\tilde U$ in the space of piecewise polynomial continuous functions of degree $p$, defined by (48)–(50), satisfies


i h—-0 


yu —Ty=o@) [heU io + Ua) > 
where $\nu = 1$ if $p = 1$, $1 \ge \nu \ge 1/2$ if $p = 3,5,\ldots$ and $\nu = 0$ if $p$ is even, and

$$|||V|||^2 = \tfrac{1}{2}\bigl(\varepsilon V(T),\ V(T)\bigr) + \int_{t_0}^{T}\rho(t)\,\|V(t)\|^2\,dt$$

Proof. For a detailed proof, see the supplement of Axelsson and Verwer (1984). □


We have assumed that $F$ is Lipschitz-continuous, that is, $\|F(t,u) - F(t,v)\| \le C\|u - v\|$ for all $u, v \in \mathbb{R}^m$. This is, however, a severe limitation as it is a two-sided bound.

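Difference schemes. In order to get a fully discretized scheme, one has to use numerical quadrature, which results in various difference schemes. We consider this only for $p = 1$. Then the basis functions $\phi_i$ are the usual hat functions and there are no interior nodes. With $\tilde U = U(t_0)\phi_0 + \sum_{i=1}^{N}U_i\phi_i$, relations (48)–(50) imply

$$\varepsilon(U_{i+1} - U_{i-1}) = -2\int_{t_{i-1}}^{t_{i+1}} F\bigl(t,\ U_{i-1}\phi_{i-1} + U_i\phi_i + U_{i+1}\phi_{i+1}\bigr)\,\phi_i\,dt, \quad i = 1,2,\ldots,N-1$$
$$\varepsilon(U_N - U_{N-1}) = -2\int_{t_{N-1}}^{t_N} F\bigl(t,\ U_{N-1}\phi_{N-1} + U_N\phi_N\bigr)\,\phi_N\,dt \tag{53}$$

We call this the generalized midpoint rule difference scheme. Let $F_i = F(t, U_i)|_{t=t_i}$. If we use numerical integration by the trapezoidal rule, that is,

$$\int_{t_{i-1}}^{t_i} F\phi_i\,dt \simeq \frac{h}{2}\bigl[F(t_{i-1})\phi_i(t_{i-1}) + F(t_i)\phi_i(t_i)\bigr] = \frac{h}{2}F_i$$

which is known to be of $O(h^2)$ accuracy, then the first relation in (53) reduces to the classical midpoint rule $\varepsilon(U_{i+1} - U_{i-1}) = -2hF_i$. On the basis of (53), we may derive a more accurate difference scheme. For this purpose, let

$$F(t) \simeq \frac{1}{2}(F_{i-1} + F_i) + \Bigl(t - t_i + \frac{h}{2}\Bigr)\frac{1}{h}(F_i - F_{i-1}), \qquad t_{i-1} \le t \le t_i$$

except that for the last formula in (53), we use $F(t) = (1/2)(F_{N-1} + F_N)$, $t_{N-1} \le t \le t_N$. Then,

$$\int_{t_{i-1}}^{t_i} F\phi_i\,dt \simeq \frac{h}{4}(F_{i-1} + F_i) + \frac{h}{12}(F_i - F_{i-1}) = \frac{h}{6}(F_{i-1} + 2F_i), \quad i = 1,2,\ldots,N-1$$

and similarly

$$\int_{t_i}^{t_{i+1}} F\phi_i\,dt \simeq \frac{h}{6}(F_{i+1} + 2F_i)$$

Hence, the generalized midpoint rule (53) takes the form

$$\varepsilon(U_{i+1} - U_{i-1}) = -\frac{h}{3}(F_{i-1} + 4F_i + F_{i+1}), \quad i = 1,2,\ldots,N-1$$
$$\varepsilon(U_N - U_{N-1}) = -\frac{h}{2}(F_{N-1} + F_N) \tag{54}$$

We notice that this is a combination of the Simpson and trapezoidal rules.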

For this combination, numerical tests in Axelsson and Verwer (1984) indicate very accurate results. It is found that already on a very coarse mesh ($h = 1/4$), the accuracy is high. For $du/dt = \delta u$ and $\delta < 0$, the order of convergence seems to be $\approx 3.5$.
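
The combined Simpson and trapezoidal scheme can be tried out on the same kind of test problem. The sketch below is only an illustration under stated assumptions: it takes ε = 1 and F(t, U) = -δU (that is, du/dt = δu), so that the coupled equations ε(U_{i+1} - U_{i-1}) = -(h/3)(F_{i-1} + 4F_i + F_{i+1}), i = 1, ..., N-1, and ε(U_N - U_{N-1}) = -(h/2)(F_{N-1} + F_N) form one linear system. All function names are hypothetical, and no particular convergence order is claimed.

```python
import math

def solve_dense(a, b):
    """Gaussian elimination with partial pivoting (modifies a and b in place)."""
    n = len(b)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            m = a[r][col] / a[col][col]
            if m != 0.0:
                for c in range(col, n):
                    a[r][c] -= m * a[col][c]
                b[r] -= m * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / a[r][r]
    return x

def simpson_trapezoid_bvm(delta, u0, T, N):
    """Global scheme for du/dt = delta*u on (0, T], N >= 2 equal steps.

    Unknown U_i is stored at index i-1, i = 1..N.
    """
    h = T / N
    a = [[0.0] * N for _ in range(N)]
    b = [0.0] * N
    for i in range(1, N):
        row = i - 1
        # Simpson row: (1 - h*delta/3) U_{i+1} - (4h*delta/3) U_i
        #              - (1 + h*delta/3) U_{i-1} = 0
        a[row][i] = 1.0 - h * delta / 3.0
        a[row][i - 1] = -4.0 * h * delta / 3.0
        if i >= 2:
            a[row][i - 2] = -(1.0 + h * delta / 3.0)
        else:
            b[row] = (1.0 + h * delta / 3.0) * u0   # U_0 is known data
    # trapezoidal closing row: (1 - h*delta/2) U_N - (1 + h*delta/2) U_{N-1} = 0
    a[N - 1][N - 1] = 1.0 - h * delta / 2.0
    a[N - 1][N - 2] = -(1.0 + h * delta / 2.0)
    return solve_dense(a, b)

if __name__ == "__main__":
    for n_steps in (4, 8, 16, 32):
        err = abs(simpson_trapezoid_bvm(-1.0, 1.0, 1.0, n_steps)[-1] - math.exp(-1.0))
        print(n_steps, err)
```

Printing the endpoint error for successively halved step sizes gives a quick, informal impression of how fast the error decays on this one test problem.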

The above method is related to certain types of implicit 
Runge-Kutta methods; see, for example, Axelsson (1964) 
and Butcher (1977). As such, they are A-stable. The global 
coupling preserves this stability. 


4.5 Computational aspects 


The $\theta$-method, $\theta \ne 1$, and other unconditionally stable methods are implicit, that is, they give rise to a linear system of algebraic equations to be solved at each time-step. For this purpose, one can use direct or iterative solution methods. Hereby, the associated cost is important, in particular as there will be a large number ($O(k^{-1})$) of such systems to be solved.


4.5.1 Iterative solution methods. The conjugate 
gradient method 


The matrix in the linear system that arises in the $\theta$-method has the form $I - (1 - \theta)kA$. If $A$ is symmetric and negative definite, which is typical for parabolic problems, then the system has a condition number

$$\kappa = \frac{1 + (1 - \theta)kb}{1 + (1 - \theta)ka}$$

where $a = -\max_i\lambda_i$, $b = -\min_i\lambda_i$, and $\lambda_i$ are the eigenvalues of $A$. Hence, the condition number is bounded by $\kappa < 1 + (1 - \theta)kb$, that is (if $a < 1$), the condition number of $A$ is typically multiplied by $(1 - \theta)ka$, which can significantly decrease its value. For difference methods for second-order operators $\mathcal{L}$, it holds that $b = O(h^{-2})$, so the number of iterations using the standard unpreconditioned conjugate gradient method grows only as $O(h^{-1/2})$, if $k = O(h)$. Hence, the arising systems can normally be solved with an acceptable expense, in particular if one, in addition, uses a proper preconditioning of the matrix.
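
To make the size of this condition number concrete, the following sketch evaluates κ = (1 + (1 - θ)kb)/(1 + (1 - θ)ka) for a model problem. It assumes, purely for illustration, that A is the standard second-difference approximation of d²/dx² on (0, 1) with n interior points and zero Dirichlet data, whose eigenvalues are known in closed form. The iteration estimate uses the classical conjugate gradient bound (1/2)√κ ln(2/tol); the function names are hypothetical.

```python
import math

def theta_condition_number(n, k, theta):
    """Condition number of I - (1 - theta)*k*A for the 1D model matrix A.

    The eigenvalues of -A are (4/h^2) sin^2(j*pi*h/2), j = 1..n, h = 1/(n+1).
    """
    h = 1.0 / (n + 1)
    a = 4.0 / h**2 * math.sin(math.pi * h / 2.0) ** 2       # smallest eigenvalue of -A
    b = 4.0 / h**2 * math.sin(n * math.pi * h / 2.0) ** 2   # largest eigenvalue of -A
    return (1.0 + (1.0 - theta) * k * b) / (1.0 + (1.0 - theta) * k * a)

def cg_iteration_estimate(kappa, tol=1e-6):
    """Classical upper estimate of unpreconditioned CG iterations."""
    return math.ceil(0.5 * math.sqrt(kappa) * math.log(2.0 / tol))

if __name__ == "__main__":
    for n in (9, 99, 999):
        h = 1.0 / (n + 1)
        kap_h = theta_condition_number(n, h, 0.5)        # k = O(h)
        kap_h2 = theta_condition_number(n, h * h, 0.5)   # k = O(h^2)
        print(n, kap_h, cg_iteration_estimate(kap_h), kap_h2)
```

With k = h the printed condition number grows roughly like 1/h, while with k = h² it stays bounded, which matches the two regimes discussed in this subsection.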

For stability reasons, for an explicit method, one must let $k = O(h^2)$. Hence, there are $O(h^{-1})$ more time-steps, so explicit methods are generally more costly except where there is a need to choose small time-steps of $O(h^2)$ due to approximation accuracy reasons. However, for $k = O(h^2)$, this is balanced by the condition number $\kappa = O(1)$, that is, the number of iterations is bounded irrespective of $h$, which makes implicit methods still preferable because of their robust stability.

For problems with a small condition number, approximate inverse preconditioners can be quite accurate; see, for example, Axelsson (1994) for references to such methods. This can make implicit methods behave essentially as explicit but with no time-step stability condition. An example is the Euler backward method, where the inverse of the arising matrix, $I - kA$, can be approximated by $(I - kA)^{-1} \approx I + kA$, the leading terms of the Neumann expansion. This is equivalent to the Euler forward method. Adding more terms increases the accuracy of this approximation if $\|kA\| < 1$.

Frequently, it can be efficient to reuse the same precon- 
ditioner for several time-steps, whence its construction cost 
can be amortized. 


4.5.2 Periodic forcing functions 


For problems

$$u_t = \Delta u + f(x,t), \quad x \in \Omega, \quad t > 0$$

where the forcing function is periodic in time, that is, $f(x,t) = f_0(x)e^{i\omega t}$, one can apply the Ansatz

$$u(x,t) = v(x,t)e^{i\omega t}$$

where $u$ and $v$ are complex-valued functions. We find then,

$$v_t = \Delta v - i\omega v + f_0(x)$$

Using the $\theta$-method or any other implicit method, we must solve a system of the form

$$(\tilde A + i\omega I)(\xi + i\eta) = a + ib$$

where $\tilde A = I - k(1 - \theta)A$, and $\xi, \eta, a, b$ are real vectors. Multiplying by $\tilde A - i\omega I$, we get

$$(\tilde A^2 + \omega^2 I)(\xi + i\eta) = \tilde A a + \omega b + i(\tilde A b - \omega a)$$

or

$$\Bigl[I + \Bigl(\frac{1}{\omega}\tilde A\Bigr)^2\Bigr]\xi = \frac{1}{\omega^2}\tilde A a + \frac{1}{\omega}b \tag{55}$$

and a similar system for $\eta$. Equation (55) can be solved efficiently by iteration using a preconditioning in the form $[I + (1/\omega)\tilde A]^2$; see, for example, Axelsson and Kucherov (2000). Here, the condition number bound does not depend on $(1/\omega)\tilde A$.


4.5.3 Direct solution methods 


For implicit methods, direct solution methods can be an efficient alternative to iterative ones if one uses constant step sizes and the coefficients in the differential operator do not depend on time, implying that the matrix arising at each time-step is constant. It can hence be factored, for instance, in triangular factors, once and for all, and the arising cost at each time-step comes from solving the factored systems only. For some problems, a nested dissection ordering scheme can be an efficient alternative to standard methods. For further details, see, for example, Axelsson and Barker (1984) and George and Liu (1981).
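
The "factor once, solve many times" idea can be sketched as follows. The example assumes, for illustration only, the backward Euler method for the one-dimensional heat equation u_t = u_xx on (0, 1) with zero boundary values, so that the constant matrix is tridiagonal with diagonal 1 + 2k/h² and off-diagonals -k/h². The helper names are hypothetical, and the factorization omits pivoting, which is acceptable here because the matrix is diagonally dominant.

```python
import math

def factor_tridiagonal(lower, diag, upper):
    """LU factors of a tridiagonal matrix without pivoting.

    lower[i] is entry (i, i-1), upper[i] is entry (i, i+1).
    """
    n = len(diag)
    mult = [0.0] * n      # multipliers of the unit lower triangular factor
    piv = [0.0] * n       # pivots (diagonal of the upper factor)
    piv[0] = diag[0]
    for i in range(1, n):
        mult[i] = lower[i] / piv[i - 1]
        piv[i] = diag[i] - mult[i] * upper[i - 1]
    return mult, piv, upper

def solve_factored(factors, rhs):
    """Forward and back substitution with precomputed factors."""
    mult, piv, upper = factors
    n = len(piv)
    y = rhs[:]
    for i in range(1, n):
        y[i] -= mult[i] * y[i - 1]
    x = [0.0] * n
    x[n - 1] = y[n - 1] / piv[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - upper[i] * x[i + 1]) / piv[i]
    return x

def backward_euler_heat(u0, k, steps):
    """Advance interior values u0 by `steps` backward Euler steps of size k."""
    n = len(u0)
    h = 1.0 / (n + 1)
    r = k / h**2
    # the matrix I - k*A is constant, so it is factored once, outside the loop
    factors = factor_tridiagonal([-r] * n, [1.0 + 2.0 * r] * n, [-r] * n)
    u = u0[:]
    for _ in range(steps):
        u = solve_factored(factors, u)   # only substitutions per time-step
    return u

if __name__ == "__main__":
    n, k = 9, 0.01
    h = 1.0 / (n + 1)
    start = [math.sin(math.pi * (j + 1) * h) for j in range(n)]
    print(backward_euler_heat(start, k, 5))
```

Each time-step then costs only a forward and a backward sweep, which for a tridiagonal matrix is proportional to the number of unknowns.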

In general, however, the cost and demand of memory 
for direct solution methods grow more than linearly with 
increasing size of the system and they cannot be considered 
to be a viable choice for very large scale systems. 




4.5.4 Alternating direction implicit methods 


The alternating direction implicit (ADI) method (see, for example, Peaceman and Rachford, 1955) and the similar fractional step method (Yanenko, 1971) can sometimes be used to solve difference equations for elliptic problems more efficiently than some standard methods can. However, they can also be seen as special time-splitting methods using a direct solution method for the arising systems, which is the aspect of the ADI methods to be dealt with here.

For notational simplicity, consider a homogeneous equation

$$u_t + Au = 0, \quad t > 0, \qquad u(0) = u_0 \tag{56}$$

where $A = A_1 + A_2$ is the sum of two positive definite operators and it is assumed that systems with $I + \alpha A_i$, $i = 1, 2$, $\alpha > 0$, can be solved more efficiently than systems with $I + \alpha A$. This holds, for example, if $A_1u = -\partial^2u/\partial x^2$ and $A_2u = -\partial^2u/\partial y^2$ since the one-dimensional problems $I + \alpha A_i$, after discretization and standard ordering, become tridiagonal matrices that can be solved with an optimal order of computational complexity (proportional to the order of the systems).

To derive the ADI methods, we utilize the Crank–Nicolson method for constant time-step $k$, which with $u_n = u(t_n)$, $t_n = nk$ takes the form

$$\Bigl(I + \frac{k}{2}A\Bigr)u_{n+1} = \Bigl(I - \frac{k}{2}A\Bigr)u_n, \quad n = 0,1,\ldots \tag{57}$$

Here, we add $(k/2)A_1\cdot(k/2)A_2\,u_{n+1}$ to both sides in (57) and rewrite it as

$$(I + \hat A_1)(I + \hat A_2)u_{n+1} = (I - \hat A_1)(I - \hat A_2)u_n + \hat A_1\hat A_2(u_{n+1} - u_n) \tag{58}$$

where $\hat A_i = (k/2)A_i$, $i = 1, 2$. Since for sufficiently smooth functions $u$, $u_{n+1} - u_n = O(k)$, the last term in (58) has the same order, $O(k^3)$, as the local truncation error of the Crank–Nicolson method. If we neglect this term, the resulting method becomes an alternating direction scheme of the form

$$(I + \hat A_1)(I + \hat A_2)u_{n+1} = (I - \hat A_1)(I - \hat A_2)u_n, \quad n = 0,1,\ldots \tag{59}$$
where the two arising equations with matrices $I + \hat A_1$ and $I + \hat A_2$ can be solved with an optimal order computational complexity at each time-step.

For the stability analysis, we let $\tilde u_n = (I + \hat A_2)u_n$ and rewrite the scheme in the form

$$\tilde u_{n+1} = (I + \hat A_1)^{-1}(I - \hat A_1)(I - \hat A_2)(I + \hat A_2)^{-1}\tilde u_n$$

which is clearly a stable scheme, since $\|(I + \hat A_i)^{-1}(I - \hat A_i)\| < 1$, $i = 1, 2$.
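
One step of the alternating direction scheme (I + Â₁)(I + Â₂)u_{n+1} = (I - Â₁)(I - Â₂)u_n, with Â_i = (k/2)A_i, can be sketched as below. The example assumes, for illustration, the heat equation on the unit square with zero boundary values, a uniform mesh with spacing h, and the usual three-point second differences for A₁ (first grid index) and A₂ (second grid index). All names are hypothetical. Only tridiagonal systems along grid lines are solved.

```python
import math

def thomas(r, rhs):
    """Solve (I + r*T) x = rhs with T = tridiag(-1, 2, -1) (Thomas algorithm)."""
    n = len(rhs)
    c = [0.0] * n          # modified upper diagonal
    y = [0.0] * n          # modified right-hand side
    denom = 1.0 + 2.0 * r
    c[0] = -r / denom
    y[0] = rhs[0] / denom
    for i in range(1, n):
        denom = 1.0 + 2.0 * r + r * c[i - 1]
        c[i] = -r / denom
        y[i] = (rhs[i] + r * y[i - 1]) / denom
    x = [0.0] * n
    x[-1] = y[-1]
    for i in range(n - 2, -1, -1):
        x[i] = y[i] - c[i] * x[i + 1]
    return x

def explicit_factor(r, v):
    """Return (I - r*T) v for one grid line v with zero boundary values."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(v[i] - r * (2.0 * v[i] - left - right))
    return out

def transpose(u):
    return [list(col) for col in zip(*u)]

def along_second_index(op, r, u):
    return [op(r, row) for row in u]

def along_first_index(op, r, u):
    return transpose(along_second_index(op, r, transpose(u)))

def adi_step(u, k, h):
    """One ADI step; u[i][j] holds the interior grid values."""
    r = k / (2.0 * h * h)
    rhs = along_first_index(explicit_factor, r,
                            along_second_index(explicit_factor, r, u))
    w = along_first_index(thomas, r, rhs)      # solve (I + A1_hat) w = rhs
    return along_second_index(thomas, r, w)    # solve (I + A2_hat) u_new = w

if __name__ == "__main__":
    n = 7
    h = 1.0 / (n + 1)
    u = [[math.sin(math.pi * (i + 1) * h) * math.sin(math.pi * (j + 1) * h)
          for j in range(n)] for i in range(n)]
    print(adi_step(u, 0.01, h)[n // 2][n // 2])
```

For a product of discrete sine modes, each direction contributes a factor (1 - a)/(1 + a) with a > 0, so the mode is damped for every step size, in line with the stability statement above.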

Although the neglected term in (58) has the same order as the truncation error, it can still make an undesirably large perturbation on the discretization error. We therefore correct the time-stepping scheme by adding the term $\hat A_1\hat A_2(u_n - u_{n-1})$ to (59). If the solution is sufficiently smooth, there holds $\hat A_1\hat A_2(u_n - u_{n-1}) = \hat A_1\hat A_2(u_{n+1} - u_n) + O(k^4)$, so we have fully corrected for the last term in (58), which was missing in (59).

The resulting scheme is now a three-level method. To analyze its stability, we write it in the form

$$\tilde u_{n+1} = (I + \hat A_1)^{-1}(I - \hat A_1)(I - \hat A_2)(I + \hat A_2)^{-1}\tilde u_n + (I + \hat A_1)^{-1}\hat A_1\hat A_2(I + \hat A_2)^{-1}(\tilde u_n - \tilde u_{n-1}), \quad n = 0,1,\ldots \tag{60}$$

An elementary computation shows that

$$\tilde u_{n+1} = \bigl[(I - 2\tilde A_1)(I - 2\tilde A_2) + \tilde A_1\tilde A_2\bigr]\tilde u_n - \tilde A_1\tilde A_2\,\tilde u_{n-1}, \quad n = 0,1,\ldots$$

where $\tilde A_i = (I + \hat A_i)^{-1}\hat A_i$, $i = 1, 2$.

We assume now that $A_1$, $A_2$ commute (however, a quite restrictive assumption; see Varga, 1962). Then, the matrices $G = \tilde A_1\tilde A_2$ and $H = (I - 2\tilde A_1)(I - 2\tilde A_2) + \tilde A_1\tilde A_2$ are symmetric and, since $A_i$, $i = 1, 2$ are positive definite, their eigenvalues $g$, $h$ are contained in the intervals $(0, 1)$ and $(-1, 2)$ respectively. An easy computation shows that the characteristic equation $\lambda^2 - h\lambda + g = 0$ has roots $|\lambda_i| < 1$, which also implies the stability of the modified scheme (60).


Remark 6. The given boundary conditions for the problem (56) are assumed to have been implemented in the matrix $A$ and, through the splitting, in $A_1$ and $A_2$ also. The above ADI method is applicable for higher order, such as fourth order, space approximations as well, but in this case, the arising matrices $I + \alpha A_i$, $i = 1, 2$ are pentadiagonal or have an even higher order of structure. They can still be solved for with an optimal order of computational complexity. To get such a pentadiagonal structure for both the systems, an intermediate reordering of the unknowns must be applied for at least one of the systems.


5 FINITE DIFFERENCE METHODS FOR 
HYPERBOLIC PROBLEMS 


In this section, we consider the numerical solution of scalar 
hyperbolic problems. There are two types of such equations 
of particular interest, namely, 


(i) the first-order hyperbolic equation, typically of the form

$$au_t + bu_x = f, \quad 0 < x < 1, \quad t > 0$$

(ii) the second-order hyperbolic equation, typically of the form

$$u_{tt} = \Delta u + f, \quad 0 < x < 1, \quad t > 0$$

The latter, called the wave equation, describes transportation of waves, for instance, in acoustics, optics, and electromagnetic field theory (Maxwell's equations).

The numerical solution of the first-order problem can 
be seen as a special case of the more general convec- 
tion—diffusion equation, and is discussed in Section 6. 

In the present section, the wave equation is analyzed. 
We first prove that solutions of wave equations have quite 
a different behavior from solutions of parabolic equations, 
in the respect that the energy of a pure wave equation 
is constant (assuming no damping term and zero source 
function f), that is, the system is conservative. Further- 
more, information (such as initial data) is transported with 
finite velocity. This is in contrast with a parabolic problem 
without sources, where the energy decays exponentially, 
and data at any point influences the solution at all inte- 
rior points, that is, information is transported with infinite 
speed. We then discuss the stability of the method of lines 
for the numerical solution of wave equations and derive the 
familiar Courant—Friedrichs—Lewy (CFL) condition for the 
stability of the fully discretized equation. 

Fourier expansion methods are limited to problems with 
constant coefficients. However, we choose in this section 
to illustrate their use on a fourth-order problem. 


5.1 The wave equation 


5.1.1 Derivation of the wave equation 


To illustrate a problem that leads to a wave equation, consider an elastic string, which oscillates in a, say, vertical plane. Since the string is elastic, it has no stiffness for bending movements, so the stress $S$ at a point acts along a line that is tangent to the string curve $u(x,t)$ at $x$ for every $t$ (see Figure 5). (Note that $u$ is the transversal deflection, that is, the deflection in the direction ($y$), orthogonal to the direction $x$.) We assume that the movement in each point is vertical, that is, horizontal movements are neglected. If the mass density per unit length is $\mu$, then equating mass times acceleration with internal and external forces, and equating stresses in the transversal direction (Newton's law), we obtain

$$\mu\,\Delta x\,u_{tt} = S(x + \Delta x, t)\sin\alpha(x + \Delta x, t) - S(x,t)\sin\alpha(x,t) + f\,\Delta x$$

or, using $\sin\alpha(x,t) = \tan\alpha/\sqrt{1 + \tan^2\alpha} = u_x/\sqrt{1 + u_x^2}$, we obtain the equation

$$\mu\,\Delta x\,u_{tt} = \Bigl[\frac{S\,u_x}{\sqrt{1 + u_x^2}}\Bigr]_{x}^{x + \Delta x} + f\,\Delta x$$

where $f$ is an outer force, acting in the positive $y$-direction. Dividing by $\Delta x$ and letting $\Delta x$ go to zero, we obtain

$$\mu\,u_{tt} = \frac{\partial}{\partial x}\Bigl(\frac{S\,u_x}{\sqrt{1 + u_x^2}}\Bigr) + f \tag{61}$$

Figure 5. Stresses acting on a part of an elastic string.

This equation is nonlinear and has variable coefficients in general. If we assume that the amplitudes of the oscillations are small ($u_x^2 \ll 1$), and that $\mu$ and $S$ are constant, then we get the linearized problem with constant coefficients,

$$\frac{1}{c^2}u_{tt} = u_{xx} + \frac{f}{S}, \qquad t > 0 \tag{62}$$

where $c = \sqrt{S/\mu}$.

If in addition, the Cauchy data, namely, the initial position $u(x,0) = u_0(x)$ and the initial velocity $u_t(x,0) = u_1(x)$ are known for all $x$ ($-\infty < x < \infty$), we have a unique solution defined. In practice, of course, the string has a bounded length. If we assume that it is fixed at $x = 0$ and $x = 1$, say $u(0,t) = \alpha$, $u(1,t) = \beta$, then we have an example of a mixed initial boundary value problem.

Note that the second-order differential operator in (62) can be factorized into a product of two differential operators of first order,

$$\frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} = \Bigl(\frac{1}{c}\frac{\partial}{\partial t} + \frac{\partial}{\partial x}\Bigr)\Bigl(\frac{1}{c}\frac{\partial}{\partial t} - \frac{\partial}{\partial x}\Bigr)$$

This indicates that the solution consists of two waves transported with velocity $c$ in the directions defined by the lines $x - ct = \text{constant}$ and $x + ct = \text{constant}$, that is,

$$u(x,t) = F(x - ct) + G(x + ct).$$

Here, $F$ and $G$ are defined by the initial data. For a mixed initial boundary value problem there may also appear reflected waves at the boundaries.
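
That any sum of a right-going and a left-going profile solves the homogeneous equation u_tt = c²u_xx can be checked numerically. The short sketch below builds u(x, t) = F(x - ct) + G(x + ct) from two arbitrary smooth functions and evaluates the residual u_tt - c²u_xx with central differences; the function names and the particular profiles are illustrative assumptions.

```python
import math

def traveling_waves(F, G, c):
    """Return u(x, t) = F(x - c*t) + G(x + c*t)."""
    return lambda x, t: F(x - c * t) + G(x + c * t)

def wave_residual(u, c, x, t, d=1e-3):
    """Central-difference approximation of u_tt - c^2 u_xx at (x, t)."""
    utt = (u(x, t + d) - 2.0 * u(x, t) + u(x, t - d)) / d**2
    uxx = (u(x + d, t) - 2.0 * u(x, t) + u(x - d, t)) / d**2
    return utt - c**2 * uxx

if __name__ == "__main__":
    u = traveling_waves(lambda s: math.exp(-s * s), math.sin, 2.0)
    print(wave_residual(u, 2.0, 0.3, 0.7))   # small, up to differencing error
```

The residual is not exactly zero because the derivatives are approximated by finite differences, but it shrinks as the increment d is reduced (until rounding errors dominate).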


5.1.2 Domain of dependence 


Consider the solution of the homogeneous problem $u_{tt} = c^2u_{xx}$, $t > 0$, $-a < x < a$, where $c > 0$, $a \gg 1$, with boundary and initial conditions $u(-a,t) = u(a,t) = 0$, $t > 0$, $u(x,0) = u_0(x)$, $u_t(x,0) = u_1(x)$, for $-a < x < a$ and $t \le (a - |x|)/c$.

If $u_0$ is twice continuously differentiable and $u_1$ is continuously differentiable, the explicit solution takes the form

$$u(x,t) = \frac{1}{2}\bigl[u_0(x - ct) + u_0(x + ct)\bigr] + \frac{1}{2c}\int_{x - ct}^{x + ct}u_1(s)\,ds \tag{63}$$

Hence, the solution $u(x_0,t_0)$ at some point $(x_0,t_0)$ ($t_0 \le (a - |x_0|)/c$) is only a function of the values of $u_0$ and $u_1$ on the interval $[x_0 - ct_0,\ x_0 + ct_0]$. For the inhomogeneous problem, $u_{tt} = c^2u_{xx} + f$, one finds that the solution is the sum of the homogeneous solution (63) and a particular solution, which is equal to ($1/(2c)$ times) the integral of $f$ over the triangle in Figure 6. This is called the domain of dependence for the point $(x_0,t_0)$, and is defined as the smallest set $D(x_0,t_0)$ of points in the $(x,t)$ plane such that $u_0, u_1, f = 0$ in $D(x_0,t_0)$ implies $u(x_0,t_0) = 0$. It is hence bounded by the two characteristic lines through $(x_0,t_0)$ and the $x$-axis.

Figure 6. Domain of dependence for the wave equation.


5.1.3 Conservation of energy 


We demonstrate the principle of conservation of energy for the string problem, where $u(-a,t) = u(a,t) = 0$ and $a \gg 1$. We assume then that $u_0$ is a smooth function with compact support and that the length of its support interval is much smaller than $a$. The solution consists then of two smooth waves with initial shape of $(1/2)u_0$, traveling with velocity $c$ in the negative and positive direction, respectively (see Figure 7).

The energy has two sources,

(i) the kinetic energy $= (1/2)\int_{-a}^{a}\mu u_t^2\,dx$, and

(ii) the potential energy, which is due to the stretching of the elastic string, that is, the potential energy $= \int_{-a}^{a}S\bigl[\sqrt{1 + u_x^2} - 1\bigr]\,dx$.

Note that $u_t$ is the velocity in the vertical direction. The total energy at time $t$ is then

$$E(t) = \int_{-a}^{a}\Bigl[\frac{1}{2}\mu u_t^2 + S\bigl(\sqrt{1 + u_x^2} - 1\bigr)\Bigr]\,dx \tag{64}$$

Theorem 15. For the solution of the homogeneous problem (61) (i.e. with $f = 0$) with $\mu$ and $S$ constant, the total energy is conservative, that is, $E$ is constant. (In particular, the principle of minimal energy as for an elliptic problem is not valid!)


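Proof. We have

$$E'(t) = \int_{-a}^{a}\Bigl[\mu u_tu_{tt} + S\frac{\partial}{\partial t}\Bigl(\sqrt{1 + u_x^2}\Bigr)\Bigr]\,dx \tag{65}$$

By (61) (with $f = 0$), the first term equals

$$\int_{-a}^{a}\mu u_tu_{tt}\,dx = \int_{-a}^{a}u_t\frac{\partial}{\partial x}\Bigl(\frac{S\,u_x}{\sqrt{1 + u_x^2}}\Bigr)\,dx = -\int_{-a}^{a}S\,\frac{u_xu_{xt}}{\sqrt{1 + u_x^2}}\,dx + \Bigl[\frac{S\,u_tu_x}{\sqrt{1 + u_x^2}}\Bigr]_{-a}^{a} \tag{66}$$

where we have used partial integration.

Figure 7. Two smooth traveling waves ($(1/2)u_0(x - ct)$ and $(1/2)u_0(x + ct)$) in the linearized model.

The last term vanishes here because of the compact support of $u_0$. (For simplicity, we consider only the case that $t$ is small enough, so no reflected waves at the 'walls' at $-a$ and $a$ appear.) Hence, by differentiation, it follows from (66) that

$$\int_{-a}^{a}\mu u_tu_{tt}\,dx = -\int_{-a}^{a}S\frac{\partial}{\partial t}\Bigl(\sqrt{1 + u_x^2}\Bigr)\,dx$$

so, by (65), $E'(t) = 0$. □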


When $u_x^2 \ll 1$, (61) takes the form $u_{tt} - c^2u_{xx} = (c^2/S)f$, $t > 0$. Consider now the linearized wave equation in a bounded domain $\Omega$ and in a higher-space dimension (where the factor $c^2/S$ is included in $f$),

$$u_{tt} = c^2\Delta u + f(x,t), \quad t > 0, \quad x \in \Omega \subset \mathbb{R}^d \tag{67}$$

with boundary conditions $u(x,t) = 0$, $x \in \partial\Omega$, $t > 0$ and initial conditions $u(x,0) = u_0(x)$, $u_t(x,0) = u_1(x)$, $x \in \Omega$. This equation is the linearized model for a vibrating elastic membrane that is clamped along the boundary $\partial\Omega$, with an initial displacement defined by $u_0$ and velocity $u_1$ (see Chapter 5, Volume 2).


5.1.4 Uniqueness of solutions 


To prove uniqueness of solution of (67), we note that this is equivalent to proving that the homogeneous problem

$$u_{tt} = c^2\Delta u \tag{68}$$

with homogeneous boundary and zero initial data has only the trivial solution. However, Theorem 15 implies that the energy $E(t)$ in (64) is constant, and because of the zero initial data $E(0) = 0$, that is, $E(t) = 0$. This shows that $u_t = 0$ and $u_x = 0$, so $u = 0$.


Remark 7. Although the energy and velocity are constant, the shape of the two traveling waves in Figure 7
may change. Only for the linearized model is the shape 
unchanged. For the nonlinear (correct) model, the wave 
fronts tend to sharpen. 


5.2 Difference approximation of the wave 
equation 
5.2.1 A second-order scheme 


For the time discretization of (67), we use a combination of the central difference approximation and a weighted average,

$$u(x,t+k) - 2u(x,t) + u(x,t-k) = k^2\bigl[\gamma u_{tt}(x,t+k) + (1 - 2\gamma)u_{tt}(x,t) + \gamma u_{tt}(x,t-k)\bigr] \tag{69}$$

$t = k, 2k, \ldots$, where $\gamma$ is a method parameter. In practice, we choose $0 \le \gamma \le 1/2$. For the approximation of the Laplacian operator ($\Delta$), we use a difference approximation ($\Delta \simeq \Delta_h$), as described in Section 3. Then, the fully discretized difference scheme takes the form

$$[I - \gamma c^2k^2\Delta_h]\bigl[u_h(x,t+k) - 2u_h(x,t) + u_h(x,t-k)\bigr] = c^2k^2\Delta_hu_h(x,t) + k^2\bigl[\gamma f(x,t+k) + (1 - 2\gamma)f(x,t) + \gamma f(x,t-k)\bigr], \quad t = k, 2k, \ldots \tag{70}$$

where $u_h$ is the corresponding (difference) approximation of $u$.


Remark 8. If we use the method of lines, that is, replace $\Delta$ in (67) by $\Delta_h$, then the resulting system is an initial value problem for a second-order, linear, ordinary differential equation, which can be rewritten as a system of first-order equations and subsequently discretized using the Crank–Nicolson method, certain types of implicit Runge–Kutta methods or multistep methods. Some details of analysis of such methods are presented in Section 4; see Dahlquist (1978) and Hairer (1979) for further analysis. Naturally, after using the weighted difference approximation (69), we get the same difference scheme (70) as before.


Formula (70) describes a step-by-step method, which works on three time levels, $t + k$, $t$, $t - k$, at each step (a two-step method). Since (70) is a three-level scheme, we need starting values on the first two levels, $t = 0$ and $t = k$. For $t = 0$, we let $u_h(x,0) = u_0(x)$. For $u_h(x,k)$, we may let

$$u_h(x,k) = u(x,0) + ku_t(x,0) = u_0(x) + ku_1(x)$$

or the higher-order approximation,

$$u_h(x,k) = u(x,0) + ku_t(x,0) + \tfrac{1}{2}k^2u_{tt}(x,0) = u_0(x) + ku_1(x) + \tfrac{1}{2}k^2\bigl[c^2\Delta u_0(x) + f(x,0)\bigr]$$

where we have used the differential equation and where we assume that $u_0$ is twice continuously differentiable.

For $\gamma \ne 0$, (70) is an implicit difference scheme, and at every time level, we must solve a linear system with matrix $I + \gamma c^2k^2A_h$, where $A_h$ is the difference matrix corresponding to $(-\Delta_h)$.

The computation of $u_h(x,t+k)$ is performed in two steps:

(i) Solve $\xi_h(x,t+k)$ from

$$(I + \gamma c^2k^2A_h)\,\xi_h(x,t+k) = \text{r.h.s. of (70)}$$

(ii) Compute

$$u_h(x,t+k) = 2u_h(x,t) - u_h(x,t-k) + \xi_h(x,t+k)$$

For $\gamma = 0$, on the other hand, we get the explicit scheme

$$u_h(x,t+k) = 2u_h(x,t) - u_h(x,t-k) + c^2k^2\Delta_hu_h(x,t) + k^2f(x,t), \quad t = k, 2k, \ldots \tag{71}$$
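
A minimal sketch of the explicit scheme, u_h(x, t+k) = 2u_h(x, t) - u_h(x, t-k) + c²k²Δ_h u_h(x, t) with f = 0, is given below. It assumes, for illustration, one space dimension on (0, 1) with zero boundary values, the three-point second difference for Δ_h, and the higher-order starting value at t = k with Δu_0 replaced by Δ_h u_0. The function name and the test data are hypothetical.

```python
import math

def explicit_wave_1d(u0, u1, c, n, k, steps):
    """Explicit three-level scheme on (0, 1) with n interior points, steps >= 1.

    Returns the grid points and the approximation at t = steps * k.
    """
    h = 1.0 / (n + 1)
    x = [(j + 1) * h for j in range(n)]
    rho2 = (c * k / h) ** 2

    def second_difference(v):
        # h^2 times the discrete Laplacian, with zero boundary values
        return [(v[j - 1] if j > 0 else 0.0) - 2.0 * v[j]
                + (v[j + 1] if j < n - 1 else 0.0) for j in range(n)]

    prev = [u0(xj) for xj in x]                      # level t = 0
    d = second_difference(prev)
    # starting value at t = k: u0 + k*u1 + (k^2/2) c^2 Lap_h u0
    cur = [prev[j] + k * u1(x[j]) + 0.5 * rho2 * d[j] for j in range(n)]
    for _ in range(steps - 1):
        d = second_difference(cur)
        nxt = [2.0 * cur[j] - prev[j] + rho2 * d[j] for j in range(n)]
        prev, cur = cur, nxt
    return x, cur

if __name__ == "__main__":
    # standing wave: exact solution sin(pi x) cos(pi t) for c = 1
    grid, u = explicit_wave_1d(lambda s: math.sin(math.pi * s),
                               lambda s: 0.0, 1.0, 49, 0.01, 100)
    print(max(abs(u[j] + math.sin(math.pi * grid[j])) for j in range(49)))
```

In this run ck/h = 0.5; choosing k so large that ck/h exceeds one would violate the step-size restriction for the explicit scheme and the computed values would grow without bound.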


5.2.2 Stability analysis in L2-norm


Next, we analyze the stability of the completely discretized scheme (70). Clearly, we let $u_h$ satisfy the given boundary conditions, that is, $u_h(x) = 0$, $x \in \partial\Omega_h$. Since (70) is a step-by-step method, perturbations of, say, the initial data could be unboundedly amplified as $t \to \infty$. To analyze this, we consider the homogeneous part of (70) (i.e. with $f = 0$). Let $\lambda_i$, $v_i(x)$, $x \in \Omega_h$, be the eigensolutions of $(-\Delta_h)$, that is,

$$-\Delta_hv_i(x) = \lambda_iv_i(x), \quad x \in \Omega_h, \quad i = 1,2,\ldots,N$$

where $v_i(x) = 0$ on $\partial\Omega_h$. As we know from Section 3, $\lambda_i > 0$. (If $\Omega = [0,1]^2$, then, in fact, $\lambda_{l,m} = (2/h^2)(2 - \cos l\pi h - \cos m\pi h)$, $l, m = 1,2,\ldots,n$, where $\Omega_h$ is an $(n+2) \times (n+2)$ mesh.) To find a solution of (70) (with $f = 0$), we consider the 'Ansatz',

$$u_h(x,t) = (\mu_i)^{t/k}u_0(x), \quad t = 0, k, 2k, \ldots \tag{72}$$

where we assume that $u_0(x) = v_i(x)$, that is, it is an eigenfunction for some $i$, $1 \le i \le N$. Then, by substituting this Ansatz in (70) (with $f = 0$), we get

$$(1 + \gamma c^2k^2\lambda_i)(\mu_i^2 - 2\mu_i + 1)\,\mu_i^{t/k - 1}v_i(x) = -c^2k^2\lambda_i\,\mu_i^{t/k}v_i(x), \quad t = k, 2k, \ldots$$

Hence,

$$\mu_i^2 - 2\mu_i + 1 = -\tau_i\mu_i \tag{73}$$

where

$$\tau_i = \frac{c^2k^2\lambda_i}{1 + \gamma c^2k^2\lambda_i} \tag{74}$$

so

$$\mu_{i;1,2} = 1 - \tfrac{1}{2}\tau_i \pm \sqrt{\bigl(\tfrac{1}{2}\tau_i\bigr)^2 - \tau_i}$$

Theorem 16. The homogeneous difference scheme (70) (with $f = 0$) is stable if $\tau_i = c^2k^2\lambda_i/(1 + \gamma c^2k^2\lambda_i) \le 4 + O(k^2)$. Perturbations of initial data are not amplified if $\tau_i \le 4$, that is, if $\gamma \ge 1/4$, or if

$$(ck)^2 \le \frac{4}{1 - 4\gamma}\,\frac{1}{\max_i\lambda_i} \tag{75}$$

where $\gamma < 1/4$, and they are boundedly amplified, uniformly in $t = k, 2k, \ldots$, if $\tau_i \le 4 + O(k^2)$.


Proof. By definition, the linear difference scheme (70) (with $f = 0$) is stable if and only if the solution corresponding to any perturbation of the initial values is bounded for all $t > 0$. Since any perturbation of the initial values can be written as a linear combination of the eigenfunctions $v_i(x)$, $x \in \Omega_h$ (which form the complete space), it suffices to consider an arbitrary perturbation $u_0(x) = v_i(x)$ and $u_h(x,k) = \mu_iu_0(x)$, $i = 1,2,\ldots,N$. From (72) and (73), it then follows that the corresponding solutions are bounded if and only if $|\mu_i| \le 1 + \zeta k$, for some $\zeta > 0$ (the von Neumann stability condition). This is so, because then $|\mu_i|^{t/k} \le e^{\zeta t}$ $\forall t > 0$.

The product of the two roots of (73) is equal to 1. Hence, we see that $|\mu_i| = 1$ (i.e. the perturbations are not amplified) if and only if $((1/2)\tau_i)^2 \le \tau_i$, that is, if and only if $\tau_i \le 4$. By (74), this means that $(1/4)c^2k^2\lambda_i \le 1 + \gamma c^2k^2\lambda_i$, that is, either $\gamma \ge 1/4$ or $(ck)^2\lambda_i \le 4/(1 - 4\gamma)$. It is valid for any perturbation, that is, for any $i$, if and only if (75) holds. Further, it follows that $|\mu_i| \le 1 + \zeta k$, if and only if $((1/2)\tau_i)^2 - \tau_i \le (\zeta k)^2[1 + O(k)]$, $k \to 0$.


Corollary 4. If $\gamma < 1/4$, then we have to choose $k$ small enough, as follows by (75). In particular, if $\gamma = 0$ (the explicit method (71)), then $ck \le 2/\sqrt{\max_i\lambda_i}$. Note that $\max_i\lambda_i = O(h^{-2})$. (For the unit square problem, $\max_i\lambda_i < 8h^{-2}$.) Hence, $k \le (2/c)O(h)$ ($k \le h/(\sqrt{2}\,c)$ for the unit square). This is a much more reasonable condition than for the explicit (Euler) method for the heat equation, where the stability condition was $k \le O(h^2)$ ($k \le (1/4)h^2$ for the unit square). In fact, as shown later in this section, to balance the time discretization and space discretization errors, we shall normally choose $k = O(h)$ anyway.


If $\gamma \ge 1/4$, then the method (70) is unconditionally stable, that is, stable for any choice of $k$.
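
The role of the threshold τ = 4 can be checked directly from the quadratic μ² - 2μ + 1 = -τμ, that is, μ² - (2 - τ)μ + 1 = 0. The sketch below computes both roots for given τ, and evaluates τ = c²k²λ/(1 + γc²k²λ) for given data; the function names are hypothetical.

```python
import cmath

def amplification_roots(tau):
    """Both roots of mu^2 - (2 - tau)*mu + 1 = 0."""
    disc = cmath.sqrt((0.5 * tau) ** 2 - tau)
    return 1.0 - 0.5 * tau + disc, 1.0 - 0.5 * tau - disc

def tau_value(c, k, lam, gamma):
    """tau = c^2 k^2 lam / (1 + gamma c^2 k^2 lam)."""
    s = (c * k) ** 2 * lam
    return s / (1.0 + gamma * s)

if __name__ == "__main__":
    for t in (0.5, 2.0, 4.0, 4.5):
        print(t, [abs(m) for m in amplification_roots(t)])
    # with gamma = 1/4, tau stays below 4 however large c*k*sqrt(lam) is
    print(tau_value(1.0, 1.0, 1.0e8, 0.25))
```

For τ ≤ 4 the two roots are complex conjugates (or coincide at -1) with modulus one; for τ > 4 they are real with product one, so one of them exceeds one in modulus.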


5.2.3 Discretization error estimates 


To find the order of the discretization errors $u - u_h$ as $h, k \to 0$, we begin by considering the truncation error. (This is done here for a two-dimensional problem. For other space dimensions, one gets corresponding results.)

From (70), after division by $k^2$, we obtain the difference equation

$$L_{h,k}u_h \equiv [I - \gamma c^2k^2\Delta_h]\,\frac{u_h(x,t+k) - 2u_h(x,t) + u_h(x,t-k)}{k^2} - c^2\Delta_hu_h(x,t) = F(x,t)$$

where $F(x,t) = \gamma f(x,t+k) + (1 - 2\gamma)f(x,t) + \gamma f(x,t-k)$.

Definition 4. The truncation error for (70) is $L_{h,k}u - F(x,t) = L_{h,k}(u - u_h)$.

Applying Taylor expansions for the central difference approximations, and assuming that data and the solution are sufficiently smooth, the truncation error takes the following form,

$$L_{h,k}(u - u_h) = [I - \gamma c^2k^2\Delta_h]\,\frac{u(x,t+k) - 2u(x,t) + u(x,t-k)}{k^2} - c^2\Delta_hu(x,t) - \bigl[\gamma f(x,t+k) + (1 - 2\gamma)f(x,t) + \gamma f(x,t-k)\bigr]$$

$$= [I - \gamma c^2k^2\Delta_h]\Bigl[u_{tt}(x,t) + \frac{k^2}{12}u_t^{(4)}(x,t) + O(k^4)\Bigr] - c^2\Delta_hu(x,t) - f(x,t) - \gamma k^2f_{tt}(x,t) + O(k^4)$$

$$= u_{tt}(x,t) - c^2\Delta u(x,t) - f(x,t) + \frac{k^2}{12}u_t^{(4)}(x,t) - \gamma c^2k^2(\Delta u)_{tt} - \gamma k^2f_{tt}(x,t) - \frac{c^2h^2}{12}\bigl(u_x^{(4)} + u_y^{(4)}\bigr) + O(k^4) + O(h^2k^2) + O(h^4)$$

By use of the differential equation $u_{tt} = c^2\Delta u + f$, we get $u_t^{(4)} = c^2\Delta u_{tt} + f_{tt}$, and, therefore,

$$L_{h,k}(u - u_h) = \Bigl(\frac{1}{12} - \gamma\Bigr)k^2u_t^{(4)}(x,t) - \frac{c^2h^2}{12}\bigl(u_x^{(4)} + u_y^{(4)}\bigr) + O(k^4) + O(h^2k^2) + O(h^4) \tag{76}$$


Theorem 17. Assume that $u$ is sufficiently smooth and that the difference scheme (70) is stable, that is, $\gamma \ge 1/4$ or $(ck)^2 \le [4/(1 - 4\gamma)](\max_i\lambda_i)^{-1}$. Then, the discretization error of (70) satisfies

(a) $\|u - u_h\| \le CT(k^2 + h^2)$, $h, k \to 0$, $0 < t \le T$
(b) $\|u - u_h\| \le CT(k^4 + h^2)$, $h, k \to 0$, $0 < t \le T$, if $\gamma = 1/12$.


Proof. Let $e_h = u - u_h$ be the discretization error. Then, using estimates similar to those in Theorem 3, using the property that $I - \gamma c^2k^2\Delta_h$ is of strongly positive type, one obtains

$$\|(u - u_h)(\cdot,t)\| \le CT\,\|L_{h,k}(u - u_h)\|, \quad 0 < t \le T$$

for some constant $C$ (independent of $h$, $k$, $T$). Now, (76) readily implies the theorem.


Remark 9. If $\gamma \ne 1/12$, we see that to balance the order of the errors in space and time, we have to choose $k = O(h)$, $h \to 0$. In the case $\gamma = 1/12$, however, we may choose $k = O(h^{1/2})$ to balance the errors. Unfortunately, with such a choice the stability condition (75) is violated if $h$ (and hence $k$) is small enough.


One can show that if we use a nine-point discretization of $\Delta$ (in $\mathbb{R}^2$), and if we use proper weighted averages of $f(\tilde x, t+k)$, $f(\tilde x, t)$ and $f(\tilde x, t-k)$ at $\tilde x = (x + h)$, $x$ and $(x - h)$, then we get a scheme with error $O(k^4) + O(h^4)$ for $\gamma = 1/12$. The corresponding matrix is less sparse, however, than for the five-point discretization.


5.2.4 Computational aspects 


The linear systems of equations arising in (70) for $\gamma \ne 0$ and when $k = O(h)$ have condition number $O(1)$, $k \to 0$. Therefore, their inverses can be approximated accurately by a sparse matrix and correspondingly a preconditioned iterative solution method will work essentially as an explicit method.


5.2.5 The CFL condition 


Consider the explicit difference scheme (71), or

\[
D_t^+ D_t^- u_h(x,t) = c^2 D_x^+ D_x^- u_h(x,t)
\]

for the numerical solution of $u_{tt} = c^2 u_{xx}$, $-\infty < x < \infty$,
$u(x,0) = u_0(x)$, $u_t(x,0) = u_1(x)$, $-\infty < x < \infty$.

It can be represented by a stencil shown in Figure 8,
where $\rho = ck/h$. The domain of dependence (a triangle)
of the solution $u(x,t)$ at a point $(x,t)$ is defined by the
interval $[x - ct, x + ct]$ and $(x,t)$. As follows from (63),
if $u_0$ is changed at the endpoints or $u_1$ is changed in some
point of this segment, then, in general, $u(x,t)$ is changed.
It is readily established that the solution of the difference
equation also has a finite domain of dependence, that is, a
smallest set $D_h(x,t)$ such that $u_h(x,t) = 0$ if $u_0, u_1 = 0$
in $D_h(x,t)$, and defined by the interval $[x - mh, x + mh]$,
if $t = mk$.

Hence, for convergence, $(u - u_h)(x,t) \to 0$, $h \to 0$, we realize
that this interval must contain the interval for the
continuous domain of dependence, that is,

\[
mh \ge ct = cmk = m\rho h
\]

Figure 8. Difference stencil.

Hence, we must have $\rho \le 1$. (Note that this is in fact
also sufficient for stability and hence also for convergence
because the scheme is consistent; see (75) with $\gamma = 0$,
where $\max_i \lambda_i = 4/h^2$ for a problem in one space dimension.)
This condition is an example of a CFL condition:
A necessary condition for the convergence of a difference 
approximation of a wave equation is that the discrete solu- 
tion at any point has a domain of dependence that (at least 
for small values of k) covers the domain of dependence for 
the exact solution at the same point. Equivalently, we can 
state the CFL condition: The velocity (h/k) of the solutions 
of the discrete problem must be at least as big as the velocity 
c of the solution of the continuous problem. 

Note that if an implicit method ($\gamma > 0$) is used, the
domain of dependence of the difference method is the whole
x-axis, because the inverse of the operator $I - \gamma c^2 k^2 \Delta_h$
is a full matrix. Hence, the above necessary condition is
automatically satisfied.
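
The effect of the condition $\rho \le 1$ can be observed numerically. The following is a minimal sketch, not taken from the text: it assumes a periodic grid (instead of the whole real line), a Gaussian initial pulse with a tiny random perturbation, and a crude start with zero initial velocity. The function name `leapfrog_max_amplitude` and all parameter values are illustrative assumptions.

```python
import numpy as np

def leapfrog_max_amplitude(rho, n=100, steps=200):
    """Run the explicit scheme
        u^{m+1}_j = 2 u^m_j - u^{m-1}_j + rho^2 (u^m_{j+1} - 2 u^m_j + u^m_{j-1})
    on a periodic grid and return max |u| after `steps` time-steps."""
    x = np.arange(n) / n
    rng = np.random.default_rng(0)
    # Smooth pulse plus tiny noise, so that all grid modes are present (assumption).
    u_prev = np.exp(-200.0 * (x - 0.5) ** 2) + 1e-8 * rng.standard_normal(n)
    u = u_prev.copy()  # crude start: zero initial velocity
    for _ in range(steps):
        second_diff = np.roll(u, -1) - 2.0 * u + np.roll(u, 1)
        u, u_prev = 2.0 * u - u_prev + rho**2 * second_diff, u
    return float(np.max(np.abs(u)))

if __name__ == "__main__":
    for rho in (0.5, 1.0, 1.1):
        print(rho, leapfrog_max_amplitude(rho))
```

For $\rho \le 1$ the amplitude stays of the order of the initial data, whereas for $\rho$ slightly above 1 the highest grid modes grow exponentially with the number of time-steps.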


5.2.6 Numerical dispersion 


Related to the CFL condition is the numerical dispersion
number $d$. We study this for the one-dimensional equation.
If $d = \omega/(c|\xi|) = 1$, the harmonic wave $u(x,t) = e^{i(\omega t - \xi x)}$
satisfies the homogeneous equation $u_{tt} = c^2 u_{xx}$. This relation
between the frequency $\omega$ and the wave number $\xi$ must
hold since all waves propagate with the same speed $c$.

For the explicit numerical scheme (70), with $\gamma = 0$, there
holds

\[
u_h(x,t+k) - 2u_h(x,t) + u_h(x,t-k) = \rho^2 [u_h(x+h,t) - 2u_h(x,t) + u_h(x-h,t)]
\]

with $\rho = ck/h$. Let the wave number $\xi$ be fixed. The Ansatz
$u_h(x,t) = e^{i(\omega_h t - \xi x)}$ shows the relation

\[
e^{i\omega_h k} - 2 + e^{-i\omega_h k} = \rho^2 \big(e^{i\xi h} - 2 + e^{-i\xi h}\big)
\]

that is,

\[
\big(e^{i\omega_h k/2} - e^{-i\omega_h k/2}\big)^2 = \rho^2 \big(e^{i\xi h/2} - e^{-i\xi h/2}\big)^2
\]

or

\[
\sin\frac{\omega_h k}{2} = \pm\rho \sin\frac{\xi h}{2} \tag{77}
\]


It can be seen that $\omega_h = c\xi\{1 - [(\xi h)^2/24](1 - \rho^2) +
O[(\xi h)^4]\}$. Hence, unless $\rho^2 = 1$, for the numerical solution
there is a phase error $\omega - \omega_h$, and the angular frequency $\omega_h$
is only approximately proportional to the wave number.
This means that the numerical solution of a wave packet
containing several different spatial frequencies will change
shape as it propagates. The phenomenon of waves of
different frequencies traveling with different speeds is
called dispersion.

The number $d_h = \omega_h/(c|\xi|)$ shows how many grid cells
the true solution propagates in one time-step. If $d_h = 1$,
the spatial and temporal difference approximations cancel,
and the signals propagate one cell per time-step, in
either direction. If $d_h < 1$, on the other hand, the numerical
dispersion can differ from the analytical by a significant
amount, at least for the bigger wave numbers. If $d_h > 1$,
the relation (77) yields complex angular frequencies
for wave numbers such that $|\sin(\xi h/2)| > 1/|\rho|$. Therefore,
some waves will be amplified exponentially in time, that
is, the algorithm is unstable. This is in agreement with the
violation of the stability conditions in Theorem 16. When
$ck > h$, the signal of the true solution propagates more than
one cell per time-step, the latter being the propagation speed
of the numerical solution of the explicit scheme. As we have
seen, the CFL condition is then violated. Similar results
hold for other explicit schemes.
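
As an illustration of relation (77), the numerical frequency can be evaluated directly and compared with its leading-order expansion. This is a small sketch under the assumption $\rho \le 1$ (so that the arcsine is real); the function names and the sample values of $h$, $k$ and $\xi$ are chosen here for illustration only.

```python
import numpy as np

def omega_numerical(xi, h, k, c=1.0):
    """Numerical angular frequency from sin(omega_h k / 2) = rho sin(xi h / 2)."""
    rho = c * k / h
    return (2.0 / k) * np.arcsin(rho * np.sin(xi * h / 2.0))

def omega_expansion(xi, h, k, c=1.0):
    """Leading terms: c xi {1 - [(xi h)^2 / 24] (1 - rho^2)}."""
    rho = c * k / h
    return c * xi * (1.0 - (xi * h) ** 2 / 24.0 * (1.0 - rho**2))

if __name__ == "__main__":
    xi, h = 2.0 * np.pi, 0.01
    for k in (0.005, 0.01):  # rho = 0.5 and rho = 1 for c = 1
        print(k, omega_numerical(xi, h, k), omega_expansion(xi, h, k), xi)
```

With $\rho = 1$ the numerical and exact frequencies coincide; with $\rho < 1$ the numerical wave is slightly too slow, by an amount that grows with $(\xi h)^2$.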


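5.3 A fourth-order problem

In this section, we analyze the following partial differential
equation, which is a nonstationary model for the deflection
of a thin plate, fixed at the boundary,

\[
\frac{\partial^2 u}{\partial t^2} = -E\Delta^2 u + P, \quad t > 0, \quad (x,y) \in \Omega \subset R^2
\]
\[
u = \frac{\partial u}{\partial n} = 0, \quad t > 0, \quad (x,y) \in \partial\Omega
\]

or

\[
u = \Delta u = 0, \quad t > 0, \quad (x,y) \in \partial\Omega \tag{78}
\]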


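and with initial deflection and velocity of deflection,

\[
u(x,y,0) = u_0(x,y), \quad \frac{\partial u}{\partial t}(x,y,0) = u_1(x,y), \quad (x,y) \in \Omega
\]

where $u = u(x,y,t)$ is the deflection, $\partial/\partial n$ is the normal
derivative, $P = P(x,y,t)$ is a pressure load, $E > 0$ is the
stiffness coefficient, and $\Delta^2$ is the biharmonic (fourth-order)
operator,

\[
\Delta^2 u = \Big(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\Big)^2 u = \frac{\partial^4 u}{\partial x^4} + 2\frac{\partial^4 u}{\partial x^2 \partial y^2} + \frac{\partial^4 u}{\partial y^4} \tag{79}
\]
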
The equation appears, for instance, in the modeling of the 
deflection (vibration) of spinning disks, such as ‘floppy’ 
disks, tapes, and thin beams; see, Benson and Bogy (1978) 
and Lamb and Southwell (1921) (see Chapter 5, Vol- 
ume 2). 

We first describe the behavior of solutions of (78) by 
use of a Fourier expansion. For the numerical solution, we 
use the method of lines, however, applied on a system of 
two equations, where each equation contains a Laplacian 
(second-order equation) instead of the biharmonic opera- 
tor. By discretizations of the Laplacian operators we get 
a system of two coupled ordinary differential equations, 
which in their turn are discretized by the use of differ- 
ence approximations. In the interest of brevity and clarity 
of exposition, we present the above analysis for a model 
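problem in one space dimension. (Note also that we consider
here the boundary condition of type $\partial^2 u/\partial x^2 = 0$
instead of $\partial u/\partial x = 0$.)

Consider therefore

\[
\frac{\partial^2 u}{\partial t^2} = -\frac{\partial^4 u}{\partial x^4} + P, \quad 0 < x < 1, \quad t > 0 \tag{80}
\]
\[
u(0,t) = u(1,t) = 0, \quad \frac{\partial^2 u}{\partial x^2}(0,t) = \frac{\partial^2 u}{\partial x^2}(1,t) = 0, \quad t > 0
\]
\[
u(x,0) = g_0(x), \quad \frac{\partial u}{\partial t}(x,0) = g_1(x), \quad 0 < x < 1 \tag{81}
\]

(where, for simplicity, we assume that the compatibility
conditions $g_0(0) = g_0(1) = 0$, $g_1(0) = g_1(1) = 0$ hold and
that $g_0 \in C^4(0,1)$, $g_1 \in C^2(0,1)$).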


5.3.1 Fourier expansion 


To find the behavior of the solutions of (80), we use the 
method of superposition and expansion of the solution 
in series of eigenfunctions (Fourier expansion). Consider 
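first the homogeneous equation and the Ansatz $u(x,t) =
\varphi(t)\psi(x)$. We substitute into (80) (with $P = 0$) and obtain

\[
\varphi''(t)\psi(x) = -\varphi(t)\psi^{(4)}(x)
\]

or

\[
\frac{\varphi''(t)}{\varphi(t)} = -\frac{\psi^{(4)}(x)}{\psi(x)} = -\lambda^2
\]

which must be constant. We let $\lambda > 0$. (As shown below,
the negative sign of $(-\lambda^2)$ turns out to be the correct
choice.) Hence,

\[
\varphi(t) = C_1 e^{i\lambda t} + C_2 e^{-i\lambda t} = c_1 \cos\lambda t + c_2 \sin\lambda t
\]

Similarly, we find that $\psi(x) = C_1 \cosh(\sqrt{\lambda}\,x) + C_2 \sinh(\sqrt{\lambda}\,x)
+ C_3 \sin(\sqrt{\lambda}\,x) + C_4 \cos(\sqrt{\lambda}\,x)$.

From $u(0,t) = 0$ and $(\partial^2 u/\partial x^2)(0,t) = 0$, $(\partial^2 u/\partial x^2)(1,t) = 0$,
we get $C_1 = C_2 = 0$, and from $u(0,t) = 0$, we
get then $C_4 = 0$, and from $u(1,t) = 0$, $\psi(x) = C_3 \sin(\sqrt{\lambda}\,x)$,
$\sqrt{\lambda} = \sqrt{\lambda_k} = k\pi$, $k = 1, 2, \ldots$. Notice that this
function also satisfies $\psi^{(2)}(0) = \psi^{(2)}(1) = 0$.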

Hence, $u_k(x,t) = (c_1 \cos\lambda_k t + c_2 \sin\lambda_k t)\sin(k\pi x)$ satisfies
the homogeneous equation and the homogeneous boundary
conditions. By a proper choice of the Fourier coefficients
$c_1^{(k)}$, $c_2^{(k)}$, we find that

\[
u(x,t) = \sum_{k=1}^{\infty} \big(c_1^{(k)} \cos k^2\pi^2 t + c_2^{(k)} \sin k^2\pi^2 t\big)\sin(k\pi x)
\]


also fulfills the two initial conditions, that is, u is a solution 
of (80) and (81) with P = 0. (If we expand P in such a 
Fourier series, we may also find a particular solution sat- 
isfying the inhomogeneous differential equation, but with 
homogeneous boundary and initial data.) Our conclusion is 
that the solution of (80), (81) consists of harmonic oscilla- 
tions, both in time and space. The problem is of hyperbolic 


type. 


5.3.2 The method of lines 


Consider now the numerical solution of (80), (81) (for 
simplicity with P =0). We rewrite the equation as a 
coupled system of two equations, each of second order only 
in space: 


\[
\frac{\partial u}{\partial t} = \frac{\partial^2 v}{\partial x^2}, \qquad
\frac{\partial v}{\partial t} = -\frac{\partial^2 u}{\partial x^2}, \quad 0 < x < 1, \quad t > 0
\]

with initial values $u(x,0) = g_0(x)$, $v(x,0) = v_0(x)$, where
$(\partial u/\partial t)(x,0) = v_0''(x) = g_1(x)$, $0 < x < 1$.
Note that we can compute $v_0$ from $v_0'' = g_1(x)$ by
integration.

The boundary conditions are $u(0,t) = u(1,t) =
v(0,t) = v(1,t) = 0$, $t > 0$. The latter follows from $v_t =
-u_{xx} = 0$ for $x = 0$ and $x = 1$ for all $t > 0$. Hence, $v$ is constant
for $x = 0$ and $x = 1$ for all $t > 0$, and we may put
this constant to zero.


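We approximate now $\partial^2 v/\partial x^2$ and $\partial^2 u/\partial x^2$ by central
differences and we then get

\[
\frac{d}{dt}\begin{bmatrix} u_h(x,t) \\ v_h(x,t) \end{bmatrix}
= \begin{bmatrix} 0 & D_x^+ D_x^- \\ -D_x^+ D_x^- & 0 \end{bmatrix}
\begin{bmatrix} u_h \\ v_h \end{bmatrix},
\quad t > 0, \quad x = h, 2h, \ldots, 1 - h \tag{82}
\]

where $h = 1/(N + 1)$. Let $U_h$, $V_h$ be the vectors with
components $u_h(x_k,t)$, $v_h(x_k,t)$ respectively, at $x = x_k = kh$,
$k = 1, \ldots, N$. Then (82) may be rewritten in matrix form

\[
\frac{d}{dt}\begin{bmatrix} U_h \\ V_h \end{bmatrix}
= -\frac{1}{h^2}\begin{bmatrix} 0 & B \\ -B & 0 \end{bmatrix}
\begin{bmatrix} U_h \\ V_h \end{bmatrix} \tag{83}
\]

where

\[
B = \begin{bmatrix} 2 & -1 & & 0 \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ 0 & & -1 & 2 \end{bmatrix}
\]

The eigenvalues of $B$ are $\lambda_q = (2\sin(q\pi h/2))^2$, $q = 1, \ldots, N$,
and the ones of $A = \begin{bmatrix} 0 & B \\ -B & 0 \end{bmatrix}$ are accordingly readily found
to be

\[
\mu_q = \pm i\lambda_q = \pm i\Big(2\sin\frac{q\pi h}{2}\Big)^2, \quad i = (-1)^{1/2} \tag{84}
\]

(Lagrange (1759), Thèse 'Propagation of sounds').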

Hence, since $\mu_q$ is purely imaginary and the eigenvector
space is complete, the solutions $u_h$, $v_h$ of (82) are also
bounded and are represented by wave (trigonometric) functions
in time. Accordingly, the behavior of the method of
lines describes the solution of (80) correctly, at least in a
qualitative way.
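
The purely imaginary spectrum can be checked numerically. The sketch below assumes the matrix $B = \mathrm{tridiag}(-1, 2, -1)$ of order $N$ and the block matrix $A$ with blocks $0$, $B$, $-B$, $0$; the helper names `beam_semidiscrete_matrices` and `predicted_lambdas` are illustrative and not part of the text.

```python
import numpy as np

def beam_semidiscrete_matrices(N):
    """Return A = [[0, B], [-B, 0]] and B = tridiag(-1, 2, -1) of order N."""
    B = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    Z = np.zeros((N, N))
    A = np.block([[Z, B], [-B, Z]])
    return A, B

def predicted_lambdas(N):
    """(2 sin(q pi h / 2))^2, q = 1..N, with h = 1/(N+1)."""
    h = 1.0 / (N + 1)
    q = np.arange(1, N + 1)
    return (2.0 * np.sin(q * np.pi * h / 2.0)) ** 2

if __name__ == "__main__":
    A, B = beam_semidiscrete_matrices(8)
    mu = np.linalg.eigvals(A)
    print("largest |Re mu|:", np.max(np.abs(mu.real)))
    print("positive Im mu :", np.sort(mu.imag[mu.imag > 0]))
    print("predicted      :", predicted_lambdas(8))
```

Since $A$ is skew-symmetric, its computed eigenvalues have real parts at the level of rounding errors, and the imaginary parts come in pairs $\pm(2\sin(q\pi h/2))^2$.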


5.3.3 Time discretization 


To get a completely discretized equation, we approximate 
the time derivative in (82) by central differences, using the 
midpoint method. Hence, let 


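\[
\frac{du_h(x,t)}{dt} \approx \frac{1}{2k}\,[u_h(x,t+k) - u_h(x,t-k)]
\]

and similarly for $v_h$. We get then

\[
\begin{bmatrix} U_h(t+k) \\ V_h(t+k) \end{bmatrix}
= \begin{bmatrix} U_h(t-k) \\ V_h(t-k) \end{bmatrix}
- \frac{2k}{h^2}\begin{bmatrix} 0 & B \\ -B & 0 \end{bmatrix}
\begin{bmatrix} U_h(t) \\ V_h(t) \end{bmatrix},
\quad t = k, 2k, \ldots \tag{85}
\]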


This is an explicit step-by-step method, working on three 
levels at a time (a two-step method). 
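As initial values, we use

\[
\begin{bmatrix} U_h(0) \\ V_h(0) \end{bmatrix} = \begin{bmatrix} g_0(x) \\ v_0(x) \end{bmatrix},
\]
\[
\begin{bmatrix} U_h(k) \\ V_h(k) \end{bmatrix}
= \begin{bmatrix} g_0(x) \\ v_0(x) \end{bmatrix}
+ k\begin{bmatrix} g_1(x) \\ -g_0''(x) \end{bmatrix}
+ \frac{1}{2}k^2\begin{bmatrix} -g_0^{(4)}(x) \\ -g_1''(x) \end{bmatrix}
\]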


Equation (85) is a homogeneous difference equation of
second order. To analyze its stability, we make the Ansatz
$r^m W_q$, $m = 0, 1, \ldots$ ($m = t/k$) for the solution, where $W_q$
is an eigenvector of $A = \begin{bmatrix} 0 & B \\ -B & 0 \end{bmatrix}$, that is, $AW_q =
\mu_q W_q$. We then get $r^{m+1} W_q = (r^{m-1} - 2\tau_q r^m) W_q$, $m =
1, 2, \ldots$ where

\[
\tau_q = \frac{k}{h^2}\mu_q \tag{86}
\]

Hence, $r$ must solve the characteristic equation $r^2 = 1 -
2\tau_q r$.

Let $r_1$, $r_2$ be its solutions. Note that $|r_1 r_2| = 1$ and $r_{1,2} =
-\tau_q \pm (1 - |\tau_q|^2)^{1/2}$. Since $\tau_q$ is purely imaginary, we have
$\max_{i=1,2} |r_i| \le 1$ if and only if $|\tau_q| \le 1$, that is, by (86) and
(84), $k(2\sin(q\pi h/2))^2 \le h^2$. In fact, we then have $|r_i| = 1$.
This must be satisfied for all $q = 1, 2, \ldots, N$. Hence,
$k \le h^2/(2\sin(N\pi/2(N + 1)))^2$, or $k \le (1/4)h^2$. (Courant,
Friedrichs, Lewy, 1928).
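
The stability bound can be illustrated by computing the roots of the characteristic equation for all $q$. This is a sketch assuming $\tau_q = i(k/h^2)(2\sin(q\pi h/2))^2$ (the conjugate branch gives the same moduli); the function name `max_root_modulus` and the test values of $k$ are illustrative.

```python
import numpy as np

def max_root_modulus(k, N):
    """Largest |r| over all roots of r^2 + 2 tau_q r - 1 = 0, q = 1..N, h = 1/(N+1)."""
    h = 1.0 / (N + 1)
    q = np.arange(1, N + 1)
    tau = 1j * (k / h**2) * (2.0 * np.sin(q * np.pi * h / 2.0)) ** 2
    disc = np.sqrt(1.0 + tau**2)  # complex square root; 1 + tau^2 = 1 - |tau|^2
    roots = np.concatenate([-tau + disc, -tau - disc])
    return float(np.max(np.abs(roots)))

if __name__ == "__main__":
    N = 20
    h = 1.0 / (N + 1)
    print("k = 0.25 h^2:", max_root_modulus(0.25 * h**2, N))
    print("k = 0.30 h^2:", max_root_modulus(0.30 * h**2, N))
```

For $k = h^2/4$ all roots lie on the unit circle, while already for $k = 0.3\,h^2$ the highest modes give a root of modulus larger than one.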

The latter condition on k is a severe restriction on the 
time-step length. To avoid this, one can use a proper, uncon- 
ditionally stable form of an implicit method for (83). For 
instance, the $\theta$-method is applicable, $0 \le \theta \le 1/2$, and for
$\theta = 1/2$, we have a conservative scheme, that is, the corresponding
eigenvalues $\mu_q$ are purely imaginary. To solve
evolution equations with eigenvalues that can be purely
imaginary, one needs, in general, A-stable methods, that is,
methods that, when applied to the model problem $u_t = \lambda u$,
are stable for all $\lambda$ with $\mathrm{Re}\,\lambda \le 0$.


6 CONVECTION-DIFFUSION 
PROBLEMS 


6.1 The convection—diffusion equation 
Consider the convection-diffusion problem 


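\[
Lu \equiv -\nabla\cdot(\varepsilon\nabla u) + v\cdot\nabla u + cu = f, \quad x \in \Omega, \quad u = 0 \text{ on } \partial\Omega \tag{87}
\]

where $\Omega$ is a bounded domain in $R^n$ with, for example,
Dirichlet boundary condition $u = g$, $x \in \partial\Omega$. We assume
that $\varepsilon > 0$, $c \ge 0$ and that $v$ is a given velocity vector
function, defined in $\Omega$. Typically, $v$ carries (convects) some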
relative concentration (mixing ratio) of a dilute chemically 
active solute in a neutral solvent (fluid), which moves in 
a closed vessel with the imposed mass flow field v; f 
is the chemical source term and c is the rate coefficient 
for removal in chemical reactions (see Chapter 24, this 
Volume and Chapter 7, Volume 3). 

When $\|v\| = \{v_1^2 + v_2^2\}^{1/2}$ is large (the problem is then
said to have a large Reynolds number), the convective or
hyperbolic part $v_1 u_x + v_2 u_y$ of the equation dominates and
the solution follows essentially the characteristic lines. On
the other hand, when $\|v\|$ is not large, the diffusive part
$-\Delta u$ dominates the behavior of the solution.


6.2 Finite differences for the 
convection—diffusion equation 


Consider first a one-dimensional problem 
\[
-u_{xx} + v u_x = f, \quad 0 < x < 1, \quad u(0) = \alpha, \quad u(1) = \beta \tag{88}
\]

Using central differences for both $u_{xx}$ and $u_x$ results
in a difference operator $L_h$, which is not of positive type
(see Section 3) if the so-called local Peclet number $Pe =
vh/2 > 1$. Then, the discrete maximum principle does not
hold for $L_h$.

When $v$ is very large, satisfying the condition $Pe \le 1$
would require a very fine mesh and, hence, a very large 
cost in solving the corresponding algebraic system. To get 
a positive-type scheme, one can use the so-called upwind 
(backward) differences for u,, that is, 


\[
u_x(x_i) \approx \frac{u(x_i) - u(x_{i-1})}{h} \quad \text{if } v > 0
\]

and

\[
u_x(x_i) \approx \frac{u(x_{i+1}) - u(x_i)}{h} \quad \text{if } v < 0 \tag{89}
\]

However, the leading local truncation error term now
becomes $-v(h/2)u_{xx}$, which can be seen to cause extra
(artificial) diffusion, and has an approximation error that is
of one order less than for the central difference scheme.
When $v$ is constant, $f = 0$ and $\alpha = 0$, $\beta = 1$, the solution
of (88) equals

\[
u(x) = \frac{e^{-v(1-x)} - e^{-v}}{1 - e^{-v}}
\]
ere-D se 
AoE 


Hence, when v is large, then u(x) * 0, except in a thin 
layer of width O(1/v) near the boundary point, where the 
solution increases rapidly. The central difference scheme, 
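\[
\frac{1}{h^2}(-\tilde{u}_{i-1} + 2\tilde{u}_i - \tilde{u}_{i+1}) + \frac{v}{2h}(\tilde{u}_{i+1} - \tilde{u}_{i-1}) = 0, \quad i = 1, 2, \ldots, n
\]

or

\[
-\Big(1 + \frac{vh}{2}\Big)\tilde{u}_{i-1} + 2\tilde{u}_i - \Big(1 - \frac{vh}{2}\Big)\tilde{u}_{i+1} = 0
\]

has a solution $\tilde{u}_i$, which, if $vh/2 \ne 1$, satisfies

\[
\tilde{u}_i = C_1\lambda_1^i + C_2\lambda_2^i
\]

where $\lambda_{1,2}$ are the roots of the equation $-[1 + (vh/2)] +
2\lambda - [1 - (vh/2)]\lambda^2 = 0$. Hence, $\lambda_1 = 1$ and $\lambda_2 = [1 +
(vh/2)]/[1 - (vh/2)]$. Using the given boundary values,
one finds

\[
\tilde{u}_i = \frac{\lambda_2^i - 1}{\lambda_2^{n+1} - 1}
= (-1)^{n-i+1}(1 - \delta)^{n-i+1}\,\frac{1 - (-1)^i(1 - \delta)^i}{1 - (-1)^{n+1}(1 - \delta)^{n+1}}
\]

with $\delta = 2/[1 + (vh/2)]$. This exhibits an oscillatory behavior,
$\tilde{u}_{n+1} = 1$, $\tilde{u}_n \approx -1 + \delta$, $\tilde{u}_{n-1} \approx (1 - \delta)^2$, and so
on. On the other hand, considering only each second point,
$i = n + 1, n - 1, n - 3, \ldots$, the solution takes the form

\[
\tilde{u}_i = (1 - \delta)^{n-i+1}\,\frac{1 - (-1)^i(1 - \delta)^i}{1 - (-1)^{n+1}(1 - \delta)^{n+1}}
\]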

that is, exhibits no oscillations. Similarly, for the upwind 
scheme, the solution turns out to be 


\[
\tilde{u}_i = \frac{(1 + vh)^i - 1}{(1 + vh)^{n+1} - 1}, \quad 0 \le i \le n + 1
\]


which shows that the solution is smeared out, that is, the 
width of the numerical layer is much bigger than that for 
u, typically O(1/vh) instead of O(1/v) when v > 1. 
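
The contrast between the two schemes can be reproduced with a few lines of code. The sketch below assumes the model problem (88) with $f = 0$, $\alpha = 0$, $\beta = 1$ and constant $v > 0$ on a uniform mesh with $n$ interior points; the function name `solve_cd` and the values $v = 100$, $n = 9$ are illustrative choices, not taken from the text.

```python
import numpy as np

def solve_cd(v, n, scheme):
    """Solve the scaled difference equations for -u'' + v u' = 0, u(0)=0, u(1)=1."""
    h = 1.0 / (n + 1)
    if scheme == "central":
        lower, diag, upper = -(1.0 + v * h / 2.0), 2.0, -(1.0 - v * h / 2.0)
    else:  # upwind (backward) difference for u', valid for v > 0
        lower, diag, upper = -(1.0 + v * h), 2.0 + v * h, -1.0
    A = diag * np.eye(n) + lower * np.eye(n, k=-1) + upper * np.eye(n, k=1)
    b = np.zeros(n)
    b[-1] = -upper  # contribution of the boundary value u(1) = 1
    return np.linalg.solve(A, b)

if __name__ == "__main__":
    v, n = 100.0, 9  # vh/2 = 5 > 1
    print("central:", np.round(solve_cd(v, n, "central"), 4))
    print("upwind :", np.round(solve_cd(v, n, "upwind"), 4))
```

With $vh/2 > 1$ the central-difference values alternate in sign near $x = 1$, whereas the upwind values increase monotonically but rise over more mesh cells than the exact layer.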

Frequently, it can be efficient to use central differences 
everywhere, except in the layer region, where the solution 
can be resolved using an adapted mesh with h < 2/v. 

Alternatively, as shown in Section 6.6, one can use 
a generalized difference scheme based on local Green’s 
functions. 

As for the one-dimensional problem, for higher-dimensional
problems, we construct a mesh $\Omega_h$ in $\Omega$ and
consider two approximations for $L$ at every point $(x,y) \in
\Omega_h$:

(i) Central differences (of Shortley-Weller type next to a
curved boundary) for $-\Delta u$, $u_x$ and $u_y$. The corresponding
operator is denoted by $L_h^{(0)}$.

(ii) Central differences for $-\Delta u$ but "upwind" differences
for $u_x$ and $u_y$, that is,

\[
\frac{\partial u}{\partial x}(x,y) \approx D_x^- u = \frac{u(x,y) - u(x - h, y)}{h}, \quad \text{if } v_1(x,y) > 0
\]
\[
\frac{\partial u}{\partial x}(x,y) \approx D_x^+ u = \frac{u(x + h, y) - u(x,y)}{h}, \quad \text{if } v_1(x,y) < 0
\]

and similarly for $u_y$. The corresponding difference
operator is denoted by $L_h^{(1)}$.


The first discretization method gives local truncation
errors $O(h^2)$ (except possibly near a curved boundary),
while the second difference method has local errors $O(h)$
at best, and one can expect that the first method gives
more accurate solutions for sufficiently small values of
$h$. However, when $\|v\|$ is large, the first scheme is not
of positive type and the difference matrix may not be
monotone. This causes numerical solutions with oscillatory
behavior when the solution of (87) has steep gradients in
parts of the domain $\Omega$. On the other hand, the second
scheme always gives a difference approximation of positive
type and such an operator cannot cause such nonphysical
wiggles. However, the first-order approximation of $u_x$ and
$u_y$ causes truncation error terms

\[
-\frac{h}{2}|v_1|\frac{\partial^2 u}{\partial x^2}, \qquad -\frac{h}{2}|v_2|\frac{\partial^2 u}{\partial y^2}
\]


which, as can be seen, corresponds to added diffusion terms. 
This means that the numerical solution is smeared out, 
where there occurs sharp gradients in the solution of (87). 
Hence, both schemes have advantages and disadvantages. 
As it turns out, the first scheme works well when we use 
(locally) a sufficiently fine difference mesh where steep 
gradients occur but not necessarily in other parts of the 
domain. 
A non-self-adjoint problem

\[
Lu \equiv -\sum_{i=1}^{n} a_i u_{x_i x_i} + \sum_{i=1}^{n} v_i u_{x_i} + cu = f \tag{90}
\]

where $a_i$ and $v_i$ are constants and $a_i > 0$, can be
transformed into a self-adjoint form. Namely, with
$u = w\exp[(1/2)\sum_i (v_i/a_i)x_i]$, we find the transformed
operator equation

\[
\tilde{L}w \equiv -\sum_{i=1}^{n} a_i w_{x_i x_i} + \Big(c + \frac{1}{4}\sum_{i=1}^{n}\frac{v_i^2}{a_i}\Big)w
= f\exp\Big[-\frac{1}{2}\sum_{i=1}^{n}\frac{v_i}{a_i}x_i\Big] \tag{91}
\]

Similarly, if $c$ is also constant, for a uniform mesh, the
central difference matrix for (90) takes the form

\[
A_h = \begin{bmatrix} A & b_1 I & & 0 \\ c_1 I & A & b_2 I & \\ & \ddots & \ddots & \ddots \\ 0 & & c_{n-1} I & A \end{bmatrix}
\]

If $v_i h/(2a_i) < 1$, then the scheme is of positive type and,
as shown in Axelsson (1994), the matrix $A_h$ can be transformed
by a diagonal transformation with the block diagonal
matrix $D = \mathrm{diag}(d_i \tilde{D})$, $d_1 = 1$, $d_{i+1} = (c_i/b_i)^{1/2} d_i$, $i =
1, \ldots, n - 1$, so that $D^{-1}A_h D$ becomes symmetric when
$\tilde{D}^{-1}A\tilde{D}$ is made symmetric by a similarity transformation. Here,
the matrix $\tilde{D}$ has a similar form as $D$ but with $A$, $I$ replaced
by scalars. However, when $a_i \ll v_i$, the transformation (91)
is not recommended as the problem becomes singularly perturbed
and the solution has strong boundary layers.
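
The diagonal scaling can be illustrated in the scalar (one-dimensional) case. The sketch below assumes a tridiagonal matrix with constant subdiagonal $c$, diagonal $a$ and superdiagonal $b$, with $bc > 0$, and uses $d_1 = 1$, $d_{i+1} = (c/b)^{1/2} d_i$; the function name `symmetrize_tridiagonal` and the sample coefficients are illustrative assumptions.

```python
import numpy as np

def symmetrize_tridiagonal(a, b, c, n):
    """Return A = tridiag(c, a, b) and S = D^{-1} A D with d_{i+1} = sqrt(c/b) d_i."""
    A = a * np.eye(n) + b * np.eye(n, k=1) + c * np.eye(n, k=-1)
    d = np.sqrt(c / b) ** np.arange(n)  # d_1 = 1, d_2 = sqrt(c/b), ...
    D = np.diag(d)
    S = np.linalg.inv(D) @ A @ D
    return A, S

if __name__ == "__main__":
    # Central differences for -u'' + v u' with v = 4, h = 0.1 (vh/2 = 0.2 < 1),
    # scaled by h^2: subdiagonal -(1 + vh/2), diagonal 2, superdiagonal -(1 - vh/2).
    A, S = symmetrize_tridiagonal(2.0, -0.8, -1.2, 6)
    print(np.round(S, 4))
```

The off-diagonal entries of the scaled matrix both equal $\mathrm{sign}(b)(bc)^{1/2}$, so the scaled matrix is symmetric and has the same eigenvalues as the original one.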


6.3 Discretization errors 


The technique in Section 3.4 can also be applied for certain 
discretizations of convection—diffusion problems. 

Using a proper weight function, it is shown below that 
the discretization error near Dirichlet boundaries can have a 
higher order of accuracy than in the interior of the domain. 
This result is particularly useful in a boundary layer, where 
the solution is less smooth as its derivatives increase as
some power of the Reynolds number $(|v|/\varepsilon)$. The result
shows that we can balance this growth better, with no need
for choosing a much smaller mesh size than is required for
stability reasons.

The latter can be illustrated by a simple example. 


Example 2. Consider problem (87) with $v = 1$. We assume
that $h < 2\varepsilon$ and use two slightly different discretizations:

(i) central difference approximations everywhere for both
terms in (88);

(ii) central difference approximations for the first term
everywhere and for the second term in the interval
$(0, 1 - mh)$, but upwind differences for the latter term
in the interval $[1 - mh, 1)$. Here, $m$ is a fixed natural
number, $1 \le m < n$. In particular, this is done for the
last interior point.


We show that in the first case the local discretization 
error has a higher order of approximation near Dirichlet 
boundaries, and, in the second case, the lower (first) order 
of approximation near x = 1 does not destroy the global 
(second) order of approximation. 

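In both cases, the difference operator $L_h$ is of positive
type and it is readily seen that there exists a barrier function
to $L_h$, whence $L_h$ is monotone.

The corresponding matrix is tridiagonal,

\[
A_h = \frac{\varepsilon}{h^2}\,\mathrm{tridiag}(-r_i, s_i, -t_i)
\]

where $r_i = 1 + h/2\varepsilon$, $s_i = 2$, $t_i = 1 - h/2\varepsilon$, $i = 1, 2, \ldots, n$
in case (i) and the same coefficients hold for $i = 1, 2, \ldots,
n - m$ but $r_i = 1 + h/\varepsilon$, $s_i = 2 + h/\varepsilon$, $t_i = 1$ for $i = n -
m + 1, \ldots, n$ in case (ii). Taking $v = x$ as a barrier function,
we obtain $(A_h v)_i = 1$, except for the last point, where

\[
(A_h v)_n = \begin{cases}
1 + \dfrac{1}{2h}\Big(\dfrac{2\varepsilon}{h} - 1\Big) & \text{case (i)} \\[2ex]
1 + \dfrac{\varepsilon}{h^2} & \text{case (ii)}
\end{cases}
\]

By the Barrier lemma it holds, therefore, that

\[
\|A_h^{-1}\|_\infty \le \max v = 1 \tag{92}
\]

For the pointwise discretization error $e_h = u - u_h$, there
holds $A_h e_h = \tau_h$, where $\tau_h(x_i) = (L_h u - f_h)(x_i)$ is the
(pointwise) truncation error and

\[
\|e_h\|_\infty \le \|A_h^{-1}\|_\infty \|\tau_h\|_\infty \le \|\tau_h\|_\infty
\]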


More accurate bounds near Dirichlet boundaries are derived
next. For this purpose, we partition the matrix $A_h$ in two-by-two
block form, $A_h = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$. Here, $A_{22}$ has
order $m \times m$ and corresponds to the $m$ last meshpoints.
Let the discretization and truncation errors be partitioned
consistently with the matrix, $e_h = [e_h^{(1)}, e_h^{(2)}]^T$ and $\tau_h =
[\tau_h^{(1)}, \tau_h^{(2)}]^T$. From $A_h e_h = \tau_h$, taking the inverse of the
partitioned matrix, we obtain


Shay ae = zhi 
i (Aq + Aq AnS Aa ARDT 
e - — 
lo] z +Aq ApS tp? (93) 
SA, Ape + S12) 


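where $S = A_{22} - A_{21}A_{11}^{-1}A_{12}$. The matrix $A_{22}$ has the form

\[
A_{22} = \frac{\varepsilon}{h^2}\,\mathrm{tridiag}\Big(-\Big(1 + \frac{h}{\varepsilon}\Big),\; 2 + \frac{h}{\varepsilon},\; -1\Big)
\]

while $S = A_{22}$ except its first diagonal entry, which equals
$1 + \delta$ for some positive $\delta$, $\delta < 1 + h/\varepsilon$.

Assuming that the solution is sufficiently regular, for the
truncation errors, it holds

\[
\|\tau_h^{(1)}\|_\infty \le \frac{1}{12}h^2\big[\varepsilon\|u^{(4)}\|_\infty + 2\|u^{(3)}\|_\infty\big]
\]
\[
\|\tau_h^{(2)}\|_\infty \le \frac{h^2}{12}\varepsilon\|u^{(4)}\|_\infty + \frac{h}{2}\|u^{(2)}\|_\infty \tag{94}
\]

To bound the norms $\|e_h^{(i)}\|_\infty$, $i = 1, 2$, we must estimate
$\|A_{11}^{-1}\|_\infty$ and $\|S^{-1}\|_\infty$.

It follows from (92) that $\|A_{11}^{-1}\|_\infty \le 1$. To estimate
$\|S^{-1}\|_\infty$ we let $\delta = 0$, which will give an upper bound
of the norm.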
The corresponding matrix § then takes the form 


1 -1 0 
~ ¢e | -l-o 2+0 -1 
eee Sy 

0 -l-o 2+0 


where o = h/e. The matrix 5 can be factorized as 


where 0 = 1 +o. Hence, 


foi w ee 1 0 
re Wad (oe @ 4 
=m £ va 
0 1} Len-} Ə 1 
and. 
1 2 ee 1 8-1 
Kan eos 1 P wee I e—1 
~ ¢ @—1 : 
0 1] -—1 


onti —9—m(09-— 1) 
ko i orti — 6? — (m — D- 1) 


(on -@-1 
Hence, it can be seen that


x hè č 1 
1S ho < IS o = Gps! e-me- D] 
2 
< E Gler —1-—(m+1)0] 
2 
x sim ye, o> 0 (95) 


Since $m$ is fixed, it follows that the quantity $(\varepsilon/h^2)\|S^{-1}\|_\infty$
is bounded uniformly in $h$ and $\varepsilon$.

It follows from (93) that Se = a + Ay Aint. 
Here, Ay, Apt? = (1 +o) [Aq P), m: --, OFF and by 
(94) AQ AT tI < + ONP leo = OW). 

Therefore, 


le? too < ST ool? loo + A + oie o] 


In case (i) we have || ae loo = O(h”), so by (95) jf e lee = 
(1/e)O(h4). In case Gi), It IIo = O(h) and le? llao < 
(1/e)0 (h?). For ef, it follows from (93) that 


le too < OMEP Woo + ST Hoot Hoo) = Oh?) 


It is hence seen that for a fixed $\varepsilon$, we have a higher order of
convergence at meshpoints in the vicinity of the boundary
where Dirichlet boundary conditions are imposed.

In the above estimates, we have not included the $\varepsilon$-dependence
of the solution $u$ and its derivatives. Since $u$
has a layer at $x = 1$, it holds that $\|u^{(k)}\|_\infty = O(\varepsilon^{-k})$ there.
Therefore,


hte+ case (i) 


Gp TEE. 
Neh llo S pe + h3e73 xx (=) fı + ž] case (ii) 


To balance this growth, we must use a finer mesh in the
layer. Letting $h_0$ be the mesh size in the domain where the
solution is smooth, we then let the mesh size $h$ in the layer
domain satisfy

\[
\Big(\frac{h}{\varepsilon}\Big)^4 = h_0^2, \quad \text{i.e., } h = h_0^{1/2}\varepsilon, \quad \text{in case (i)}
\]
\[
\Big(\frac{h}{\varepsilon}\Big)^3 = h_0^2, \quad \text{i.e., } h = h_0^{2/3}\varepsilon, \quad \text{in case (ii)} \tag{96}
\]

Since the layer term decays exponentially like $\exp[-(1 -
x)/\varepsilon]$ away from the layer point, a geometrically varying
mesh size can be used near this point. The estimates in
(96) show that we need a smaller mesh size in case (ii)
than in case (i). However, the upwind method has a more
robust behavior for more general problems than the central
difference method, so it may still be preferred.

As we have seen, the upwind method is equivalent to an
artificial diffusion method with added diffusion of $O(h)$.
However, since $h \le \varepsilon$, or even $h \ll \varepsilon$ in the layer domain,
the resulting smearing of the front is negligible.

Finally, we remark that the central difference method can
be used in the domain where the solution is smooth even
when $h > 2\varepsilon$. The nonphysical oscillations that can arise
do so only if the solution is less smooth, such as in the
layer. In practice, it suffices, therefore, to let $h < 2\varepsilon$ there,
which is of practical importance when $\varepsilon$ is very small.


Example 3. Consider now a two (or higher)-dimensional 
convection—diffusion problem 


\[
Lu \equiv -\varepsilon\Delta u + v\cdot\nabla u + cu = f, \quad x \in \Omega \subset R^2 \tag{97}
\]

where, for simplicity, $u = 0$ on $\partial\Omega$. In general, $\partial\Omega$ is
curvilinear. Let $\Omega$ be embedded into a rectangular domain
$\Omega^0$, where we introduce an $n \times m$ uniform rectilinear mesh
$\Omega_h^0$, assuming that $n = O(h^{-1})$, $m = O(h^{-1})$. By $\Omega_h$, we
denote the mesh subdomain corresponding to the original
domain $\Omega$ and by $\Omega_h'$ the subset of $\Omega_h$ for which there
exists a complete difference stencil. For nodes $P \in \Omega_h'$, we
use a standard five-point second-order central difference
approximation. For nodes in $\Omega_h \setminus \Omega_h'$, we use either a linear
interpolation $u_h(P) = [h_E u(W) + h_W u(E)]/(h_E + h_W)$ or
a Shortley-Weller difference approximation (21), and a
corresponding approximation for the first-order derivative
term, where the notations are explained in Figure 2.

Alternatively, as in Example 2, one can use upwind differences
for the first-order terms. Similarly, as in Example
2, we can partition the matrix according to the nodeset
$\Omega_h^{(2)}$ consisting of points near the Dirichlet boundary points,
including the outflow boundary (where $v_1 n_1 + v_2 n_2 > 0$
and $n_1$, $n_2$ are the components of the outward-pointing
normal vector). After somewhat tedious computations in
a similar way as in Example 2, one can show that the
local discretization errors at meshpoints in $\Omega_h^{(2)}$ have one
unit higher order of approximation for the case of a Shortley-Weller
approximation and the same order as the global
error for the linear approximation. The global error is
$O(h^2)$.

The same higher-order approximation holds even if we
use an upwind approximation at the outflow part of $\partial\Omega$.
Finally, similar considerations hold with respect to the $\varepsilon$-dependence
as in Example 2.


6.4 Defect correction and adaptive refinement 


The previous error estimate results are inapplicable unless 
L, is a monotone operator. Furthermore, there can occur 


cases where the truncation error does not reveal the real 
order of the discretization error, as we have seen. 

In such cases, a defect-correction method may be useful.
This method involves two discrete operators: $L_h^{(1)}$, which
is monotone but normally of lower (first) order, as correction
operator and a defect operator $L_h^{(0)}$, which may
be nonmonotone but is of higher order. The method can
be used repeatedly ($p$ times to achieve a $p$th order of
accuracy).

The corrections to the solution, produced by the defect- 
correction method, can be used as estimates of the local 
error to indicate where a mesh refinement should be done. 
However, we show that the method also allows more accurate
error estimates, which are useful in avoiding overrefinement
of the mesh. The described mesh-refinement method
is special as no slave nodes appear.

Defect-correction methods have been presented in 
Axelsson and Layton (1990) and Axelsson and Nikolova 
(1998), among others. 

We illustrate the method here on the problem (97),
where $\Omega = [0,1]^2$, $u = g_1$, $x \in \Gamma_1$, $n\cdot\nabla u = g_2$, $x \in \Gamma\setminus\Gamma_1$.
Here, $\partial\Omega = \Gamma$ consists of axi-parallel pieces, allowing
the use of orthogonal meshes $\Omega_h$. Triangular domains
with two axi-parallel edges can be handled in the same
way.

We assume that $0 < \varepsilon \le 1$ (normally $\varepsilon \ll 1$), $c \ge 0$ and
the functions $f$, $g_1$, $g_2$ are sufficiently regular. Further, the
velocity vector $v$ is assumed to be sufficiently smooth, $|v|$ is
bounded and the outflow boundary $\Gamma_- = \{x \in \Gamma;\ v\cdot n < 0\}$
is a subset of $\Gamma_1$.

For this problem, we choose the correction and the defect
operators $L_h^{(1)}$ and $L_h^{(0)}$ in the following way:

- $L_h^{(1)}$ is a difference operator of combined central difference
  and upwind type, in which $u_x$ and $u_y$ are
  approximated by upwind differences.

- $L_h^{(0)}$ is the second-order central difference operator in
  which $\Delta u$, $u_x$ and $u_y$ are approximated by central
  differences.

We assume that the local Peclet number $Pe =
(2\varepsilon/h|v|) < 1$; otherwise, the use of upwind discretizations
makes no sense.

The second-order defect-correction method includes the
following steps:

(a) Compute an initial approximation $u_h^{(1)}$ by solving
    $L_h^{(1)} u_h^{(1)} = f_h$.
(b) Compute a correction $\delta_h$ by solving $L_h^{(1)}\delta_h = f_h -
    L_h^{(0)} u_h^{(1)}$ and set $u_h = u_h^{(1)} + \delta_h$.

Here, $f_h$ denotes the restriction of $f$ and the boundary
terms to $\Omega_h$.
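
Steps (a) and (b) can be sketched in one space dimension. The example below is a simplified illustration, not the two-dimensional method of the text: it assumes the model problem $-\varepsilon u'' + v u' = f$ on $(0,1)$ with homogeneous Dirichlet data, constant $v > 0$, a uniform mesh, and a manufactured smooth solution $u = \sin(\pi x)$. The function names and parameter values are hypothetical.

```python
import numpy as np

def cd_operators(eps, v, n):
    """Return (L1, L0): upwind (correction) and central (defect) operators, v > 0."""
    h = 1.0 / (n + 1)
    I, up, lo = np.eye(n), np.eye(n, k=1), np.eye(n, k=-1)
    diffusion = eps * (2.0 * I - up - lo) / h**2
    L0 = diffusion + v * (up - lo) / (2.0 * h)  # central differences for u'
    L1 = diffusion + v * (I - lo) / h           # backward (upwind) differences for u'
    return L1, L0

def defect_correction(eps, v, n, f):
    """Step (a): solve L1 u1 = f_h; step (b): solve L1 delta = f_h - L0 u1."""
    L1, L0 = cd_operators(eps, v, n)
    x = np.arange(1, n + 1) / (n + 1)
    fh = f(x)
    u1 = np.linalg.solve(L1, fh)
    delta = np.linalg.solve(L1, fh - L0 @ u1)
    return x, u1, u1 + delta

def max_errors(n, eps=1.0, v=1.0):
    """Max-norm errors of u1 and of the corrected solution for u = sin(pi x)."""
    f = lambda x: eps * np.pi**2 * np.sin(np.pi * x) + v * np.pi * np.cos(np.pi * x)
    x, u1, u2 = defect_correction(eps, v, n, f)
    exact = np.sin(np.pi * x)
    return float(np.max(np.abs(u1 - exact))), float(np.max(np.abs(u2 - exact)))

if __name__ == "__main__":
    for n in (31, 63):
        print(n, max_errors(n))
```

For this smooth test case the upwind error is roughly halved when $h$ is halved, while the corrected error is reduced by about a factor of four, in line with the gain of one order per correction step.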




In theory (apart from practical implementation details), 
by performing a sufficient number of defect-correction 
steps, the defect-correction method can be extended to a pth 
order difference method. Each application of a correction 
step increases the order of the error by 1. However, for 
the considered problem, the error estimates involve powers
of $\varepsilon^{-1}$ and when $\varepsilon \to 0$, the above order does not show
up unless the local stepsize h is sufficiently small in the 
subregions where the solution gradient is large. For this 
purpose, we couple the method with an adaptive refinement 
method where the mesh size is decreased locally, until 
some error estimator indicates that the discretization error 
is sufficiently small. 

To simplify the presentation, we assume that the difference
mesh consists of square elements, that is, the mesh size
used at a meshpoint $(x_i, y_j)$ satisfies $(h_{ij})_x = (h_{ij})_y = h_{ij}$.
The mesh size will be refined along an interface and on one 
side thereof. To be specific, assume that the mesh refine- 
ment takes place in a subdomain, as shown in Figure 9. We 
want to avoid the introduction of the so-called slave nodes, 
because in order to obtain the value of the solution in such 
a point, an interpolation method of higher order must be 
used (see Figure 9(a)). Following Axelsson and Nikolova
(1998), we introduce skew-oriented stencils at the interface
points, which are not at the center of a cross-oriented stencil
(see Figure 9(b)).

Figure 9. Interface points using (a) slave nodes (b) skewed
stencils.

The difference approximation in those points will then
be based on the equation (97), transformed to a local
skew-oriented coordinate system $\xi$, $\eta$, where $\xi = (\sqrt{2}/2)
(x + y)$ and $\eta = (\sqrt{2}/2)(x - y)$. Then, $u_x = (\sqrt{2}/2)
(u_\xi + u_\eta)$, $u_y = (\sqrt{2}/2)(u_\xi - u_\eta)$, and $u_{xx} + u_{yy} = u_{\xi\xi}
+ u_{\eta\eta}$. Thus, equation (97) takes the form

\[
-\varepsilon(u_{\xi\xi} + u_{\eta\eta}) + \frac{\sqrt{2}}{2}(v_1 + v_2)u_\xi + \frac{\sqrt{2}}{2}(v_1 - v_2)u_\eta + cu = f
\]


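The truncation error of the second-order defect-correction
method is

\[
\begin{aligned}
\tau_h &= L_h^{(1)}(u - u_h^{(1)}) + L_h^{(0)} u_h^{(1)} - f_h \\
&= (L_h^{(1)} - L_h^{(0)})(u - u_h^{(1)}) + L_h^{(0)} u - f_h \\
&= (L_h^{(1)} - L_h^{(0)})(u - u_h^{(1)}) + (L_h^{(0)} - L)u
\end{aligned}
\]
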
that is, it is a sum of a term arising from the defect- 
correction method and of the truncation error of the 
central difference approximation. For the latter term, we 
have 


(LY - Dui; l l 
(0) if (x;, yj) is a regular 
Th oy) point 


= fo i 
a Cay) > if (x, ¥)) is a skew 
( iy au ) stencil interface point 


aay + Axay? 
ax2dy x ay’ (98) 


where 


h? vu 3u 
x0 Gr y= L (vase hy "1 Bx dy? 7 


eh?, (atu atu 4 
- (Ste, + o(h},), ifu e C) 
Consider now the term $(L_h^{(1)} - L_h^{(0)})(u - u_h^{(1)})$. Let $L_{h,x}^{(i)}$
and $L_{h,y}^{(i)}$, $i = 0, 1$ denote the parts of the difference operator
$L_h^{(i)}$ corresponding to the x- and the y-directions,
respectively. The operator $(L_h^{(1)} - L_h^{(0)}) = (L_{h,x}^{(1)} - L_{h,x}^{(0)}) +
(L_{h,y}^{(1)} - L_{h,y}^{(0)})$ applied to an arbitrary smooth (at least
twice differentiable) function $g$ gives us the following
expressions.

(1) Let $(x_i, y_j)$ be a regular point. If $v_1 = v_1(x_i, y_j) > 0$,
then

\[
\begin{aligned}
(L_{h,x}^{(1)} - L_{h,x}^{(0)})g_{ij}
&= v_1\Big(\frac{g_{ij} - g_{i-1,j}}{h_{ij}} - \frac{g_{i+1,j} - g_{i-1,j}}{2h_{ij}}\Big) \\
&= -\frac{v_1 h_{ij}}{2}\,\frac{g_{i-1,j} - 2g_{ij} + g_{i+1,j}}{h_{ij}^2} \\
&= -\frac{v_1 h_{ij}}{2}\,\frac{\partial^2 g}{\partial x^2}\Big|_{ij} + o(h_{ij})
\end{aligned}
\]

where the last equality is obtained by Taylor expansion.
The $o(h_{ij})$ term stands for a quantity that decays
faster than $h_{ij}$ as $h_{ij} \to 0$. If sufficient regularity
holds, its exact expression is


1/ f” 
a(S ir +98% sas 
Xi-t 


Xi+l 
k T Kii TDP; +5) as) 


Similar expressions hold for the other Taylor expan- 
sions used. 

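If $v_1 < 0$, we obtain the same relation but with opposite
signs. Summing up in both directions, we obtain
that for all $v_1, v_2$

\[ (L_h^{(1)} - L_h^{(0)})g_{ij} = -\frac{h_{ij}}{2}\left(|v_1|\frac{\partial^2}{\partial x^2} + |v_2|\frac{\partial^2}{\partial y^2}\right)g_{ij} + o(h_{ij}) \quad (99) \]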
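(2) Let $(x_i, y_j)$ be an interface point, that is, a skew-oriented
stencil is used at this point. If $(v_1 + v_2)_{ij} > 0$,
then the truncation error in the $\xi$-direction is

\[ (L_{h,\xi}^{(1)} - L_{h,\xi}^{(0)})g_{ij} = \frac{\sqrt{2}}{2}(v_1 + v_2)\left[\frac{g_{ij} - g_{i-1,j-1}}{\sqrt{2}h_{ij}} - \frac{g_{i+1,j+1} - g_{i-1,j-1}}{2\sqrt{2}h_{ij}}\right] \]

\[ = -\frac{h_{ij}}{2}(v_1 + v_2)\,\frac{g_{i-1,j-1} - 2g_{ij} + g_{i+1,j+1}}{(\sqrt{2}h_{ij})^2} \]

\[ = -\frac{h_{ij}}{4}(v_1 + v_2)\left(\frac{\partial}{\partial x} + \frac{\partial}{\partial y}\right)^2 g_{ij} + o(h_{ij}) \]

and a similar expression holds when $(v_1 + v_2)_{ij} < 0$ and
for the velocity direction $v_1 - v_2$. Summing up both
directions, for all $v_1 + v_2$ and $v_1 - v_2$, we have

\[ (L_h^{(1)} - L_h^{(0)})g_{ij} = -\frac{h_{ij}}{4}\left[|v_1 + v_2|\left(\frac{\partial}{\partial x} + \frac{\partial}{\partial y}\right)^2 + |v_1 - v_2|\left(\frac{\partial}{\partial x} - \frac{\partial}{\partial y}\right)^2\right]g_{ij} + o(h_{ij}) \quad (100) \]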

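If $v_1 + v_2 > 0$ and $v_1 - v_2 > 0$, the formula simplifies
to

\[ (L_h^{(1)} - L_h^{(0)})g_{ij} = -\frac{h_{ij}}{2}\left(v_1\frac{\partial^2 g}{\partial x^2} + 2v_2\frac{\partial^2 g}{\partial x\,\partial y} + v_1\frac{\partial^2 g}{\partial y^2}\right)_{ij} + o(h_{ij}) \]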


For all $v_1, v_2$, the operator $(L_h^{(1)} - L_h^{(0)})g_{ij}$ is a product
of $h_{ij}$ and a linear combination of the absolute values of
the velocity components and second-order derivatives
of $g$. Thus, the operator is of first-order accuracy if $g$
is smooth.
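
The first-order size of the operator difference can be checked numerically in one dimension. The following is a minimal sketch, not part of the original text: it assumes a uniform mesh, a constant positive velocity `v`, and the test function $g = \sin 2x$ (all illustrative choices), and compares the upwind-minus-central difference with the leading term $-(vh/2)g''$.

```python
import numpy as np

def upwind_minus_central(g, v, h):
    """Apply (L1 - L0) to the nodal values g for the convection term v*g', v > 0.

    L1 uses the backward (upwind) difference, L0 the central difference.
    Returns values at the interior nodes only.
    """
    upwind = v * (g[1:-1] - g[:-2]) / h
    central = v * (g[2:] - g[:-2]) / (2.0 * h)
    return upwind - central

v = 1.5  # illustrative constant velocity
errors = []
for n in (20, 40, 80):
    x = np.linspace(0.0, 1.0, n + 1)
    h = x[1] - x[0]
    g = np.sin(2.0 * x)
    g_xx = -4.0 * np.sin(2.0 * x[1:-1])      # exact second derivative
    difference = upwind_minus_central(g, v, h)
    leading_term = -0.5 * v * h * g_xx        # -(v h / 2) g''
    errors.append(np.max(np.abs(difference - leading_term)))
```

Halving `h` should reduce the entries of `errors` by roughly a factor of eight, since the remainder after subtracting the leading term is of order $h^3$ for this smooth $g$.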


Lemma 7. Let u e C® (Q) and let u® be the discrete 
solution of $L_h^{(1)}u_h^{(1)} = f_h$. Then, there exist functions $\varphi$ and
$\psi$, independent of $h_{ij}$, such that

\[ (u - u_h^{(1)})(x_i, y_j) = h_{ij}\varphi(x_i, y_j) + h_{ij}^2\psi(x_i, y_j) + o(h_{ij}^2) \quad (101) \]


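Proof. We introduce the following notations for operators
$D_i$, $i = 2, 3, 4$:

\[ D_2u = v_1\frac{\partial^2u}{\partial x^2} + v_2\frac{\partial^2u}{\partial y^2}, \qquad D_3u = v_1\frac{\partial^3u}{\partial x^3} + v_2\frac{\partial^3u}{\partial y^3}, \qquad D_4u = \frac{\partial^4u}{\partial x^4} + \frac{\partial^4u}{\partial y^4} \]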
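Let $(x_i, y_j)$ be a regular point in $\Omega$, and $v = [v_1, v_2] >
[0, 0]$. Then, using a Taylor expansion, we get

\[ (L_h^{(1)}u - Lu)_{ij} = -\frac{h_{ij}}{2}D_2u + \frac{h_{ij}^2}{6}D_3u - \frac{\varepsilon h_{ij}^2}{12}D_4u + o(h_{ij}^2) \quad (102) \]

where $D_iu$, $i = 2, 3, 4$ are also taken at the point $(x_i, y_j)$.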
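Let $\varphi$ and $\psi$ be the solutions of

\[ L\varphi = -\tfrac{1}{2}D_2u \ \text{in } \Omega, \qquad \varphi = 0 \ \text{on } \partial\Omega \quad (103) \]

and

\[ L\psi = \tfrac{1}{2}D_2\varphi + \tfrac{1}{6}D_3u - \tfrac{\varepsilon}{12}D_4u \ \text{in } \Omega, \qquad \psi = 0 \ \text{on } \partial\Omega \]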


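Then, by repeated use of (102), we obtain

\[ L_h^{(1)}(u - u_h^{(1)})_{ij} = -\frac{h_{ij}}{2}D_2u + \frac{h_{ij}^2}{6}D_3u - \frac{\varepsilon h_{ij}^2}{12}D_4u + o(h_{ij}^2) \]

\[ = h_{ij}L\varphi_{ij} + h_{ij}^2\left(\frac{1}{6}D_3u - \frac{\varepsilon}{12}D_4u\right) + o(h_{ij}^2) \]

\[ = h_{ij}L_h^{(1)}\varphi_{ij} + h_{ij}^2L\psi_{ij} + o(h_{ij}^2) \]

\[ = h_{ij}L_h^{(1)}\varphi_{ij} + h_{ij}^2L_h^{(1)}\psi_{ij} + o(h_{ij}^2) \]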


Since $L_h^{(1)}$ is monotone and its inverse is bounded uniformly
in $h$ and $\varepsilon$, we find

\[ (u - u_h^{(1)})(x_i, y_j) = h_{ij}\varphi_{ij} + h_{ij}^2\psi_{ij} + o(h_{ij}^2) \]


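Using Lemma 7 and (96), we obtain now

\[ (L_h^{(1)} - L_h^{(0)})(u - u_h^{(1)})_{ij} = -\frac{h_{ij}^2}{2}\left(|v_1|\frac{\partial^2\varphi}{\partial x^2} + |v_2|\frac{\partial^2\varphi}{\partial y^2}\right)_{ij} + o(h_{ij}^2) \]

for a regular point $(x_i, y_j)$ and a similar expression based
on (99) for an interface point. The results for the truncation
error are collected in Theorem 18.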


Theorem 18. Let u be the solution of (97) and u, be 
its discrete solution obtained by applying the second-order 
defect-correction method. Then, ifu e CO) and Qa y) 
is a regular point, 


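\[ \tau_h(x_i, y_j) = L_h^{(1)}(u - u_h)_{ij} \]

\[ = (L_h^{(1)} - L_h^{(0)})(u - u_h^{(1)})_{ij} + (L_h^{(0)} - L)u_{ij} \]

\[ = \left[-\frac{1}{2}\left(|v_1|\frac{\partial^2\varphi}{\partial x^2} + |v_2|\frac{\partial^2\varphi}{\partial y^2}\right) + \frac{1}{6}\left(v_1\frac{\partial^3u}{\partial x^3} + v_2\frac{\partial^3u}{\partial y^3}\right) - \frac{\varepsilon}{12}\left(\frac{\partial^4u}{\partial x^4} + \frac{\partial^4u}{\partial y^4}\right)\right]_{ij}h_{ij}^2 + o(h_{ij}^2) \quad (104) \]

where the function $\varphi$ is the solution of (103). A similar
expression holds for interface nodes where (98) and (100)
are valid.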


Again, using the boundedness of the inverse of $L_h^{(1)}$,
we obtain a pointwise estimate of the discretization error,
$e_h = u - u_h$.

The result in Theorem 18 remains of limited value
unless a proper adjustment of the mesh to the behavior
of the solution is made, since the derivatives in (104) are
not bounded for $\varepsilon \to 0$. The latter can be seen from an
asymptotic expansion of the solution and its derivatives.
Various forms of such expansions have appeared in the
literature. A detailed survey in the case of parabolic layers
is found in Shih and Kellogg (1987) and in the case of
exponential layers in Linss and Stynes (1999).


As previously remarked, the errors depend on $\varepsilon$. To obtain
a more accurate local estimate of the discretization error
than the one that the defect-correction term $\delta_h$ provides, one
can use the following method, which is based on Lemma
7. From expression (101), it follows that we can compute
differences from the current pointwise values of $u_h^{(1)}$ to
approximate the derivatives of $u$ with the same (first) order
as the approximate values $u_h^{(1)}$. Now, the leading part of the
truncation error $L_h^{(1)}(u - u_h^{(1)})$ takes the form

\[ \begin{cases} -\dfrac{h_{ij}}{2}\left(|v_1|\dfrac{\partial^2u}{\partial x^2} + |v_2|\dfrac{\partial^2u}{\partial y^2}\right)_{ij} + o(h_{ij}) & \text{if } (x_i, y_j) \text{ is a regular point} \\[2ex] -\dfrac{h_{ij}}{2}\left(|v_1|\dfrac{\partial^2u}{\partial x^2} + |v_1|\dfrac{\partial^2u}{\partial y^2} + 2|v_2|\dfrac{\partial^2u}{\partial x\,\partial y}\right)_{ij} + o(h_{ij}) & \text{if } (x_i, y_j) \text{ is a skew-stencil interface point} \end{cases} \]

By solving an additional system with the operator $L_h^{(1)}$ and
this leading part as right-hand side, we can
hence estimate the discretization error $u - u_h^{(1)}$. Since $e_h =
u - u_h = (u - u_h^{(1)}) - \delta_h$, where $\delta_h$ is computed in the
defect-correction method, we have an accurate pointwise
estimate of $u - u_h$ to be used as an error indicator in an
adaptive local mesh-refinement procedure.
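
The defect-correction step that produces $\delta_h$ can be illustrated on a one-dimensional model problem. The sketch below is an illustration under stated assumptions, not the chapter's two-dimensional method: it takes $-\varepsilon u'' + vu' = f$ on $(0,1)$ with homogeneous Dirichlet data, a uniform mesh, the manufactured solution $u = \sin\pi x$, and dense matrices for brevity. The names `operators` and `defect_correction` are hypothetical.

```python
import numpy as np

def operators(n, eps, v):
    """Return (L1, L0): upwind and central discretizations of -eps*u'' + v*u'.

    Dense matrices on the n-1 interior nodes of a uniform mesh, v > 0.
    """
    h = 1.0 / n
    m = n - 1
    diffusion = eps / h**2 * (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1))
    upwind = v / h * (np.eye(m) - np.eye(m, k=-1))
    central = v / (2.0 * h) * (np.eye(m, k=1) - np.eye(m, k=-1))
    return diffusion + upwind, diffusion + central

def defect_correction(n, eps=0.5, v=1.0):
    """One defect-correction step; returns max errors before and after."""
    x = np.linspace(0.0, 1.0, n + 1)[1:-1]
    exact = np.sin(np.pi * x)
    f = eps * np.pi**2 * np.sin(np.pi * x) + v * np.pi * np.cos(np.pi * x)
    L1, L0 = operators(n, eps, v)
    u1 = np.linalg.solve(L1, f)                  # first-order (monotone) solution
    delta = np.linalg.solve(L1, f - L0 @ u1)     # correction from the central defect
    u2 = u1 + delta                              # defect-corrected solution
    return np.max(np.abs(exact - u1)), np.max(np.abs(exact - u2))

err_upwind, err_corrected = defect_correction(64)
```

In this setting the corrected solution is expected to be noticeably more accurate than the upwind solution, in line with the second-order truncation error derived above.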

The adaptive strategy can hence be to refine a current
mesh $\Omega_h$ by patching at every point, or at the
surrounding of points, where the approximation of $|e_h(x_i, y_j)|$
is larger than some tolerance tol. The tolerance can be
an absolute small quantity or a relative quantity, tol =
$\mu\max_{i,j}|e_h(x_i, y_j)|$, $0 < \mu < 1$. A slight disadvantage of
the adaptive refinement method is that it leads to nonsymmetric
algebraic systems.
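
The marking rule just described can be written in a few lines. This is a hedged sketch: the function name `mark_for_refinement` and its arguments are hypothetical, and the array of error estimates is made up for illustration.

```python
import numpy as np

def mark_for_refinement(error_estimate, mu=0.5, absolute_tol=None):
    """Flag mesh points whose estimated error exceeds the tolerance.

    The tolerance is either a given absolute value or the relative value
    mu * max|e|, with 0 < mu < 1.
    """
    e = np.abs(np.asarray(error_estimate, dtype=float))
    tol = absolute_tol if absolute_tol is not None else mu * e.max()
    return e > tol

# Illustrative pointwise error estimates on a 2 x 2 patch of mesh points.
marked = mark_for_refinement([[0.01, 0.02], [0.30, 0.05]], mu=0.5)
```

Only the point with estimate 0.30 exceeds the relative tolerance 0.15 here, so only its neighborhood would be patched.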

Numerical tests for singularly perturbed problems with 
the above criterion can be found in Axelsson and Nikolova 
(1998). 


6.5 The time-dependent convection—diffusion 
problem 


The time-dependent convection-diffusion equation can be
classified as a parabolic equation with lower-order terms.
In one dimension, it has the form

\[ u_t = \varepsilon u_{xx} - vu_x - cu + f, \quad x \in [a, b], \quad t > 0 \quad (105) \]

with some proper boundary and initial conditions.


We show below that the lower-order terms may affect
stability. Consider (105) with $v > 0$ and $c = 0$, $f = 0$,
$u(a, t) = \alpha$, $u(b, t) = \beta$, and $u(x, 0) = u_0$, using the
explicit scheme

\[ \frac{u_i^{n+1} - u_i^n}{k} - \varepsilon\frac{u_{i+1}^n - 2u_i^n + u_{i-1}^n}{h^2} + v\frac{u_{i+1}^n - u_{i-1}^n}{2h} = 0 \quad (106) \]

The scheme is only conditionally stable, namely, the relation
$\varepsilon(k/h^2) \le 1/2$ must hold.


Remark 10. Although the scheme (106) has accuracy
$O(k + h^2)$, the stability restriction $k \le h^2/(2\varepsilon)$ implies
$k = O(h^2)$, so the scheme is second-order accurate in $h$.


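We set $\rho = k/h^2$ (referred to as the Courant number)
and $\alpha = vh/(2\varepsilon)$ (the cell Peclet number) and rewrite (106)
as

\[ u_i^{n+1} = (1 - 2\varepsilon\rho)u_i^n + \varepsilon\rho(1 + \alpha)u_{i-1}^n + \varepsilon\rho(1 - \alpha)u_{i+1}^n \quad (107) \]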
For parabolic problems, the following maximum principle
holds:

\[ \sup_x |u(x, t)| \le \sup_x |u(x, t')| \quad \text{for } t' \le t \]


From (107), one can see that the numerical scheme will
satisfy the maximum principle if and only if $\alpha \le 1$, that is,
$h \le 2\varepsilon/v$ (assuming the stability condition $\varepsilon\rho \le 1/2$).
The latter is automatically satisfied for upwind
schemes and acts as a restriction on the spatial mesh for
central differences.
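
The role of the cell Peclet number can be seen by running scheme (107) directly. The sketch below is illustrative only: it assumes the interval $[0, 1]$, zero boundary values, a single-node spike as initial data, and a time step fixed so that $\varepsilon\rho = 0.4$; the function names are hypothetical.

```python
import numpy as np

def explicit_step(u, eps, v, h, k):
    """One step of scheme (107) with fixed (Dirichlet) end values."""
    rho = k / h**2                 # Courant number as defined in the text
    alpha = v * h / (2.0 * eps)    # cell Peclet number
    new = u.copy()
    new[1:-1] = ((1.0 - 2.0 * eps * rho) * u[1:-1]
                 + eps * rho * (1.0 + alpha) * u[:-2]
                 + eps * rho * (1.0 - alpha) * u[2:])
    return new

def max_norm_history(eps, v, n, steps):
    """Max norm of the discrete solution after each step."""
    h = 1.0 / n
    k = 0.4 * h**2 / eps           # keeps eps*rho = 0.4 <= 1/2
    u = np.zeros(n + 1)
    u[n // 2] = 1.0                # spike initial data, zero boundary values
    norms = [np.max(np.abs(u))]
    for _ in range(steps):
        u = explicit_step(u, eps, v, h, k)
        norms.append(np.max(np.abs(u)))
    return np.array(norms)

small_peclet = max_norm_history(eps=0.1, v=1.0, n=10, steps=50)   # alpha = 0.5
large_peclet = max_norm_history(eps=0.01, v=1.0, n=10, steps=5)   # alpha = 5
```

With $\alpha = 0.5$ all three coefficients in (107) are nonnegative and the maximum norm cannot grow; with $\alpha = 5$ the coefficient of $u_{i+1}^n$ is negative and the maximum norm increases already in the first step.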


6.6 A generalized difference scheme by use of 
local Green’s functions 


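For some differential operators, the analytic form of their
fundamental solution is known. It can be used to construct
an exact or approximate local Green’s function with homogeneous
boundary values at discretization element edges
(points).

Let

\[ Lu \equiv -\nabla\cdot(\varepsilon\nabla u) + b\cdot\nabla u + cu = f, \quad x \in \Omega, \quad u = 0 \ \text{on } \partial\Omega \quad (108) \]

The operator $L^*$ adjoint to the operator $L$ has the following
form

\[ L^*v \equiv -\nabla\cdot(\varepsilon\nabla v) - \nabla\cdot(bv) + cv, \quad x \in \Omega, \quad v = 0 \ \text{on } \partial\Omega \]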


(Here $\varepsilon$, $b$ may be functions of $x$.) Let $\Omega_h$ be a difference
mesh on $\Omega$, not necessarily uniform, and let $g_i$ be the local
Green’s function to $L^*$ at $x_i$, that is,

\[ L^*g_i = \delta(x - x_i) \ \text{in } \omega_i, \qquad g_i = 0 \ \text{on } \partial\omega_i \]

where $\omega_i$ is typically an element as shown in Figure 10.
Here, $\delta(x - x_i)$ is the Dirac delta distribution at
$x_i$. If $g_i$ is not known analytically, we assume that a function
$\tilde g_i$ is known, which is a sufficiently accurate approximation
of $g_i$.

Figure 10. Subdomain $\omega_i$.
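Owing to the homogeneous boundary values of $\tilde g_i$ on
$\partial\omega_i$, it holds that

\[ \int_{\omega_i} f\tilde g_i\,d\omega = \int_{\omega_i} Lu\,\tilde g_i\,d\omega = \int_{\omega_i}\left(\varepsilon\nabla u\cdot\nabla\tilde g_i - \nabla\cdot(b\tilde g_i)u + cu\tilde g_i\right)d\omega \]

\[ = \int_{\omega_i} L^*\tilde g_i\,u\,d\omega + \oint_{\partial\omega_i}\varepsilon\frac{\partial\tilde g_i}{\partial n}u\,d\gamma \]

\[ = \int_{\omega_i} L^*g_i\,u\,d\omega + \oint_{\partial\omega_i}\varepsilon\frac{\partial\tilde g_i}{\partial n}u\,d\gamma + \int_{\omega_i} L^*(\tilde g_i - g_i)u\,d\omega \]

\[ = u(x_i) + \oint_{\partial\omega_i}\varepsilon\frac{\partial\tilde g_i}{\partial n}u\,d\gamma + \int_{\omega_i} L^*(\tilde g_i - g_i)u\,d\omega \quad (109) \]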
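For a one-dimensional problem, the local Green’s function
at $x_i$ satisfies

\[ L^*g_i(x) = 0, \quad x \in (x_{i-1}, x_i) \cup (x_i, x_{i+1}) \]

\[ g_i(x) = 0, \quad x \in [0, x_{i-1}] \cup [x_{i+1}, 1] \]

\[ \varepsilon(x_i)\left[g_i'(x_i-) - g_i'(x_i+)\right] = 1 \quad (110) \]

The generalized difference scheme (109) with $\tilde g_i = g_i$ then
takes the form of a three-point difference method

\[ -\varepsilon_{i-1}g_i'(x_{i-1})u_{i-1} + u_i + \varepsilon_{i+1}g_i'(x_{i+1})u_{i+1} = \int_{x_{i-1}}^{x_{i+1}} fg_i\,dx, \quad i = 1, 2, \ldots, n \quad (111) \]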


where $\{x_i\}_{i=0}^{n+1}$ is a division of the interval [0, 1].

For the operator $-u_{xx} + bu_x = f$, $0 < x < 1$, $u(0) =
u(1) = 0$, letting $b = b(x)$ be constant when evaluating $g_i$
and assuming for simplicity a uniform mesh, formula (111)
takes the form

\[ -\frac{1}{1 + e^{-b_ih}}u_{i-1} + u_i - \frac{1}{1 + e^{b_ih}}u_{i+1} = \int_{x_{i-1}}^{x_{i+1}} fg_i\,dx, \quad i = 1, 2, \ldots, n \quad (112) \]

A computation shows that

\[ g_i = \frac{1}{b_i}\begin{cases} \dfrac{1 - e^{-b_i(x - x_{i-1})}}{1 + e^{-b_ih}}, & x_{i-1} \le x \le x_i \\[2ex] \dfrac{e^{b_i(x_{i+1} - x)} - 1}{1 + e^{b_ih}}, & x_i \le x \le x_{i+1} \end{cases} \quad (113) \]

When $b_i \to 0$,

\[ g_i \to \frac{1}{2}\begin{cases} x - x_{i-1}, & x_{i-1} \le x \le x_i \\ x_{i+1} - x, & x_i \le x \le x_{i+1} \end{cases} \]

which reduces (111) to the central difference method
except for the right side function. When $b$ is constant and
$\int_{x_{i-1}}^{x_{i+1}} fg_i\,dx$ is evaluated exactly, the corresponding difference
method gives the exact solution at the nodepoints.
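
The nodal exactness claimed above can be checked numerically. The following sketch is an illustration under assumptions not in the original: constant $b = 5$, $f \equiv 1$, a uniform mesh with nine interior nodes, and 12-point Gauss-Legendre quadrature on each half of the support of $g_i$. The exact solution used for comparison, $u = x/b - (e^{bx} - 1)/(b(e^b - 1))$, was derived for this particular test case.

```python
import numpy as np

def green(x, xi, h, b):
    """Local Green's function (113) centered at xi for constant b."""
    left = (1.0 - np.exp(-b * (x - (xi - h)))) / (b * (1.0 + np.exp(-b * h)))
    right = (np.exp(b * ((xi + h) - x)) - 1.0) / (b * (1.0 + np.exp(b * h)))
    return np.where(x <= xi, left, right)

def solve_green_scheme(n, b, f):
    """Solve scheme (112) for -u'' + b u' = f, u(0) = u(1) = 0, n interior nodes."""
    h = 1.0 / (n + 1)
    nodes = np.linspace(0.0, 1.0, n + 2)
    lower = -1.0 / (1.0 + np.exp(-b * h))   # coefficient of u_{i-1}
    upper = -1.0 / (1.0 + np.exp(b * h))    # coefficient of u_{i+1}
    A = np.eye(n) + lower * np.eye(n, k=-1) + upper * np.eye(n, k=1)
    t, w = np.polynomial.legendre.leggauss(12)
    rhs = np.zeros(n)
    for i in range(n):
        xi = nodes[i + 1]
        # Integrate f * g_i separately over the two halves of the support.
        for lo, hi in ((xi - h, xi), (xi, xi + h)):
            xq = 0.5 * (hi - lo) * t + 0.5 * (hi + lo)
            rhs[i] += 0.5 * (hi - lo) * np.sum(w * f(xq) * green(xq, xi, h, b))
    interior = np.linalg.solve(A, rhs)
    return nodes, np.concatenate(([0.0], interior, [0.0]))

b = 5.0
nodes, u = solve_green_scheme(9, b, lambda x: np.ones_like(x))
exact = nodes / b - (np.exp(b * nodes) - 1.0) / (b * (np.exp(b) - 1.0))
max_nodal_error = np.max(np.abs(u - exact))
```

Up to quadrature and rounding errors, `max_nodal_error` should be at the level of machine precision even on this coarse mesh.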


Remark 11. The functions g; are the standard ‘hat’ 
functions used as basis functions in linear finite element 
methods. 


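When solving (109), we can use a defect-correction
method, that is, first compute an initial approximation $u^{(0)}$
from

\[ u^{(0)}(x_i) + \oint_{\partial\omega_i}\varepsilon\frac{\partial\tilde g_i}{\partial n}u^{(0)}\,d\gamma = \int_{\omega_i} f\tilde g_i\,d\omega, \quad i = 1, 2, \ldots, n \quad (114) \]

and a correction (once or more times) using

\[ \delta u^{(0)}(x_i) + \oint_{\partial\omega_i}\varepsilon\frac{\partial\tilde g_i}{\partial n}\delta u^{(0)}\,d\gamma = \int_{\omega_i} L(u - u^{(0)})\tilde g_i\,d\omega, \quad i = 1, 2, \ldots, n \quad (115) \]

Then let $u^{(1)} = u^{(0)} + \delta u^{(0)}$ and repeat if necessary.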

Since $\partial\tilde g_i/\partial n < 0$ at $\partial\omega_i$, it follows that the matrix $A_h$
in the arising linear systems in (114) and (115) of lowest
order is an M-matrix and has a bounded inverse.

In the above method, we have deleted the last term in
(109), which contains the unknown exact Green’s function
$g_i$. If $\tilde g_i$ is a sufficiently accurate approximation of $g_i$, the
convergence of the defect-correction method is fast.

One can determine an accurate approximation by taking
the fundamental solution to $\varepsilon\Delta u$ and multiplying it by the
1D local Green’s functions to (assuming now for simplicity
that $c = 0$) $-\varepsilon u_{xx} - (b_1u)_x = \delta(x - x_i)$ and
$-\varepsilon u_{yy} - (b_2u)_y = \delta(y - y_i)$, the solution of which is given
in (113), both with homogeneous boundary conditions.

In practice, $u$ is approximated by piecewise polynomials
such as in finite element methods. Let its approximation
be $u_h$ and let the discretization error be split as $u -
u_h = \eta - \theta_h$, where $\eta = u - u_I$, $\theta_h = u_h - u_I$, and $u_I$
is the interpolant in the space spanned by the piecewise
polynomials. Applying this error estimate in (114) and
(115), and using the triangle inequality $|u - u_h| \le |\eta| +
|\theta_h|$, we readily find the pointwise error bound

\[ \max_i |(u - u_h)(x_i)| \le C\|A_h^{-1}\|\max_x |(u - u_I)(x)| \]

In a 1D problem, the interpolation errors at $x_i$ are zero, so
the corresponding approximations in (113) using the exact
local Green’s function take the exact values $u(x_i)$ at the
nodepoints.


6.7 A meshless difference method 


On the basis of (109), a generalized difference scheme can 
be derived, which is even meshless, that is, it is applicable 
for an arbitrary distribution of nodepoints in Q and is not 
based on a difference mesh. 

Many important problems in practice, like crack prop- 
agation, fragmentation, and large deformations, are char- 
acterized by a continuous change in the geometry of the 
domain under analysis. Conventional finite difference or 
finite element methods for such problems can be cum- 
bersome and expensive, as they may require continuous 
remeshing of the domain to avoid breakdown of the calcu- 
lation due to excessive mesh distortions. Meshless methods 
provide an attractive alternative for the solution of such 
classes of problems. For a meshless method not based 
on (109) and for using nonpolynomial interpolation func- 
tions, see Duarte and Oden (1996) and the references stated 
therein. 

Let $Lu = f$ in $\Omega$ and $u = g$ on $\partial\Omega$ be given, where $\Omega$ is
a bounded domain and $L$ is a second-order elliptic operator.
Although the method can be applied for more general
cases, where only some (sufficiently accurate) approximate
Green’s function is known, we consider here the case where
the Green’s function for the adjoint operator on a disc is
known. Further, even though the construction is applicable
for more general operators, for simplicity, we consider here
only the operator $L = -(\partial^2/\partial x^2) - (\partial^2/\partial y^2)$.

Let $\{x_1, \ldots, x_n\}$ be a set of disjoint points in $\bar\Omega$ and
let $V = \{x_1, \ldots, x_m\}$ be the subset of interior points. For
each point $x_i$ in $V$, we choose a set of $q_i$ ($< n$) neighboring
points $P(x_i) = \{x_1^{(i)}, \ldots, x_{q_i}^{(i)}\}$ such that all angles
$\angle(x_i, x_k^{(i)})$ are different. The aim is to derive a local
difference approximation at $x_i$ in the form

\[ \tilde u(x_i) - \sum_{k=1}^{q_i}\gamma_k^{(i)}\tilde u(x_k^{(i)}) = \int_{\omega_i} fg_i, \quad i = 1, \ldots, m \quad (116) \]

where $\tilde u(x_i)$ denotes the corresponding approximation to
$u(x_i)$ and $g_i$ is the local Green’s function, which satisfies
$Lg_i = \delta(x_i; \cdot)$ in $\omega_i$, where $\delta(x_i; \cdot)$ is the Dirac measure at $x_i$ and
the trace $\mathrm{tr}(g_i)$ of $g_i$ is equal to zero on $\partial\omega_i$. Here, $\omega_i$ is a
disc with center at $x_i$, the radius of which will be determined
later. It then holds that $g_i = (1/2\pi)\ln(r_i/r)$, $0 < r \le r_i$.
It is assumed that $\int_{\omega_i} fg_i$ is bounded; otherwise the
corresponding singularity of $u$ must be subtracted from the
solution.

It is further assumed that the union $\bigcup_i \omega_i$ of the discs
covers $\Omega$.

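The basic relation to be used is

\[ \int_{\omega_i} fg_i = \int_{\omega_i} Lu\,g_i = \int_{\omega_i} L^*g_i\,u + \oint_{\partial\omega_i}\frac{\partial g_i}{\partial n}u \]

or

\[ \int_{\omega_i} fg_i = u(x_i) + \oint_{\partial\omega_i} u\frac{\partial g_i}{\partial n} \quad (117) \]

Here, $u$ will be approximated by polynomials of degree
$p_i$, $p_i \ge 2$. For computational reasons, it is efficient
to use the first harmonic polynomials of $L$, but written as
trigonometric functions in polar coordinates, complemented
with the missing polynomials, which are not harmonic. In
polar coordinates,

\[ \Delta u = \frac{\partial^2u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{1}{r^2}\frac{\partial^2u}{\partial\theta^2} \]

and $r^k\cos k\theta$, $r^k\sin k\theta$, $k = 1, \ldots, p_i$ are the first harmonic
polynomials.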

We note that to model corner singularities, one can use
the fractional power form of the latter functions, that is,
$r^\alpha\sin\alpha\theta$, where $\alpha = \pi/\omega$ for a corner with interior angle
$\omega$, if $\omega \ne \pi/n$, and $r^\alpha\ln r\,\sin\alpha\theta$ if $\omega = \pi/n$, $n = 1, 2, \ldots$

We now approximate u in (117) locally around x, by a 
linear combination of those functions (i.e. we use a trigono- 
metric expansion plus possibly some terms corresponding 
to the missing polynomials up to degree p;). 

Substituting the polynomials in (117) results in a linear
system for the corresponding coefficients. It is not necessary
to solve this system but the arising matrix can be used
for the computation of the coefficients $\gamma_k^{(i)}$, $k = 1, \ldots, q_i$ in
(116).


For certain regular distributions of nodepoints, it turns 
out that there is a cancellation of some error terms implying 
that q; can be much smaller than the total number of 
polynomials (monomials) used, for example, in the Pascal 
triangle, there appear P;(p; + 1)/2 such monomials. 

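The radius $r_i$ in $\omega_i$ is determined to make the difference
approximation in (116) exact for the missing second-order
polynomial $u = x^2 + y^2$ (in local coordinates centered at $x_i$).
Then, $Lu = -4$ and (116) takes the form

\[ \sum_{k=1}^{q_i}\gamma_k^{(i)}\left(r_k^{(i)}\right)^2 = 4\int_{\omega_i} g_i\,d\omega \]

where $r_k^{(i)} = |x_k^{(i)} - x_i|$. From $g_i = (1/2\pi)\ln(r_i/r)$, it follows that

\[ \int_{\omega_i} g_i\,d\omega = \frac{1}{4}r_i^2 \]

so

\[ r_i = \left(\sum_{k=1}^{q_i}\gamma_k^{(i)}\left(r_k^{(i)}\right)^2\right)^{1/2} \]

that is, since $\sum_{k=1}^{q_i}\gamma_k^{(i)} = 1$ (take $u \equiv 1$ in (116)), $r_i$ forms a
weighted average of distances between the local neighbors.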
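
The two formulas above are easy to check numerically. The sketch below is illustrative: it assumes a regular hexagon of six neighbors at distance $h = 0.1$ with equal weights $1/6$ (an assumed choice that sums to one), and it uses a simple midpoint rule for the radial integral; the function names are hypothetical.

```python
import numpy as np

def disc_radius(center, neighbors, weights):
    """Radius r_i as the weighted root-mean-square distance to the neighbors."""
    offsets = np.asarray(neighbors, dtype=float) - np.asarray(center, dtype=float)
    squared_distances = np.sum(offsets**2, axis=1)
    return np.sqrt(np.dot(weights, squared_distances))

def green_integral(radius, m=20000):
    """Midpoint rule for the integral of g = ln(R/r)/(2*pi) over the disc.

    In polar coordinates the integral reduces to the radial integral of
    r * ln(R/r) from 0 to R.
    """
    r = (np.arange(m) + 0.5) * radius / m
    return np.sum(r * np.log(radius / r)) * radius / m

h = 0.1
angles = np.arange(6) * np.pi / 3.0
hexagon = h * np.column_stack((np.cos(angles), np.sin(angles)))
r_hex = disc_radius((0.0, 0.0), hexagon, np.full(6, 1.0 / 6.0))
integral = green_integral(r_hex)
```

For this regular configuration the radius equals the neighbor distance $h$, and the computed integral should agree with $r_i^2/4$.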


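Discretization error

The exact solution satisfies (117), while the difference
approximation is

\[ \tilde u(x_i) - \sum_{k=1}^{q_i}\gamma_k^{(i)}\tilde u(x_k^{(i)}) = R_i(fg_i) \quad (118) \]

Here, $R_i(fg_i)$ denotes the quadrature approximation used
for $\int_{\omega_i} fg_i$ with error $\int_{\omega_i} fg_i - R_i(fg_i)$.
Writing equation (117) in the form

\[ u(x_i) - \sum_{k=1}^{q_i}\gamma_k^{(i)}u(x_k^{(i)}) = -\oint_{\partial\omega_i} u\frac{\partial g_i}{\partial n} - \sum_{k=1}^{q_i}\gamma_k^{(i)}u(x_k^{(i)}) + \int_{\omega_i} fg_i \]

and subtracting equation (118), we obtain

\[ e_h(x_i) - \sum_{k=1}^{q_i}\gamma_k^{(i)}e_h(x_k^{(i)}) = \delta_1(u) + \delta_2(u) \]

where $e_h = u - \tilde u$ and

\[ \delta_1(u) = -\oint_{\partial\omega_i} u\frac{\partial g_i}{\partial n} - \sum_{k=1}^{q_i}\gamma_k^{(i)}u(x_k^{(i)}), \qquad \delta_2(u) = \int_{\omega_i} fg_i - R_i(fg_i) \]

that is, $\delta_1$ and $\delta_2$ denote the line integral and the domain integral
errors, respectively. It is readily seen that the line integral
error can be written as a line integral of the interpolation
error in $u$.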


Lemma 8. Letu 1, denote the interpolation of u on es 
0 N Pi- Then, 


ZA 


If $\gamma_k^{(i)} > 0$, then the matrix $A_h$ in (116) is an M-matrix.
It can be seen that the smallest eigenvalue of
the matrix is bounded below by $\min_i O(r_i^2) = O(h^2)$ and
likewise $\|A_h^{-1}\|_\infty = O(h^{-2})$. For the lowest (second)-order
schemes, it holds that $\gamma_k^{(i)} > 0$. However, the latter does
not hold for higher-order schemes, and one can expect that
some of the coefficients are negative in general. To solve
the resulting linear system with a matrix $A_h$, it can be
efficient to use a defect-correction method with a second-order
(or possibly only first-order) scheme, for which
the corresponding values of $\gamma_k^{(i)} > 0$, as corrector for the
higher-order scheme.


Theorem 19. Assume that the difference scheme corresponds
to exact interpolation of all polynomials of degree $p_i$
at least, and that the solution is sufficiently smooth. Assume
also that $ch \le r_i \le Ch$ for some constants $c$, $C$. Then, using
some steps of a defect-correction method, if necessary, the
pointwise discretization error satisfies

\[ |e_h(x_i)| \le O(h^{p-1}) \]

where $p = \min_i p_i$.


Proof. By Lemma 8, the local discretization (interpolation
and integration) errors are $O(r_i^{p_i+1}) \le O(h^{p+1})$. The local
errors are globally coupled by the assembled matrix $A_h$,
whose inverse is bounded by $O(h^{-2})$. Hence, $\|e_h\|_\infty \le
\|A_h^{-1}\|_\infty O(h^{p+1}) = O(h^{p-1})$. □


Example 4 (A regular distribution of points). We consider
now the case where the nodepoints are chosen as the
vertices in a hexagonal mesh, that is, we use a seven-point
interpolation scheme. Owing to symmetries of the scheme,
it follows readily that $\delta_1(u) = 0$ for all polynomials of
degree five in variables $x$, $y$. Hence, the scheme is fourth-order
accurate if $\int_{\omega_i} fg_i$ is computed with corresponding
accuracy.

Similar results hold for the vertices in a cuboctahedral
mesh.


6.8 A hybrid method of characteristics and 
central difference method 


As remarked previously, in many diffusion processes aris- 
ing in physical problems, convection dominates diffusion, 
and it is natural to seek numerical methods that reflect their 
almost hyperbolic nature. 

The convective character is easily seen by considering
the method of characteristics for the reduced equation,

\[ v\cdot\nabla u + cu = f \ \text{in } \Omega, \qquad u = g \ \text{on } \Gamma_- = \{x \in \Gamma,\ v\cdot n < 0\} \]

Let $z(t, s)$ be the parametric representation of the lines of
characteristics defined by the vector field $v$ through the point
$(x_0, y_0)$ on $\Gamma_-$, that is, we have $x = z_1(t, s)$, $y = z_2(t, s)$
for points on this line and

\[ \frac{dz(t, s)}{dt} = v(x, y) = v(z(t, s)), \quad t > 0, \qquad z(0, s) = (x_0, y_0) \in \Gamma_- \]

Since the vector field is uniquely defined, no two characteristic
lines may cross each other. Using the chain rule, we
obtain

\[ \frac{d\hat u}{dt} = \nabla u\cdot\frac{dz}{dt} = v\cdot\nabla u \]

where $\hat u(t) = u(z(t, s))$ and $s$ is fixed. Hence,

\[ \frac{d\hat u(t)}{dt} + c\hat u = f(z(t, s)), \quad t > 0, \qquad \hat u(0) = u(x_0, y_0) \]

so when the characteristic lines have been computed, the
solution of the reduced equation along each characteristic
line can be computed as the solution of an initial value
problem for an ordinary differential equation.
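
The initial value problem along one characteristic can be integrated with any standard ODE method. The following is a minimal sketch, not taken from the original: it uses a hand-written classical Runge-Kutta scheme, and the data (constant velocity field $v = (1, 0.5)$, $c = 2$, $f = 0$, inflow value 1 at the point $(0, 0.25)$) are illustrative assumptions chosen so that the exact answer $\hat u(t) = e^{-2t}$ is known.

```python
import numpy as np

def integrate_characteristic(v, c, f, start, u0, t_end, steps=200):
    """Integrate dz/dt = v(z) and du/dt = f(z) - c(z)*u with classical RK4.

    v, c, f are callables of the point z = (x, y); start is the inflow point
    and u0 the inflow value. Returns the end point and the solution value.
    """
    def rhs(state):
        z, u = state[:2], state[2]
        vx, vy = v(z)
        return np.array([vx, vy, f(z) - c(z) * u])

    state = np.array([start[0], start[1], u0], dtype=float)
    dt = t_end / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * dt * k1)
        k3 = rhs(state + 0.5 * dt * k2)
        k4 = rhs(state + dt * k3)
        state = state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return state[:2], state[2]

end_point, u_end = integrate_characteristic(
    v=lambda z: (1.0, 0.5),
    c=lambda z: 2.0,
    f=lambda z: 0.0,
    start=(0.0, 0.25),
    u0=1.0,
    t_end=1.0,
)
```

With these data the characteristic is a straight line ending at $(1, 0.75)$ and the computed value should be close to $e^{-2}$.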

When $\varepsilon$ is small and $|v| > 0$, the solution of (108) is
close to the solution of the reduced equation except in
subdomains where boundary layers occur, such as when
the solution of the reduced equation does not satisfy the
boundary conditions on the outflow boundary.

We describe here a combination of a method of char- 
acteristics and a central difference method. The method 
is illustrated on a 1D problem. For a treatment of 2D or 
higher-dimensional problems, see Axelsson and Marinova 
(2002). 

The convection-diffusion problem then has the following
form:

\[ Lu \equiv -\varepsilon u'' + vu' + cu = f, \quad 0 < x < 1, \qquad u(0) = 0, \quad u(1) = 1 \quad (119) \]

We assume that $v \ge v_0 > 0$ and $c \ge 0$ in [0, 1] and that $v$
and $c$ are bounded $C^1$ functions, $v, c \in C^1[0, 1]$.


The difference scheme 

Let $\theta_i$, $0 \le \theta_i \le 1$, be a variable weight coefficient. The
difference method on an arbitrary mesh $\Omega_N = \{x_i, i =
0, 1, \ldots, N, x_0 = 0, x_N = 1, x_i < x_{i+1}\}$ with variable step
$h_i = x_i - x_{i-1}$, $i = 1, \ldots, N$ ($N$ is the number of the mesh
intervals) takes the form


uN 
L'u! = 2e ( aTa uo ei 16, 
OB hi N iss i 


ud 
y [rao aiis M= 


h; ia 


La 


=f" =6,fq) +0 f(x -#) 


Leau? wo 


ge | 


for each interior mesh point $x_i$, $i = 1, 2, \ldots, N - 1$. Here,

• $u_i^N$ denotes the finite difference approximation of the
solution $u$ at mesh point $x_i$, $i = 1, 2, \ldots, N - 1$;

• $\theta_i = 1/(1 + r_i)$, where $r_i = v(x_i)h/(2\varepsilon)$ is the local
Peclet number, $h = \max\{h_i, h_{i+1}\}$;

• $1 - \theta_i = r_i/(1 + r_i) = v(x_i)h/(2\varepsilon + v(x_i)h)$.


The corresponding finite difference mesh is illustrated in 
Figure 11. 

The scheme is a linear combination of a central difference
scheme at $x_i$ and at $x_i - (h_i/2)$, except that the central
difference for the second-order derivative is evaluated only
at $x_i$. The scheme is a three-point upwind scheme. When
$\varepsilon \ll h$, it is dominated by the approximation at $x_i - (h_i/2)$,
while when $\varepsilon \gg h$, it is dominated by the central difference
approximation at $x_i$.

We assume that the mesh is uniform or varies smoothly
when $\varepsilon > h$.

It is readily seen that the operator $L^N$ is monotone if
$h_i < 2v_0/\max_x c(x)$.

With a barrier function w = x|o y» ê Straightforward com- 
putation shows that 


@=, 1, aD (120) 


. 
Xt Digs 


Figure 11. Finite difference mesh in a 1D case.


so by the Barrier Lemma, 


Ia® <À wl = — (121) 
Vo Vo 


which holds uniformly in e. 


Truncation error estimate 

To estimate the discretization error ||u — u™ ||, we will first 
prove the boundedness of the truncation error L¥[u(x,) — 
uN] = L¥u(x;) — f". Assuming first h = h; = h,_, and 
using a Taylor expansion and by a straightforward deriva- 
tion, rearranging terms (see Axelsson and Nikolova, 1998), 
we obtain the following result for the truncation error, 


Lul) — fN 


= —e(l — ofwa — u(x — )| 


1 f 1 
— pew + gav )h u" 


1 h 1 h 
a EEr a o — i a 2 tape a a 
( zl i a u” + TE 5} u | 


By the choice of 9,, the error arising from the term 


h 
= = 0! Mee a agltl yas — 
e( oofu Q) u ( i 3) 


Er; n h ev(x;)h h 
u i a bis Na i in 
eal i li (z 5) 2e+ vw yh 2” 


< Fmin{(v(x,)h, 2e}hlu’”| < 4 max v(x)h? |u| 


is 


a 
$ =e 


In the above equation, the derivatives u”, u”, u” are 
evaluated at some intermediate points. With an integral 
form of the remainder, we obtain the following estimates 
for the truncation error: 


" uE +C, 


Ai- 


ni” ju” (E) dE + Ch i ju” Œ) d£ (122) 


[LX u(x,) — fi" < Ceh 


where C}, C,, C, are positive constants. 
If the mesh is irregular and h; Æ h;4;, then for the 
estimate of the truncation error at x,, we have 


[Lr u) fl < ry ju” (§)| d5 


Xi- 


+ Ch | WOGA ka u'&ldg (123) 


Xii 


where h; = max{h;, hi1}. 




Discretization error estimate 

The estimate of the discretization error includes the follow- 
ing steps. First, an estimate of the truncation error, second, a 
construction of a suitable barrier function w™, and, finally, 
an estimate of the discretization error based on a discrete 
comparison principle. 

To resolve the solution in the layer, one can use a mesh
refinement, with either a graded or a uniform mesh in the layer
part. Consider the convection-dominated case $\varepsilon \ll N^{-1}$. Set
$\tau = \min\{1/2, (2/\beta)\varepsilon\ln N\}$, where $0 < \beta < v_0$, and denote


l-r N 
N =li = =< 
Q ={i N E o3 
2 
N 
ay = Me Pa wh 
Then Q = QNUQN. The mesh points x, in Q7 are 
defined by 


• $x_i = 1 - \tau + \tau(i - N/2)/(N/2)$ for the piecewise
uniform mesh;

• $x_i = 1 - \tau + \tau\ln(i + 1 - N/2)/\ln(N/2 + 1)$ for the
logarithmically graded mesh.
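
The two layer meshes can be generated directly from these formulas. The sketch below is illustrative and rests on one assumption not readable from the text: the coarse part of the mesh is taken as $N/2$ uniform intervals on $[0, 1 - \tau]$, which is the usual Shishkin construction. The function name `layer_meshes` and the sample parameters are hypothetical, and $N$ is assumed even.

```python
import numpy as np

def layer_meshes(N, eps, beta):
    """Return (tau, piecewise-uniform mesh, logarithmically graded mesh).

    The layer at x = 1 is covered by the last N/2 intervals; the first N/2
    intervals are uniform on [0, 1 - tau] (an assumption of this sketch).
    """
    tau = min(0.5, 2.0 * eps * np.log(N) / beta)
    half = N // 2
    coarse = np.linspace(0.0, 1.0 - tau, half + 1)
    i = np.arange(half, N + 1)
    uniform_fine = 1.0 - tau + tau * (i - half) / half
    graded_fine = 1.0 - tau + tau * np.log(i + 1 - half) / np.log(half + 1)
    shishkin = np.concatenate((coarse[:-1], uniform_fine))
    graded = np.concatenate((coarse[:-1], graded_fine))
    return tau, shishkin, graded

tau, shishkin, graded = layer_meshes(N=16, eps=1e-3, beta=1.0)
```

Both meshes have $N + 1$ points, coincide on the coarse part, and share the transition point $1 - \tau$; the graded mesh places its smallest steps next to $x = 1$.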


We use the notation 92; for the uniform mesh, called 
Shishkin mesh (Shishkin, 1990) and 92, for the logarithmi- 
cally graded mesh, which is of Bakhvalov type (Bakhvalov, 
1969). 

We consider here a corresponding mesh-refinement method.
To analyze the method, we use the following splitting
of the solution:

\[ u = g + z \]

The smooth part $g$ satisfies the problem

\[ Lg = f, \qquad g(0) = u(0) = 0, \qquad v(1)g'(1) + c(1)g(1) = f(1) \]

It is assumed that the given data is such that

\[ |g^{(k)}| \le C \quad \text{for } 0 \le k \le 4 \]

The layer part $z$ satisfies

\[ Lz = 0, \qquad z(0) = 0, \qquad z(1) = u(1) - g(1) \]

and

\[ |z^{(k)}(x)| \le C\varepsilon^{-k}\exp\left(-\frac{\beta(1 - x)}{\varepsilon}\right) \quad \text{for } 0 \le k \le 4 \]

By considering each term of the right-hand side of the
following inequality separately,

\[ \|u - u^N\| \le \|g - g^N\| + \|z - z^N\| \]

where $g^N$, $z^N$ are the corresponding finite difference
approximations to $g$ and $z$, respectively, the following
discretization error estimate can be proven.


Theorem 20. Letu be the solution of the convection—dif- 
fusion problem (119) and uY be its discrete solution ob- 
tained by applying the second-order hybrid method on 
the Shishkin mesh Qg. Let v,c, f € C*{0, 1] and v > wy > 
B > 0. Then, the discretization error is globally uniformly 
bounded in £ by 


ju — u” || < CN“ dn (125) 
where N is the total number of points used. 


Remark 12. In a similar way, one can show for the 
exponentially graded mesh a uniform optimal order estimate 
llu — u” || < CN™ (see Axelsson and Nikolova, 1998 for 
details of the technique to be used). As shown in Axelsson 
and Marinova (2002), the above results are nicely illustrated 
from numerical tests. 


Remark 13. Instead of the present choice of $\theta$, $\theta =
2\varepsilon/(2\varepsilon + |v|h)$, one can let $\theta = \min\{1, 2\varepsilon/(|v|h)\}$. Depending
on the relative signs of the discretization error terms,
one of the two choices may give slightly smaller errors than
the other.


Remark 14. For a combined method of characteristics 
and a Galerkin finite element method applied for time- 
dependent problems, see Douglas and Russell (1982). 


Final remarks: Difference methods are most useful for
regular or quasi-regular meshes where the mesh size varies
smoothly, such as $h_{i+1} = [1 + O(h_i)]h_i$. Otherwise, a
defect-correction method can be applied.

Alternative methods for irregular meshes can be based
on finite element or finite volume methods. It can often
be efficient to write a second-order problem as a system
of equations of first order, for instance, when the coefficient
function $a$ varies strongly locally. For instance, the
diffusion problem $-\nabla\cdot(a\nabla u) + cu = f$ can be written in
the form

\[ \frac{1}{a}z - \nabla u = 0, \qquad \nabla\cdot z - cu = -f \quad (126) \]

and discretized by finite element methods that satisfy a
certain inf-sup stability condition; see Brezzi and Fortin
(1991).

As shown in Axelsson and Gustafsson (1991), if one
uses a piecewise constant approximation for $z_1, z_2$ and
piecewise linear for $u$ in (126), then using the simplest
numerical quadrature for $\int (1/a)$, the method reduces to
the standard finite element method, or equivalently, the
five-point difference method. For variable coefficient $a$
and higher-order quadrature methods, the integrals result
in harmonic averages of $a$ over each element, which can
increase the accuracy of approximation.

In Chou and Li (1999), it has been shown for a covolume 
finite difference method for irregular meshes on convex 
domains that if the solution is in H>(&), one can derive 
max-norm estimates O(h? In h7') for the error in the solu- 
tion and O (h) for the gradient (see also Li, Chen and Wu, 
2000). 

Earlier presentations of generalized finite difference 
methods can be found in Tikhonov and Samarskii (1962) 
and Heinrich (1987). 


Table 1. Basic finite difference approximations. 


As shown in Kanschat and Rannacher (2002), asymptotic 
error expansions such as those used in Richardson extrap- 
olation also hold for a uniform subdivision of a general 
coarse triangulation of a convex polygonal domain. The 
estimates for the remainder term are based on estimates 
for the discrete Green’s function, similar to those found in 
Schatz and Wahlbin (1978). 


7 A SUMMARY OF DIFFERENCE 
SCHEMES 


For the convenience of the reader, we list below a summary 
of various difference approximations (Tables 1-4). For 


Derivative Scheme Accuracy 
1 
ui Forward gem — u) O(h) 
1 
uy Backward zu — ui) O(h) 
bi 
biu; Upwind if b; > 0, < — üh 
3 bi 
if b; < 0, gem — ui) O(h) 
ul Central difference 2 (tizi — Hi1) olt) 
1 2 
u; Forward zp" + 4lipi ~ Uiga) O(h?) 
1 
ul Backward gg ue — Mii +2) O(h?) 
1 
u} Central difference pent — 2u; + uj-1) oth’) 
uy Forward Ou — Sujy1 + 4uig2 — Uiga) O(h?) 
1 
ul Backward puss + 4uj-2 — Suj—1 + 2u;) O(h?) 
B ; 1 2 
Uxx + Uyy Five-point cross gtis + liyi j — Alij + Uig j + Ui, ja) O(h*) 
Uys + Uyy Five-point skewed gy -u + lipi ja — li j + Uii jh O(k?) 
+ Wiss j+) 
il 
Uxx + Uyy Nine-point ga r j + 4u y — 20u, j + Suis; O(h’) 
+ 4i jal F Uii j1 F Milja 
F Uii jyl H Hipi j) 
a y 1 P 
Aux. + buyy Nine-point anisotropy gpa @”i-L j + Buss, + Yui j H Yui jt Boe if 
— 10(a + b) uj, j + ji, j—1 + Auiti j- -a <b < 5a 
+ AUi j+ F OMi, jH 5 _at b 
= 
β = 5a − b,
γ = 5b − a

Table 2. Finite difference approximations for parabolic equations.


Model parabolic problem: $u_t = bu_{xx} + f$


Scheme Accuracy/stability 


Forward-time, central-space 
k+l yk k ko yt 
Um mY, ~a to 


vÉ a d 5 be 1 
z m = bht a mol 4 fe Accuracy (k, h°); stable if ret) 
Backward-time, central-space 
ptt! i ut okt E 2yk+! + vt 
T Saal = bae + fit Accuracy (k, h°), unconditionally stable 
Crank~Nicolson 
ukt — yk 1, [okt — 20t! 4 yet 
n r "= 3° z a m= Accuracy (k?, h?); unconditionally stable (the 
§-method, 6 = 1/2) 
ve - uk + vka 1 k+l k 
a E ai +5 (fa + fh) 
Leap-frog 
k+l _ yk-1 k k yk 
vr =v Unti T Un F Um- 
m = m pnt r mE y fé Unconditionally unstable 
Du Fort-Frankel-Saul’ev 
k+l _ yk- k k+l oy pk-1 k 
vktl yv vi — (rE + ve) + uk 
ge = e ina cia Sfp Accuracy ty, = O(h?) + ORY) + ORK), 


explicit but unconditionally stable, consistent 
with a wrong problem, such as u, ~ Uys + Our 
= 0 if k =ah and a > 0 is fixed. 


Note: Accuracy (k?, h4) means that the local truncation error of the scheme is tz, = O(k?) + O(h%). 


Table 3. Finite difference approximations for first-order hyperbolic equations. 


Model hyperbolic problem: $u_t + au_x = f$


Scheme Accuracy/stability 

Leap-frog 

pokl pe y~! wig — v" A k 
e a = fa Conditionally stable ( ar) < 1) 
Lax—Friedrichs 


veti — 1/2(y, +t) vi, ~ ue 
m mhl m-i m+l mol __ 
+a fE 


k 2h 


Consistent for k7'h? —> 0; CFL must 
hold (explicit scheme) a; | <1 

Lax—Wendroff 

ut Ue fer Uml T Umar _ OK Ving — 20 F Vn 


A 2h ak 2 h? 
=3 (at + #8) — th (faa ~ Sai) 
Crank-Nicolson 
viel moe Un Ut a Oe fat tie 


ee ah 2 


Accuracy (k?, h?); CFL <1 


k 
a~ 
h 


Accuracy (k?, h?); implicit scheme 
(uncond. stable) 

Forward—backward MacCormack 

k 


Oath = ys — a (ay Oe Accuracy (k?, h?); identical to 
arad 3 Aan) k i i Lax-Wendroff when f is constant 
Va = 3 |m tn — ara O -aten ] 


Note: Accuracy (k?, h?) means that the local truncation error of the scheme is tga = O(kP) + O(h1). 


Table 4. Finite difference approximations for second-order hyperbolic equations.


Model hyperbolic problem: un = au, + f 


Scheme 


Accuracy/stability 


The y-method, (2, 2) accurate 
(I — yek? Ay) (ut! ~ Qu" + v2) 
= Ck? Avy +k Ly fat! + (1 2y) fa + yfe] 


1st (2,4) accurate scheme (for f = 0) 


a> D —1 2 
iia of 8 ee 
k? 207) ™ 

12h? 


2nd (2, 4) accurate scheme (for f = 0) 


h? 
A?un = a? ( + a) A2vu", or 


12 x m 
n+l n+l n+l 
Uapi T LOugtt + unt) — 2%.) + 10u% + outa) 


Unconditionally stable for y > 1/4 
Conditionally stable for 0 < y < 1/4 


IS 


Conditionally stable: a 


ie 


bs 


Conditionally stable: at < ta 


2 
-1 z = k 
+ uth + lOve! 4 vt = 120? (vt; —2u%, + ota) 


Note: Accuracy (p, q) means that the local truncation error of the scheme is tz a = O (k?) + O(h). 


Symbolic notation: A? = [(v",, — 2u% + u"\_,)/A"1. 


a derivation and analysis of these schemes, see Richtmyer
and Morton (1967) and Strikwerda (1989).


1 INTRODUCTION


The aim of this chapter is to discuss interpolation oper- 
ators that associate with a function u, an element from 
an h-version finite element space. We investigate nodal 
interpolation and several variants of quasi-interpolation and 
estimate the interpolation error. The discussion becomes 
diverse because we include (to a different extent) triangu- 
lar/tetrahedral and quadrilateral/hexahedral meshes, affine 
and nonaffine elements, isotropic and anisotropic elements, 
and Lagrangian and other elements. 

Interpolation error estimates are used for a priori and 
a posteriori estimation of the discretization error of a 
finite element method. This is explained in other chapters 


Encyclopedia of Computational Mechanics, Edited by Erwin 
Stein, René de Borst and Thomas J.R. Hughes. Volume 1: Funda- 
mentals. © 2004 John Wiley & Sons, Ltd. ISBN: 0-470-84699-2. 


of the Encyclopedia (see Chapter 4, this Volume for 
finite element methods in the displacement formulation or 
Chapter 9, this Volume for mixed finite element methods). 
To estimate the error a priori, one can often use the nodal 
interpolant. To get optimal error bounds, one has to use 
the maximum available regularity of the solution. Since the 
regularity can be described differently, one is interested in 
local interpolation error estimates with various norms on the 
right-hand side, including norms not only in the classical 
Sobolev spaces but also in weighted Sobolev spaces or in 
Sobolev—Slobodetskii spaces. For the ease of presentation, 
this chapter is restricted to the case of Sobolev spaces. 

The situation is different for a posteriori error estimates. 
In order to investigate residual-type error estimators, local 
interpolation error estimates for functions from the Sobolev 
space $W^{1,2}(\Omega)$ are needed. Such estimates can be obtained
for many finite elements only for quasi-interpolation oper- 
ators where point values of functions or derivatives are 
replaced in the definition of the interpolation operators by 
certain averaged values. 

In Section 2, we introduce finite elements and meshes. 
Section 3 is devoted to the definition of the interpola- 
tion operators. After a short discussion of the classical 
Deny—Lions lemma in Section 4, we derive error estimates 
for the nodal interpolation operator in Section 5, both for 
isotropic and anisotropic elements. We develop the theory 
in detail for affine elements and briefly discuss the
nonaffine case. Quasi-interpolants are investigated in Section 6
for isotropic Lagrangian elements, whereas anisotropic ele- 
ments are mentioned only in brief. An example for a global 
interpolation error estimate is presented in Section 7. A typ- 
ical solution with corner singularities is interpolated on a 
family of graded meshes, which is chosen such that the opti- 
mal order of convergence is obtained despite the irregular 
terms. 
We refer to the literature at the appropriate places in this
overview chapter and omit references to related work in 
the introduction. Moreover, we underline that the chapter 
is written in the spirit of the h-version of the finite element 
method. We do not investigate the dependence of constants 
on the polynomial degree of the functions. For interpolation 
error estimates in the context of the p- and hp-versions of
the finite element method, we refer, for example, to Schwab 
(1998) for an overview and to Melenk (2003) for more 
recent work on quasi-interpolation. 

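Let $d = 2, 3$ be the space dimension and $x = (x_1, \ldots, x_d)$
a Cartesian coordinate system. We use standard multi-index
notation with $\alpha := (\alpha_1, \ldots, \alpha_d)$, where the entries $\alpha_i$ are
from the set $\mathbb{N}_0$ of nonnegative integers, and

$x^\alpha := \prod_{i=1}^d x_i^{\alpha_i}, \quad \alpha! := \prod_{i=1}^d \alpha_i!, \quad |\alpha| := \sum_{i=1}^d \alpha_i, \quad D^\alpha := \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}}$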


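The notation $W^{\ell,p}(G)$, $\ell \in \mathbb{N}_0$, $p \in [1, \infty]$, is used for the
classical Sobolev spaces with the norm and seminorm

$\|v\|_{W^{\ell,p}(G)}^p := \sum_{|\alpha| \le \ell} \int_G |D^\alpha v|^p, \qquad |v|_{W^{\ell,p}(G)}^p := \sum_{|\alpha| = \ell} \int_G |D^\alpha v|^p$

for $p < \infty$ and the usual modification for $p = \infty$. In
general, we will write $L^p(G)$ for $W^{0,p}(G)$. The symbol $C$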
is used for a generic positive constant, which may be of a 
different value at each occurrence. C is always independent 
of the size of finite elements, but it may depend on the 
polynomial degree. 


2 FINITE ELEMENTS 


In this section, we introduce in brief the notion of the 
finite element. While Chapter 4, this Volume presents this 
topic comprehensively, we focus on those pieces that are 
necessary for the remaining part of the current chapter. 
Ciarlet (1978) introduces the finite element as the triple
$(K, P_K, \mathcal{N}_K)$, where $K$ is a closed bounded subset of $\mathbb{R}^d$
(some authors use open sets) with nonempty interior and
piecewise smooth boundary, $P_K$ is an $n$-dimensional linear
space of functions defined on $K$, and $\mathcal{N}_K = \{N_{i,K}\}_{i=1}^n$ is
a basis of the dual space $P_K'$. The smooth parts of the
boundary are called faces; they meet in edges and vertices.
In two dimensions, the edges play the role of the faces.
The functions in $P_K$ and the functionals in $\mathcal{N}_K$ are
sometimes called shape functions and nodal variables,
respectively. We adopt these names here, although $P_K$ does
not necessarily define the shape of the element $K$ (this
is true only for isoparametric elements, see below) and
also $N \in \mathcal{N}_K$ is not necessarily a function evaluation in
nodes. Important examples of finite elements are discussed
in Chapter 4, this Volume (Examples 4-9 and 16-17).
The nodal variables define a basis $\{\varphi_{j,K}\}_{j=1}^n$ of $P_K$ via

$N_{i,K}(\varphi_{j,K}) = \delta_{i,j}, \quad i, j = 1, \ldots, n$   (1)


which is called the nodal basis. This basis is employed 
here since it allows an elegant definition of interpolation 
operators. Note, however, that other bases might be advan- 
tageous for a hierarchical approach to approximation (see 
e.g. Chapter 7, this Volume) or for an efficient implemen- 
tation (see e.g. Chapter 5, this Volume). 


Example 1. An important example is the element
$(K, P_K, \mathcal{N}_K)$ with a triangular or tetrahedral set $K$, with
$P_K = P_1$ being the space of polynomials of degree one,
and with the set $\mathcal{N}_K$ consisting of function evaluations at
the vertices of $K$. The nodal basis can be represented here
by the barycentric coordinates $\lambda_{j,K}$ of the element, $\varphi_{j,K} =
\lambda_{j,K}$, $j = 1, \ldots, d + 1$. Recall that any point $x \in K$ defines
(together with the vertices of $K$) a splitting of the element
into $d + 1$ triangles/tetrahedra $K_j(x)$. The ratios $\lambda_{j,K}(x) :=
|K_j(x)|/|K|$ are called barycentric coordinates of $x$.
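The area-ratio definition of the barycentric coordinates in Example 1 can be sketched for a triangle as follows. This is only an illustration for $d = 2$; the helper names `signed_area`, `barycentric`, and `interpolate_p1` are hypothetical, and signed areas are used (an assumption beyond the text) so that the formula also behaves sensibly for points outside $K$.

```python
import numpy as np

def signed_area(p, q, r):
    """Signed area of the triangle with vertices p, q, r."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))

def barycentric(vertices, x):
    """lambda_j(x) = |K_j(x)| / |K|, where K_j(x) replaces vertex j by x."""
    total = signed_area(*vertices)
    lam = []
    for j in range(3):
        sub = list(vertices)
        sub[j] = x                      # sub-triangle opposite to vertex j
        lam.append(signed_area(*sub) / total)
    return np.array(lam)

def interpolate_p1(vertices, u, x):
    """Nodal P1 interpolant: sum_j u(a^j) * lambda_j(x)."""
    lam = barycentric(vertices, x)
    return sum(l * u(v) for l, v in zip(lam, vertices))

tri = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
u_lin = lambda p: 3.0 * p[0] - 2.0 * p[1] + 1.0
print(barycentric(tri, (0.5, 0.25)))          # entries sum to one
print(interpolate_p1(tri, u_lin, (0.5, 0.25)), u_lin((0.5, 0.25)))
```

The coordinates take the value one at "their" vertex and zero at the other two, which is the duality condition (1) for this element, and a polynomial of degree one is reproduced exactly.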


Finite element spaces over a domain $\Omega \subset \mathbb{R}^d$ are defined
on the basis of a finite element mesh or triangulation $\mathcal{T}$
of $\Omega$. This is a subdivision of $\Omega$ into a finite number
of closed bounded subsets $K$ with nonempty interior and
piecewise smooth boundary in such a way that the following
properties are satisfied:


1. $\overline{\Omega} = \bigcup_{K \in \mathcal{T}} K$.
2. Distinct $K_1, K_2 \in \mathcal{T}$ have no common interior points.
3. Any face of an element $K_1 \in \mathcal{T}$ is either a subset of the
   boundary $\partial\Omega$ or a face of another element $K_2 \in \mathcal{T}$.


Let any element $K$ be associated with a finite element
$(K, P_K, \mathcal{N}_K)$. We define the corresponding finite element
space by

$FE_{\mathcal{T}} := \{ v \in L^2(\Omega) : v_K := v|_K \in P_K \ \forall K \in \mathcal{T} \text{ and } v_{K_1}, v_{K_2} \text{ share the same nodal values on } K_1 \cap K_2 \}$   (2)

(compare Chapter 4, this Volume). The dimension of $FE_{\mathcal{T}}$
is denoted by $N_+$.


Remark 1. There are other approaches to the construction 
of finite element discretizations, for example, the weighted 
extended B-spline approximation where the geometry is not
described by the mesh but by weight functions (see Höllig
(2003) for an overview). Since this method does not fall 
naturally into the frame we are developing here, we will 
not discuss this method in detail. 


The weak solution of elliptic boundary value problems
of order $2m$ is generally searched in a subspace $V$ of
the Sobolev space $W^{m,2}(\Omega)$. The space $V$ is defined by
imposing appropriate (for simplicity, homogeneous) boundary
conditions. By imposing these boundary conditions
also in the finite element space $FE_{\mathcal{T}}$, we obtain the
$N$-dimensional finite element space $V_{\mathcal{T}}$. A conforming finite
element method requires $V_{\mathcal{T}} \subset V$, a condition that is
satisfied only when the nodal values imply certain smoothness of
the finite element functions, in particular, $FE_{\mathcal{T}} \subset C^{m-1}(\overline{\Omega})$
(see also Chapter 4, this Volume). On the other hand,
solutions to electromagnetic field problems are not
necessarily in $W^{1,2}(\Omega)$ such that the approximating spaces also
need not be continuous (see the survey of Hiptmair, 2002).
Continuous finite element spaces might even lead to wrong
solutions in this case.

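For later use, we also define in $FE_{\mathcal{T}}$ and $V_{\mathcal{T}}$ the global
sets of nodal variables (functionals) $\mathcal{N}_{\mathcal{T},+} = \{N_i\}_{i=1}^{N_+}$ and
$\mathcal{N}_{\mathcal{T}} = \{N_i\}_{i=1}^{N}$, respectively, and the corresponding global
basis $\{\varphi_j\}_{j=1}^{N_+} \subset FE_{\mathcal{T}}$ that satisfies

$N_i(\varphi_j) = \delta_{i,j}, \quad i, j = 1, \ldots, N_+$

The set $\mathcal{N}_{\mathcal{T},+}$ is the union of all $\mathcal{N}_K$, $K \in \mathcal{T}$, where common
nodal variables of adjacent elements are counted only once.
In $\mathcal{N}_{\mathcal{T}} \subset \mathcal{N}_{\mathcal{T},+}$, some nodal variables (degrees of freedom)
might be suppressed because of the boundary conditions.
Note further that $V_{\mathcal{T}}$ is spanned by $\{\varphi_j\}_{j=1}^{N}$.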

In this chapter, we will concentrate on triangulations
consisting of simplices (triangles or tetrahedra) or convex
quadrilaterals/hexahedra. A typical parameter for the
description of the elements $K$ is the aspect ratio $\gamma_K$, the
ratio of the diameter $h_K$ of $K$ and the diameter $\varrho_K$ of
the largest ball inscribed in $K$. We will call elements with
a moderate aspect ratio isotropic and elements with large
aspect ratio anisotropic. For isotropic elements, we allow
the quantity $\gamma_K$ to be included in constants in error
estimates, whereas for anisotropic elements, the aspect ratio
must be separated from constants, which means constants
must be uniformly bounded in the aspect ratio.


Example 2 (Isotropic and anisotropic simplices) Trian- 
gles and tetrahedra with plane faces (edges) are sometimes 
called shape-regular if they are isotropic. Shape-regularity 
is generally used as a property that is easy to achieve in 
mesh generation and that allows for a numerical analysis 
(e.g. interpolation error estimation and also the proof of 
a discrete inf-sup condition in mixed methods or efficient
multilevel solvers) at moderate technical expense. 

Zlámal (1968) has shown for triangles with straight edges 
that a lower bound on the interior angles is equivalent to an 
upper bound on the aspect ratio. Therefore, shape-regularity 
can be defined equivalently via a minimal angle condition: 
There exists a constant $\gamma_{\min} > 0$ such that the angles of
all triangles of a family of triangulations are bounded from
below by $\gamma_{\min}$.

Elements with large aspect ratio can be used advantageously
for the approximation of anisotropic features in
functions (solutions of boundary value problems), for example,
boundary layers or singularities in the neighborhood of
concave edges of the domain. For the numerical analysis,
it is often necessary to impose a maximal angle condition:
There exists a constant $\gamma_{\max} < \pi$ such that the angles of
all triangles of a family of triangulations are bounded from
above by $\gamma_{\max}$. An analogous definition can be given for
tetrahedra (see Apel, 1999a, pages 54, 90f). Figure 1 shows
an isotropic triangle and two anisotropic triangles, one that
satisfies the maximal angle condition and one that does not.
Note that if the angles are bounded from below away from
zero, they are also bounded from above away from $\pi$,
whereas the converse is not true. Therefore, estimates are
usually easier to obtain for shape-regular elements (where
$\cot \gamma_{\min}$ enters the constant) than for anisotropic elements
(where, if necessary at all, $\cot \gamma_{\max}$ enters the constants).
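The quantities used in Example 2 are easy to evaluate for a given triangle. The sketch below is an illustration only; the function name `triangle_shape` and the three sample triangles are assumptions chosen to mimic the situation described for Figure 1. It returns the aspect ratio $\gamma_K = h_K/\varrho_K$ together with the smallest and the largest interior angle.

```python
import numpy as np

def triangle_shape(vertices):
    """Return (aspect ratio h_K / rho_K, minimal angle, maximal angle)."""
    p = np.asarray(vertices, dtype=float)
    # e[i] is the length of the edge opposite to vertex i
    e = np.array([np.linalg.norm(p[(i + 1) % 3] - p[(i + 2) % 3]) for i in range(3)])
    d1, d2 = p[1] - p[0], p[2] - p[0]
    area = 0.5 * abs(d1[0] * d2[1] - d1[1] * d2[0])
    h = e.max()                       # diameter of the triangle
    rho = 4.0 * area / e.sum()        # diameter of the inscribed circle
    angles = []
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3
        c = (e[j] ** 2 + e[k] ** 2 - e[i] ** 2) / (2.0 * e[j] * e[k])
        angles.append(np.arccos(np.clip(c, -1.0, 1.0)))   # law of cosines
    return h / rho, min(angles), max(angles)

isotropic = [(0.0, 0.0), (1.0, 0.0), (0.5, np.sqrt(3.0) / 2.0)]
aniso_ok = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.01)]    # largest angle is pi/2
aniso_bad = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.01)]   # largest angle close to pi

for name, tri in [("isotropic", isotropic), ("aniso_ok", aniso_ok), ("aniso_bad", aniso_bad)]:
    print(name, triangle_shape(tri))
```

Both thin triangles have a large aspect ratio, but only the last one has an angle approaching $\pi$ as it is flattened further, so only it violates a maximal angle condition.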

Most monographs consider only shape-regular elements, 
for example, Braess (1997), Brenner and Scott (1994), 
Ciarlet (1978, 1991), Hughes (1987), and Oswald (1994). 
Anisotropic elements are investigated mainly in research 
papers and in the book by the author Apel (1999a). The 
maximal angle condition was introduced first by Synge 
(1957), and later rediscovered by Gregory (1975), Babuška
and Aziz (1976), and Jamet (1976). 


Example 3 (Shape-regular quadrilaterals) From the 
theoretical point of view, it is important to distinguish 
between parallelograms and more general quadrilaterals. 
Parallelograms share the following property with triangles:
for two elements $K_1$ and $K_2$, there is an invertible
affine mapping $F: \hat{x} \in \mathbb{R}^2 \to x = F(\hat{x}) = A\hat{x} + a \in \mathbb{R}^2$
with $K_2 = F(K_1)$. This property simplifies proofs, and
results for triangles can usually be extended to parallelograms.
Shape-regularity is defined by a bounded aspect ratio
$\gamma_K$. Parallelograms are sometimes even easier to handle
than triangles since the edges point into two directions only.
Similar statements can be made in the three-dimensional
case for parallelepipeds.

Figure 1. Isotropic and anisotropic triangles.

Figure 2. Degenerated isotropic quadrilaterals.

The situation changes for more general elements. A 
bounded aspect ratio is necessary but not sufficient for 
shape-regular quadrilaterals. For several estimates, it is 
advantageous to exclude quadrilaterals that degenerate 
to triangles (see Figure 2). The literature is not unani- 
mous about an appropriate description. Ciarlet and Raviart 
(1972a,b) demand a uniformly bounded ratio of the lengths 
of the longest and the shortest edge of the quadrilateral and 
that the interior angles are bounded away from zero and $\pi$. Girault
and Raviart (1986) assume equivalently that the four trian- 
gles that can be formed from the vertices of the quadrilateral 
are shape-regular in the sense of Example 2. Another equiv- 
alent (more technical) version is given by Arunakirinathar 
and Reddy (1995). 

Weaker mesh conditions were derived by Jamet (1977) 
and Acosta and Durán (2000). Jamet proves that the ele- 
ments shown in Figure 2 can be admitted, but he still relies 
on a bounded aspect ratio. Acosta and Durán formulate 
the regular decomposition property, which is the weakest 
known condition that allows one to prove the standard
interpolation error estimate for $Q_1$ elements. For a detailed review,
we refer to Ming and Shi (2002a,b). 

Further classes of meshes can be described as being 
asymptotical parallelograms (see also the papers by Ming 
and Shi). Some results that are valid for parallelograms 
can be extended to such meshes but not to general quadri- 
lateral meshes, for example, superconvergence results and 
interpolation error estimates for certain serendipity elements 
(compare Remark 7). Meshes of one of these classes arise 
typically from a hierarchical refinement of a coarse initial 
mesh. 

A more detailed discussion of all these conditions is 
beyond the frame of this chapter. We will restrict further 
discussion to affine elements and to elements that are shape- 
regular in the sense of Ciarlet/Raviart or Girault/Raviart. 


We will develop the interpolation theory on the basis of 
the following assumption. 


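Assumption 1. The finite element space $FE_{\mathcal{T}}$ is constructed
on the basis of a reference element $(\hat{K}, P_{\hat{K}}, \mathcal{N}_{\hat{K}})$
by the following rule:


1. For each $K \in \mathcal{T}$, there is a bijective mapping $F_K: \hat{x} \in
   \mathbb{R}^d \to x = F_K(\hat{x}) \in \mathbb{R}^d$ with $K = F_K(\hat{K})$,


2. $u \in P_K$ iff $\hat{u} := u \circ F_K \in P_{\hat{K}}$, and


3. $N_{i,K}(u) = N_{i,\hat{K}}(u \circ F_K)$, $i = 1, \ldots, n$, for all $u$ for
   which the functionals are well defined.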


If possible, an affine mapping is chosen,

$F_K(\hat{x}) = A\hat{x} + a$ with $A \in \mathbb{R}^{d \times d}$, $a \in \mathbb{R}^d$


otherwise the isoparametric mapping is used,

$F_K(\hat{x}) = \sum_{i=1}^{m} a^i\, \psi_{i,\hat{K}}(\hat{x})$

where the shape of $K$ is determined by the positions of
nodes $a^i \in \mathbb{R}^d$ and shape functions $\psi_{i,\hat{K}}$ with $\psi_{i,\hat{K}}(\hat{a}^j) =
\delta_{i,j}$. A typical example (in particular for Lagrangian
elements, these are elements with $N_{i,K}(u) = u(a^i)$ where $a^i$ are
the nodes of $K$) is to choose $\operatorname{span}\{\psi_{i,\hat{K}}\}_{i=1}^m = P_{\hat{K}}$ but one
can also choose a lower-dimensional space. Then the mapping
is called subparametric. An example is to use $Q_1$
instead of $Q_2$ in the case of quadrilateral or hexahedral
elements.
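The isoparametric mapping can be illustrated for a quadrilateral with bilinear ($Q_1$) shape functions. The sketch below makes several assumptions that are not fixed by the text: the reference element is taken as the unit square $[0,1]^2$ with counterclockwise vertex numbering, and the names `shape_q1`, `iso_map`, and `jacobian` are hypothetical.

```python
import numpy as np

# Vertices of the assumed reference square, counterclockwise.
REF_NODES = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

def shape_q1(xh):
    """Bilinear shape functions psi_i with psi_i(ref node j) = delta_ij."""
    s, t = xh
    return np.array([(1 - s) * (1 - t), s * (1 - t), s * t, (1 - s) * t])

def iso_map(nodes, xh):
    """F_K(xh) = sum_i a^i psi_i(xh) for the element nodes a^i."""
    return shape_q1(xh) @ np.asarray(nodes, dtype=float)

def jacobian(nodes, xh):
    """2x2 Jacobian of F_K; its columns are dF/ds and dF/dt."""
    s, t = xh
    dpsi = np.array([[-(1 - t), -(1 - s)],
                     [1 - t, -s],
                     [t, s],
                     [-t, 1 - s]])
    return np.asarray(nodes, dtype=float).T @ dpsi

parallelogram = [(0.0, 0.0), (2.0, 0.0), (3.0, 1.0), (1.0, 1.0)]
general_quad = [(0.0, 0.0), (2.0, 0.0), (2.5, 1.5), (0.0, 1.0)]

print(jacobian(parallelogram, (0.2, 0.7)))   # constant: the map is affine
print(jacobian(general_quad, (0.0, 0.0)))
print(jacobian(general_quad, (1.0, 1.0)))    # differs: the map is not affine
```

For a parallelogram the bilinear term cancels and the mapping reduces to the affine form $A\hat{x} + a$; for a general quadrilateral the Jacobian varies over the element.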

These examples for the mapping show that Assumption
1 is not too restrictive. Affine and isoparametric Lagrangian
elements are covered. Note, in particular, that the space $P_K$
for nonsimplicial elements (quadrilaterals, hexahedra, ...)
or for isoparametric elements is generally defined via
condition 2 of Assumption 1. For non-Lagrangian elements,
condition 3 is the critical one. If $\mathcal{N}_K$ contains the
evaluation of the gradient in a vertex of $K$, the nodal variable
should be written in the form of scaled derivatives in the
directions of the edges that meet in this vertex; see also the
discussion in Brenner and Scott (1994, Section 3.4).
Analogously, functionals in the form of integrals should be
properly scaled. Such technical manipulation, however, is not
possible in the case of normal derivatives; they are
transformed into oblique derivatives. These elements are
excluded by our framework, but they can sometimes be treated
as a perturbation of another element that is conforming with
the theory we are going to present; see, for example, the
estimates for the Argyris element in Ciarlet (1978, pages 337ff).


3 DEFINITION OF INTERPOLATION 
OPERATORS 

3.1 Nodal interpolation 

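Given a finite element $(K, P_K, \mathcal{N}_K)$ with a nodal basis
$\{\varphi_{i,K}\}_{i=1}^n$ and the nodal variables $\{N_{i,K}\}_{i=1}^n$, it is
straightforward to introduce the nodal interpolation operator

$I_K u := \sum_{i=1}^{n} N_{i,K}(u)\, \varphi_{i,K}$

The duality condition (1) yields $I_K \varphi_{j,K} = \varphi_{j,K}$, $j =
1, \ldots, n$, and thus

$I_K \varphi = \varphi \quad \forall \varphi \in P_K$   (3)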


Under Assumption 1, we obtain the property

$(I_K u) \circ F_K = \Big( \sum_{i=1}^{n} N_{i,K}(u)\, \varphi_{i,K} \Big) \circ F_K = \sum_{i=1}^{n} N_{i,\hat{K}}(\hat{u})\, \varphi_{i,\hat{K}} = I_{\hat{K}} \hat{u}$

that allows the estimation of the error on the reference
element $\hat{K}$ and the transformation of the estimates to $K$. The
interpolation operator $I_K$ is well defined for functions $u$
that allow the evaluation of the functionals $N_{i,K}$. For example,
if these functionals include the pointwise evaluation of
derivatives up to order $s$, then $I_K$ is defined for functions
from $C^s(\overline{K}) \supset W^{s',p}(K)$ with $s' > s + d/p$. If the
functionals include the evaluation of integrals only, the required
smoothness is correspondingly lower.
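The commutation property $(I_K u) \circ F_K = I_{\hat{K}} \hat{u}$ can be verified numerically for the linear triangle of Example 1. The sketch below assumes the reference triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$ and an arbitrarily chosen affine map; the nodal basis on $K$ is computed directly in physical coordinates, so the comparison is not circular. All names are illustrative.

```python
import numpy as np

REF_VERTS = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

def ref_basis(xh):
    """Nodal basis of P1 on the assumed reference triangle."""
    s, t = xh
    return np.array([1.0 - s - t, s, t])

def physical_basis(verts, x):
    """Nodal basis of P1 on K, from phi_i(a^j) = delta_ij in physical coordinates."""
    V = np.column_stack([np.ones(3), verts])      # rows [1, x_j, y_j]
    coeffs = np.linalg.inv(V)                     # column i: coefficients of phi_i
    return np.array([1.0, x[0], x[1]]) @ coeffs

A = np.array([[2.0, 1.0], [0.5, 3.0]])            # assumed affine map F_K
a = np.array([1.0, -2.0])
F = lambda xh: A @ np.asarray(xh, dtype=float) + a
verts = REF_VERTS @ A.T + a                       # vertices of K = F_K(K_hat)

u = lambda x: np.sin(x[0]) * np.exp(0.3 * x[1])   # sample function on K
u_hat = lambda xh: u(F(xh))                       # pulled back to K_hat

def I_K(x):
    return sum(u(verts[i]) * physical_basis(verts, x)[i] for i in range(3))

def I_ref(xh):
    return sum(u_hat(REF_VERTS[i]) * ref_basis(xh)[i] for i in range(3))

xh = (0.3, 0.45)
print(I_K(F(xh)), I_ref(xh))                      # the two values agree
```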

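The definition of $FE_{\mathcal{T}}$ in (2) allows the introduction of
the global interpolation operator $I_{\mathcal{T}}$ by

$(I_{\mathcal{T}} u)|_K := I_K(u|_K) \quad \forall K \in \mathcal{T}$

With the basis $\{\varphi_i\}_{i=1}^{N_+}$ of $FE_{\mathcal{T}}$ and the globalized set of
nodal variables $\mathcal{N}_{\mathcal{T},+} = \{N_i\}_{i=1}^{N_+}$, we can write

$I_{\mathcal{T}} u = \sum_{i=1}^{N_+} N_i(u)\, \varphi_i$   (4)

Note again that $I_{\mathcal{T}}$ acts only on sufficiently regular functions
such that all functionals $N_i$ are well defined.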


Remark 2. We distinguish between the finite element
space $FE_{\mathcal{T}}$ of dimension $N_+$ and its $N$-dimensional
subspace $V_{\mathcal{T}}$ where boundary conditions are imposed, $N_i(u) =
0$ for $u \in V$ and $i = N + 1, \ldots, N_+$. Therefore, equation
(4) is equivalent to

$I_{\mathcal{T}} u = \sum_{i=1}^{N} N_i(u)\, \varphi_i$   (5)

that means boundary conditions imply no particular
difficulty for the analysis of the nodal interpolation operator.
Remark 3. There are also interpolation operators that
cannot be treated in the framework developed in this
section; for example, Demkowicz's projection-based interpolant
(see Oden, Demkowicz, Westermann, and Rachowicz, 1989),
which is not defined by nodal variables but by
a $(d + 1)$-step procedure with $W_0^{1,2}$-projections on edges,
faces, and volumes.


3.2 Quasi-interpolation 


A drawback of nodal interpolation is the required regularity
of the functions the operator acts on. For example, for
Lagrangian elements, we need $u \in W^{s',p}(K)$ with $s' >
d/p$ to obtain well-defined point values via the Sobolev
embedding theorem. This assumption may fail even for
simple problems like the Poisson problem with mixed
boundary conditions in concave three-dimensional domains,
where an $r^\lambda$-singularity with $\lambda$ close to 0.25 may occur.
Moreover, an interpolation operator for $W^{1,2}(\Omega)$-functions
is needed for the analysis of a posteriori error estimators
and multilevel methods.

The remedy is the definition of a quasi-interpolation
operator

$Q_{\mathcal{T}} u := \sum_{i=1}^{N} N_i(\Pi_i u)\, \varphi_i$   (6)

that means we replace the function $u$ in (5) by the
regularized functions $\Pi_i u$. The index $i$ indicates that we may use
for each functional $N_i$ a different, locally defined averaging
operator $\Pi_i$.

For simplicity of the exposition, we restrict ourselves to
Lagrangian finite elements, that is, the nodal variables have
the form $N_i(u) = u(a^i)$, where $a^i$ are nodes in the mesh.
For quasi-interpolation of $C^1$-elements, we refer to Girault
and Scott (2002), and for the definition of quasi-interpolants
for lowest-order Nédélec elements of first type and
lowest-order Raviart-Thomas elements that fulfill the commuting
diagram property (de Rham diagram), we refer to Schöberl
(2001).

Each node $a^i$, $i = 1, \ldots, N$, is now related to a
subdomain $\omega_i \subset \Omega$ and a finite-dimensional space $P_i$.
Different authors prefer different choices. We present two main
examples.


Example 4 (Clément operator) Clément (1975) considers
$P_K = P_k$ in simplicial elements $K$ with plane faces.
Each node $a^i$, $i = 1, \ldots, N$, is related to the subdomain
$\omega_i := \operatorname{int} \operatorname{supp} \varphi_i$, where $\varphi_i$ is the corresponding nodal basis
function and int stands for interior. The averaging operator


$\Pi_i : L^1(\omega_i) \to P_{\ell-1}$   (7)


60 Interpolation in h-version Finite Element Spaces 
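is then defined by

$\int_{\omega_i} (v - \Pi_i v)\, \varphi = 0 \quad \forall \varphi \in P_{\ell-1}$   (8)

which is for $v \in L^2(\omega_i)$ the $L^2(\omega_i)$-projection into $P_{\ell-1}$.
This operator has the important property

$\Pi_i \varphi = \varphi \quad \forall \varphi \in P_{\ell-1}$   (9)

One can choose the parameter $\ell$ in correspondence with $k$,
for example, $\ell = k + 1$, or in correspondence with the
regularity of $u$. For $u \in W^{s,p}(\Omega)$, the choice $\ell = \min\{s, k + 1\}$
is appropriate.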

We analyze interpolation error estimates for the resulting
operator $Q_{\mathcal{T}}$ (see (6)) in Section 6. Note that $Q_{\mathcal{T}} v|_K$ is not
only determined by $v|_K$ but also by $v|_{\omega_K}$,

$\omega_K := \bigcup_{i:\, a^i \in K} \omega_i$   (10)
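The construction (6)-(9) can be sketched in a simplified setting. The code below is a one-dimensional analogue with linear elements on an interval, which is an assumption made purely for illustration since the chapter treats $d = 2, 3$. It also sums over all nodes, that is, no Dirichlet conditions are imposed. The patch $\omega_i$ consists of the one or two cells adjacent to the node $x_i$, and $\Pi_i$ is the $L^2(\omega_i)$-projection onto polynomials of degree $\ell - 1$. All function names are hypothetical.

```python
import numpy as np

GAUSS_X, GAUSS_W = np.polynomial.legendre.leggauss(4)

def integrate(f, a, b):
    """Gauss-Legendre quadrature of f over the cell (a, b)."""
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    return half * np.sum(GAUSS_W * f(mid + half * GAUSS_X))

def local_l2_projection(u, cells, degree):
    """L2 projection of u onto polynomials of the given degree over the patch."""
    c = 0.5 * (cells[0][0] + cells[-1][1])            # patch midpoint
    basis = [lambda x, j=j: (x - c) ** j for j in range(degree + 1)]
    gram = np.array([[sum(integrate(lambda x: bi(x) * bj(x), a, b) for a, b in cells)
                      for bj in basis] for bi in basis])
    rhs = np.array([sum(integrate(lambda x: u(x) * bi(x), a, b) for a, b in cells)
                    for bi in basis])
    coef = np.linalg.solve(gram, rhs)
    return lambda x: sum(cj * bj(x) for cj, bj in zip(coef, basis))

def clement_nodal_values(u, nodes, degree=1):
    """Coefficients N_i(Pi_i u) = (Pi_i u)(x_i) of the quasi-interpolant."""
    values = []
    for i, xi in enumerate(nodes):
        cells = []
        if i > 0:
            cells.append((nodes[i - 1], xi))
        if i < len(nodes) - 1:
            cells.append((xi, nodes[i + 1]))
        values.append(local_l2_projection(u, cells, degree)(xi))
    return np.array(values)

nodes = np.linspace(0.0, 1.0, 5)
linear = lambda x: 2.0 * x + 1.0
step = lambda x: (x > 0.5).astype(float)              # no point value at x = 0.5

print(clement_nodal_values(linear, nodes, degree=1))  # reproduces linear(nodes)
print(clement_nodal_values(step, nodes, degree=0))    # local averages of the step
```

The first call illustrates property (9): polynomials of degree $\ell - 1$ are reproduced. The second shows that the coefficients remain well defined for a discontinuous function, for which point evaluation at the jump is not meaningful.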


Remark 4. There are several modifications of the Clément
operator from Example 4. Bernardi (1989) computes the
average in some reference domain $\hat{\omega}_i$, which is chosen
from a fixed set of reference domains. This idea is used
to treat meshes with curved (isoparametric) simplicial and
quadrilateral elements. The particular difficulty is that the
transformation that maps $\hat{\omega}_i$ to $\omega_i$ is only piecewise smooth.
Bernardi and Girault (1998) and Carstensen (1999) modify
further and project into spaces of piecewise polynomial
functions.

A particularly simple choice $\omega_i = \operatorname{int} K_i$ is used by
Oswald (1994), who uses just one element $K_i \in \mathcal{T}$ with
$a^i \in K_i$. In this way, the technical difficulty mentioned
above is avoided.

Verfürth (1999b) develops the new projection operator
$P_G^\ell$ (see the last paragraph in Section 4) and uses it in
Verfürth (1999a) as the averaging operator $\Pi_i$ in the
definition of a quasi-interpolation operator. This modification
allows for making explicit the constants in the error
estimates.


Remark 5. The quasi-interpolant $Q_{\mathcal{T}} v$ satisfies Dirichlet
boundary conditions by construction since the sum in (6)
extends only over the $N$ degrees of freedom of $V_{\mathcal{T}}$. The nice
property of $I_{\mathcal{T}}$ mentioned in Remark 2 is not satisfied for
the Clément operator, since $N_i(\Pi_i u)$, $i = N + 1, \ldots, N_+$,
is not necessarily zero for $u \in V$. Consequently, the
elements adjacent to the Dirichlet boundary must be treated
separately in the analysis of the interpolation error. An
alternative is developed by Scott and Zhang (1990).


Example 5 (Scott-Zhang operator) The operator is
introduced by Scott and Zhang (1990) similar to the Clément
operator (see Example 4). In particular, the projector
$\Pi_i : L^1(\omega_i) \to P_{\ell-1}$ is also defined by (8). The essential
difference is that $\omega_i$ (still satisfying $a^i \in \overline{\omega}_i$) is allowed to
be a $(d - 1)$-dimensional face of an element $K_i$, and for
Dirichlet boundary nodes, one chooses $\omega_i$ to be part of the
Dirichlet boundary $\Gamma_D$. In this way, we obtain $N_i(\Pi_i u) = 0$
if $a^i \in \Gamma_D$. The operator preserves even nonhomogeneous
Dirichlet boundary conditions $v|_{\Gamma_D} = g$ if $g \in FE_{\mathcal{T}}|_{\Gamma_D}$ and
$\ell = k + 1$ in (7).

To be specific about the choice of $\omega_i$ for $a^i \notin \Gamma_D$, we
recall from Scott and Zhang (1990) that $\omega_i = K \in \mathcal{T}$ if
$a^i \in \operatorname{int} K$, and $\omega_i \ni a^i$ is a face of some element otherwise.
Note that the face is not uniquely determined if $a^i$ does not
lie in the interior of a face or an element. For an illustration
and an application of this operator in a context where the
nodal interpolant is not applicable, we refer to Apel, Sändig,
and Whiteman (1996).

The operator can be applied to functions whose traces
on $(d - 1)$-dimensional manifolds $\omega_i$ are in $L^1(\omega_i)$, that
means, for $u \in W^{\ell,p}(\Omega)$ with $\ell \ge 1$ and $p = 1$, or with $\ell >
1/p$ and $p > 1$. Consequently, it requires more regularity
than the Clément operator, but, in general, less than the
nodal interpolant.

Finally, Verfürth (1999a) remarks that in certain
interpolation error estimates that are valid for both the
Scott-Zhang and the Clément operators, the constant is smaller
for the Clément operator.


4 THE DENY-LIONS LEMMA 


In this section, we discuss a result from functional analysis 
that turns out to be a classical approximation result, the 
Deny—Lions lemma (Deny and Lions, 1953/54), which 
is an essential ingredient of many error estimates in the 
finite element theory. It essentially states that the $W^{\ell,p}(G)$-
seminorm is a norm in the quotient space $W^{\ell,p}(G)/P_{\ell-1}$.
We formulate it for domains $G$ of unit size, $\operatorname{diam} G = 1$.


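Lemma 1 (Deny and Lions) Let the domain $G \subset \mathbb{R}^d$,
$\operatorname{diam} G = 1$, be star-shaped with respect to a ball $B \subset G$,
and let $\ell \ge 1$ be an integer and $p \in [1, \infty]$ real. For each
$u \in W^{\ell,p}(G)$, there is a $w \in P_{\ell-1}$ such that

$\|u - w\|_{W^{\ell,p}(G)} \le C\, |u|_{W^{\ell,p}(G)}$   (11)

where the constant $C$ depends only on $d$, $\ell$, and $\gamma :=
\operatorname{diam} G/\operatorname{diam} B = 1/\operatorname{diam} B$.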
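As an illustration of what (11) expresses, the simplest case $\ell = 1$, $p < \infty$ can be written out. This worked step assumes only the Poincaré inequality for functions with vanishing mean value; the symbols $\bar{u}$ and $C_P$ are introduced here for the illustration and do not appear in the lemma.

```latex
% Case \ell = 1: w is a constant, chosen as the mean value of u over G.
\[
  w = \bar{u} := \frac{1}{|G|} \int_G u , \qquad
  \| u - \bar{u} \|_{L^p(G)} \le C_P \, |u|_{W^{1,p}(G)} .
\]
% Since first derivatives of the constant w vanish, |u - w|_{W^{1,p}(G)} = |u|_{W^{1,p}(G)}, hence
\[
  \| u - \bar{u} \|_{W^{1,p}(G)}^p
  = \| u - \bar{u} \|_{L^p(G)}^p + | u |_{W^{1,p}(G)}^p
  \le \left( C_P^p + 1 \right) | u |_{W^{1,p}(G)}^p ,
\]
% which is (11) with C = (C_P^p + 1)^{1/p}.
```

For $\ell > 1$, the same pattern applies with a polynomial $w \in P_{\ell-1}$ in place of the mean value.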


One can find different versions of the lemma and its 
proof in the literature. Instead of giving one of them in full 


detail, we sketch some of them, hereby elucidating some 
important points. 

A classical proof is to choose a basis $\{\sigma_\alpha\}_{|\alpha| \le \ell-1}$ of $P_{\ell-1}'$
and to prove that $|u|_{W^{\ell,p}(G)} + \sum_{|\alpha| \le \ell-1} |\sigma_\alpha(u)|$ defines a
norm in $W^{\ell,p}(G)$, which is equivalent to $\|u\|_{W^{\ell,p}(G)}$.
Determining $w \in P_{\ell-1}$ by $\sigma_\alpha(u - w) = 0$ for all $\alpha$: $|\alpha| \le \ell - 1$
leads to (11). For $\ell = 1$, there is only one functional $\sigma$ to
be used, typically $\sigma(u) := |G|^{-1} \int_G u$. For $\ell \ge 2$, one can
take the nodal variables $\mathcal{N}_S$ of a simplicial Lagrange
element $(S, P_{\ell-1}, \mathcal{N}_S)$ with $S \subset G$ (Braess, 1997) or $\sigma_\alpha(u) :=
|G|^{-1} \int_G D^\alpha u$ (Bramble and Hilbert, 1970). The proof is
based on the compact embedding of $W^{1,p}(G)$ in $L^p(G)$ and
has the disadvantage that it only ensures that the constant
is independent of $u$, but it can depend on all parameters,
in particular, on the shape of $G$. The result is useful when
applied only on a reference element $G = \hat{K}$. Dobrowolski
(1998) uses $\sigma_\alpha(u) := |G|^{-1} \int_G D^\alpha u$ as well and obtains
with a different proof that the constant is independent of
the shape of $G$ (in particular also independent of $\gamma$) but he
needs the assumption that $G$ is convex.

Dupont and Scott (1980) choose $w$ to be the averaged
Taylor polynomial (also called Sobolev polynomial)

$T_B^\ell u(x) := \int_B T_y^\ell u(x)\, \phi(y)\, dy \in P_{\ell-1}$,

where the Taylor polynomial of order $\ell - 1$ evaluated at $y$
is given by

$T_y^\ell u(x) := \sum_{|\alpha| \le \ell-1} \frac{1}{\alpha!}\, D^\alpha u(y)\, (x - y)^\alpha \in P_{\ell-1}$

and where $B$ is the ball from the lemma, and $\phi(\cdot)$ is a
smooth cut-off function with support $B$ and $\int_{\mathbb{R}^d} \phi = 1$; see
also Brenner and Scott (1994, Section 4.1). This polynomial
has several advantages, among them the property

$D^\alpha T_B^\ell u = T_B^{\ell-|\alpha|} D^\alpha u \quad \forall u \in W^{|\alpha|,1}(B)$   (12)

which will lead to simplifications later. This choice of $w$
also allows the characterization of the constant in (11) as
stated in Lemma 1. Moreover, several extensions can be
made; so the domain may be generalized to a union of
overlapping domains that are each star-shaped with respect
to a ball, and the Sobolev index $\ell$ may be noninteger (see
the original paper by Dupont and Scott, 1980).

Verfürth (1999b) defines in a recursive manner the
projector $P_G^\ell : H^\ell(G) \to P_\ell$,

$p_\ell^\ell(u) := \sum_{|\alpha| = \ell} \frac{x^\alpha}{\alpha!}\, \frac{1}{|G|} \int_G D^\alpha u$,

$p_{k-1}^\ell(u) := p_k^\ell(u) + \sum_{|\alpha| = k-1} \frac{x^\alpha}{\alpha!}\, \frac{1}{|G|} \int_G D^\alpha \big( u - p_k^\ell(u) \big), \quad k = \ell, \ell - 1, \ldots, 1$,

$P_G^\ell u := p_0^\ell(u)$   (13)

which also commutes with differentiation in the sense
of (12) and allows one to prove (11) for $w = P_G^{\ell-1} u$ with
a constant $C$, depending only on $\ell$ and $p \in [2, \infty]$. The
restriction $p \ge 2$ is outweighed by the fact that for convex
$G$ the constant $C$ does not depend on the parameter $\gamma :=
\operatorname{diam} G/\operatorname{diam} B$.


5 LOCAL ERROR ESTIMATES FOR THE 
NODAL INTERPOLANT 


5.1 Isotropic elements 


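Interpolation error estimates can be proved on the basis of
Lemma 1. To elaborate how the estimates depend on the
element size $h_K := \operatorname{diam} K \le C \varrho_K$, the isotropic element
$K$ is mapped to a domain of unit size. In simple cases,
when $P_{\ell-1} \subset P_K$ (this is, in general, not satisfied for
isoparametric elements), one can just scale via

$x = h_K \tilde{x}$

where $x, \tilde{x} \in \mathbb{R}^d$. This transformation maps $K$ to a
similar element $\tilde{K}$ of unit size, $\operatorname{diam} \tilde{K} = 1$; it also maps
$P_{\ell-1}$ to $P_{\ell-1}$, $u \in W^{\ell,p}(K)$ to $\tilde{u} \in W^{\ell,p}(\tilde{K})$ and the nodal
interpolant $I_K u$ to $I_{\tilde{K}} \tilde{u}$. So, we get with $I_{\tilde{K}} \tilde{w} = \tilde{w}$ for all
$\tilde{w} \in P_{\ell-1}$ and $\tilde{w}$ according to Lemma 1

$|u - I_K u|_{W^{m,p}(K)} = h_K^{d/p}\, h_K^{-m}\, |\tilde{u} - I_{\tilde{K}} \tilde{u}|_{W^{m,p}(\tilde{K})}$

$|\tilde{u} - I_{\tilde{K}} \tilde{u}|_{W^{m,p}(\tilde{K})} = |(\tilde{u} - \tilde{w}) - I_{\tilde{K}}(\tilde{u} - \tilde{w})|_{W^{m,p}(\tilde{K})}$
$\quad \le |\tilde{u} - \tilde{w}|_{W^{m,p}(\tilde{K})} + |I_{\tilde{K}}(\tilde{u} - \tilde{w})|_{W^{m,p}(\tilde{K})}$
$\quad \le \big( 1 + \|I_{\tilde{K}}\|_{W^{\ell,p}(\tilde{K}) \to W^{m,p}(\tilde{K})} \big)\, \|\tilde{u} - \tilde{w}\|_{W^{\ell,p}(\tilde{K})}$
$\quad \le \big( 1 + \|I_{\tilde{K}}\|_{W^{\ell,p}(\tilde{K}) \to W^{m,p}(\tilde{K})} \big)\, C\, |\tilde{u}|_{W^{\ell,p}(\tilde{K})}$

Scaling back, we obtain for $m = 0, \ldots, \ell$

$|u - I_K u|_{W^{m,p}(K)} \le C \big( 1 + \|I_{\tilde{K}}\|_{W^{\ell,p}(\tilde{K}) \to W^{m,p}(\tilde{K})} \big)\, h_K^{\ell-m}\, |u|_{W^{\ell,p}(K)}$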
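The operator norm $\|I_{\tilde{K}}\|_{W^{\ell,p}(\tilde{K}) \to W^{m,p}(\tilde{K})} := \sup_{\tilde{u} \in W^{\ell,p}(\tilde{K})}
\|I_{\tilde{K}} \tilde{u}\|_{W^{m,p}(\tilde{K})} / \|\tilde{u}\|_{W^{\ell,p}(\tilde{K})}$ can be bounded by using

$\|I_{\tilde{K}} \tilde{u}\|_{W^{m,p}(\tilde{K})} = \Big\| \sum_{i=1}^{n} N_{i,\tilde{K}}(\tilde{u})\, \varphi_{i,\tilde{K}} \Big\|_{W^{m,p}(\tilde{K})} \le \sum_{i=1}^{n} |N_{i,\tilde{K}}(\tilde{u})|\, \|\varphi_{i,\tilde{K}}\|_{W^{m,p}(\tilde{K})}$

It remains to be proved that $\|\varphi_{i,\tilde{K}}\|_{W^{m,p}(\tilde{K})} \le C$ and
$|N_{i,\tilde{K}}(\tilde{u})| \le C \|\tilde{u}\|_{W^{\ell,p}(\tilde{K})}$. To ensure that these constants
are independent of the form of $\tilde{K}$, Brenner and Scott
(1994) transform $(\tilde{K}, P_{\tilde{K}}, \mathcal{N}_{\tilde{K}})$ to the reference element
$(\hat{K}, P_{\hat{K}}, \mathcal{N}_{\hat{K}})$, which was introduced in Assumption 1.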

Other references suggest instead proving the interpolation
error estimates by transforming immediately to the
reference element $\hat{K}$. We will discuss this approach in
the remaining part of this subsection. Recall from Section 2
that we assume that for each element $K \in \mathcal{T}$ there is a
bijective mapping $F_K: \hat{x} \in \mathbb{R}^d \to x = F_K(\hat{x}) \in \mathbb{R}^d$, which maps
$\hat{K}$ to $K$. The following lemma provides transformation
formulae for seminorms of functions if $F_K$ is affine.


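Lemma 2. Let $F_K(\hat{x}) = A\hat{x} + a$ be an affine mapping
with $K = F_K(\hat{K})$. If $\hat{u} \in W^{m,q}(\hat{K})$, then $u = \hat{u} \circ F_K^{-1} \in
W^{m,q}(K)$ and

$|u|_{W^{m,q}(K)} \le C\, |K|^{1/q}\, \varrho_K^{-m}\, |\hat{u}|_{W^{m,q}(\hat{K})}$   (14)

If $u \in W^{\ell,p}(K)$, then $\hat{u} = u \circ F_K \in W^{\ell,p}(\hat{K})$ and

$|\hat{u}|_{W^{\ell,p}(\hat{K})} \le C\, |K|^{-1/p}\, h_K^{\ell}\, |u|_{W^{\ell,p}(K)}$   (15)

The constants depend on the shape and size of $\hat{K}$.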


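Proof. We follow Ciarlet (1978). By examining the affine
mapping, we get $\hat{\nabla} \hat{v} = A^T \nabla v$ and thus

$|u|_{W^{m,q}(K)} \le C\, |K|^{1/q}\, \|A^{-1}\|_2^m\, |\hat{u}|_{W^{m,q}(\hat{K})}$

$|\hat{u}|_{W^{\ell,p}(\hat{K})} \le C\, |K|^{-1/p}\, \|A\|_2^{\ell}\, |u|_{W^{\ell,p}(K)}$

The factor with the power of $|K|$ comes from the Jacobi
determinant of the transformation. This determinant is equal
to the ratio of the areas of $K$ and $\hat{K}$. The norm of $A$
can be estimated by considering the transformation of the
largest sphere $\hat{S}$ contained in $\hat{K}$. For all $\hat{x} \in \mathbb{R}^d$ with
$|\hat{x}| = \varrho_{\hat{K}} = \operatorname{diam} \hat{S}$, there are two points $\hat{y}, \hat{z} \in \hat{K}$ such that
$\hat{x} = \hat{y} - \hat{z}$. By observing that $|A\hat{x}| = |(A\hat{y} + a) - (A\hat{z} +
a)| = |y - z| \le h_K$, we get

$\|A\|_2 := \sup_{|\hat{x}| = \varrho_{\hat{K}}} \frac{|A\hat{x}|}{|\hat{x}|} \le \frac{h_K}{\varrho_{\hat{K}}}$

and analogously $\|A^{-1}\|_2 \le h_{\hat{K}}/\varrho_K$. This finishes the
proof. □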
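The two bounds on $\|A\|_2$ and $\|A^{-1}\|_2$ from the proof can be checked numerically for triangles. The sketch below assumes the reference triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$ and randomly generated affine maps; the helper names `diam_and_inball` and `check_bounds` are hypothetical.

```python
import numpy as np

def diam_and_inball(verts):
    """Return (h, rho): diameter and inscribed-circle diameter of a triangle."""
    p = np.asarray(verts, dtype=float)
    e = np.array([np.linalg.norm(p[(i + 1) % 3] - p[(i + 2) % 3]) for i in range(3)])
    d1, d2 = p[1] - p[0], p[2] - p[0]
    area = 0.5 * abs(d1[0] * d2[1] - d1[1] * d2[0])
    return e.max(), 4.0 * area / e.sum()

REF = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
H_REF, RHO_REF = diam_and_inball(REF)

def check_bounds(A, a):
    """Check ||A||_2 <= h_K / rho_ref and ||A^{-1}||_2 <= h_ref / rho_K."""
    K = REF @ A.T + a                              # vertices of K = F_K(K_hat)
    h_K, rho_K = diam_and_inball(K)
    norm_A = np.linalg.norm(A, 2)                  # spectral norm
    norm_Ainv = np.linalg.norm(np.linalg.inv(A), 2)
    tol = 1.0 + 1e-12                              # guard against round-off
    return norm_A <= tol * h_K / RHO_REF, norm_Ainv <= tol * H_REF / rho_K

rng = np.random.default_rng(0)
results = []
for _ in range(100):
    A = rng.normal(size=(2, 2))
    if abs(np.linalg.det(A)) < 1e-6:               # skip (nearly) singular maps
        continue
    results.append(check_bounds(A, rng.normal(size=2)))
print(all(ok1 and ok2 for ok1, ok2 in results))
```

The check also covers strongly stretched images $K$, since the bounds hold for every invertible affine map; anisotropy only makes the ratio $h_K/\varrho_K$ large.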


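Theorem 1. Let $(\hat{K}, P_{\hat{K}}, \mathcal{N}_{\hat{K}})$ be a reference element with

$P_{\ell-1} \subset P_{\hat{K}}$   (16)

$\mathcal{N}_{\hat{K}} \subset (C^s(\hat{K}))'$   (17)

Assume that $(K, P_K, \mathcal{N}_K)$ is affine equivalent to
$(\hat{K}, P_{\hat{K}}, \mathcal{N}_{\hat{K}})$. Let $u \in W^{\ell,p}(K)$ with $\ell \in \mathbb{N}$, $p \in [1, \infty]$,
such that

$W^{\ell,p}(\hat{K}) \hookrightarrow C^s(\hat{K})$, i.e. $\ell > s + \frac{d}{p}$   (18)

and let $m \in \{0, \ldots, \ell - 1\}$ and $q \in [1, \infty]$ be such that

$W^{\ell,p}(\hat{K}) \hookrightarrow W^{m,q}(\hat{K})$   (19)

Then the estimate

$|u - I_K u|_{W^{m,q}(K)} \le C\, |K|^{1/q - 1/p}\, h_K^{\ell}\, \varrho_K^{-m}\, |u|_{W^{\ell,p}(K)}$   (20)

holds.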


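Proof. From (17) and (18), we obtain $|N_{i,\hat{K}}(\hat{v})| \le
C \|\hat{v}\|_{C^s(\hat{K})} \le C \|\hat{v}\|_{W^{\ell,p}(\hat{K})}$ and, thus, with $\|\varphi_{i,\hat{K}}\|_{W^{m,q}(\hat{K})} \le
C$, the boundedness of the interpolation operator,

$|I_{\hat{K}} \hat{v}|_{W^{m,q}(\hat{K})} = \Big| \sum_{i=1}^{n} N_{i,\hat{K}}(\hat{v})\, \varphi_{i,\hat{K}} \Big|_{W^{m,q}(\hat{K})} \le \sum_{i=1}^{n} |N_{i,\hat{K}}(\hat{v})|\, |\varphi_{i,\hat{K}}|_{W^{m,q}(\hat{K})} \le C \|\hat{v}\|_{W^{\ell,p}(\hat{K})}$

where the constant depends not only on $\hat{K}$, $s$, $m$, $q$, $\ell$, and
$p$ but also on $\mathcal{N}_{\hat{K}}$. The embedding (19) yields

$|\hat{v}|_{W^{m,q}(\hat{K})} \le C \|\hat{v}\|_{W^{\ell,p}(\hat{K})}$

Combining these estimates, choosing $\hat{w} \in P_{\ell-1}$ according
to Lemma 1, and using $\hat{w} = I_{\hat{K}} \hat{w}$ due to (16), we get

$|\hat{u} - I_{\hat{K}} \hat{u}|_{W^{m,q}(\hat{K})} = |(\hat{u} - \hat{w}) - I_{\hat{K}}(\hat{u} - \hat{w})|_{W^{m,q}(\hat{K})}$
$\quad \le C \|\hat{u} - \hat{w}\|_{W^{\ell,p}(\hat{K})}$
$\quad \le C |\hat{u}|_{W^{\ell,p}(\hat{K})}$   (21)

By transforming this estimate to $K$ (using Lemma 2), we
obtain the desired result. □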


Note that this theorem restricts for simplicity to affine 
elements, but is valid not only for Lagrangian elements but 
also for other types, including Hermite elements. 


Corollary 1. For isotropic elements, we obtain, in particular,

$$|u - I_K u|_{W^{m,q}(K)} \le C\,|K|^{1/q-1/p}\,h_K^{\ell-m}\,|u|_{W^{\ell,p}(K)} \tag{22}$$
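The rate predicted by (22) can be observed numerically. The following is a minimal sketch under stated assumptions: a one-dimensional uniform mesh on (0, 1), piecewise linear Lagrange interpolation (so $\ell = 2$, $p = q = 2$), and the test function $u(x) = \sin(\pi x)$. The function names and the quadrature are illustrative choices and do not come from the text.

```python
import math

def interpolation_errors(n):
    """Return (L2 error, H1-seminorm error) of the piecewise linear
    interpolant of u(x) = sin(pi x) on a uniform mesh with n elements."""
    u = lambda x: math.sin(math.pi * x)
    du = lambda x: math.pi * math.cos(math.pi * x)
    h = 1.0 / n
    err0 = err1 = 0.0
    for k in range(n):
        a, b = k * h, (k + 1) * h
        slope = (u(b) - u(a)) / h      # derivative of the interpolant on [a, b]
        pieces = 50                    # composite midpoint rule per element
        w = h / pieces
        for j in range(pieces):
            x = a + (j + 0.5) * w
            iu = u(a) + slope * (x - a)
            err0 += w * (u(x) - iu) ** 2
            err1 += w * (du(x) - slope) ** 2
    return math.sqrt(err0), math.sqrt(err1)

def observed_rate(e_coarse, e_fine):
    """Convergence order estimated from two meshes with sizes h and h/2."""
    return math.log(e_coarse / e_fine, 2)
```

Halving the mesh size should reduce the $L^2$ error by a factor of about four ($m = 0$) and the $W^{1,2}$ seminorm error by a factor of about two ($m = 1$), in agreement with the exponent $\ell - m$.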


Remark 6. Interpolation error estimates can also be proved for functions from weighted Sobolev spaces, for example,

$$H^{2,\alpha}(G) := \{u \in W^{1,2}(G) : r^{\alpha} D^{\beta} u \in L^2(G)\ \forall \beta : |\beta| = 2\}$$

where $r$ is the distance to some point $x \in G \subset \mathbb{R}^2$, and

$$|u|^2_{H^{2,\alpha}(G)} := \sum_{|\beta|=2} \|r^{\alpha} D^{\beta} u\|^2_{L^2(G)}, \qquad \|u\|^2_{H^{2,\alpha}(G)} := \|u\|^2_{W^{1,2}(G)} + |u|^2_{H^{2,\alpha}(G)}$$

Grisvard (1985) shows in Lemma 8.4.1.3 the analog to the Deny–Lions lemma: for each $u \in H^{2,\alpha}(G)$ with $\alpha < 1$, there is a $w \in P_1$ such that

$$\|u - w\|_{H^{2,\alpha}(G)} \le C(G)\,|u|_{H^{2,\alpha}(G)}$$

The interpolation error estimate

$$|u - I_K u|_{W^{1,2}(K)} \le C h_K^{1-\alpha}\,|u|_{H^{2,\alpha}(K)}$$

is then proved in Lemma 8.4.1.4 for triangles $K$. This result can be applied in the proof of mesh-grading techniques for singular solutions, where the singularity comes from corners in the domain $\Omega \subset \mathbb{R}^2$.


Second derivatives of an affine transformation $F_K$ vanish. This leads to the special structure of the relations (14) and (15), where no low-order derivatives of $\hat u$ and $u$, respectively, appear on the right-hand sides. This is no longer valid for nonaffine transformations. In the case that

$$|\hat D^{\alpha} F_K| \le C h_K^{|\alpha|} \quad \forall \alpha : |\alpha| \le \ell \tag{23}$$

we obtain

$$|\hat u|_{W^{\ell,p}(\hat K)} \le C\,|K|^{-1/p}\,h_K^{\ell}\,\|u\|_{W^{\ell,p}(K)} \tag{24}$$

which is weaker than (15), but is still sufficient for our purposes. The assumption (23) is satisfied when $F_K$ differs only slightly from an affine mapping.

However, Estimate (23) is not valid for general quadrilateral meshes. Therefore, the theory has to be refined. For $P_{\hat K} = Q_k$, this case can be treated with a sharper version of the Deny–Lions lemma: for each $u \in W^{\ell,p}(G)$ there is a $w \in Q_{\ell-1}$ such that

$$\|u - w\|_{W^{\ell,p}(G)} \le C \left(\sum_{i=1}^{d} \left\|\frac{\partial^{\ell} u}{\partial x_i^{\ell}}\right\|^p_{L^p(G)}\right)^{1/p}$$

(see Bramble and Hilbert, 1971). For shape-regular elements (in the sense of Ciarlet/Raviart or Girault/Raviart, see Example 3) one can then prove (22).

Remark 7. Some results are weaker for general shape-regular quadrilateral elements than for (at least asymptotically) affine elements. For example, Arnold, Boffi, and Falk (2002) have shown for quadrilateral serendipity elements (here $P_{\hat K} = Q_k' := P_k \oplus \operatorname{span}\{\hat x_1^{k}\hat x_2, \hat x_1 \hat x_2^{k}\}$) that

$$|u - I_K u|_{W^{m,2}(K)} \le C h_K^{\lfloor k/2 \rfloor + 1 - m}\,|u|_{W^{\lfloor k/2 \rfloor + 1,2}(K)}, \quad m = 0, 1$$

is sharp for general quadrilateral meshes, whereas for asymptotically parallelogram meshes, we get

$$|u - I_K u|_{W^{m,2}(K)} \le C h_K^{k+1-m}\,|u|_{W^{k+1,2}(K)}, \quad m = 0, 1$$


5.2 Anisotropic elements 


Anisotropic elements are characterized by a large aspect ratio $\gamma_K := h_K/\varrho_K$. Estimate (20) can also be reformulated as

$$|u - I_K u|_{W^{m,q}(K)} \le C\,|K|^{1/q-1/p}\,h_K^{\ell-m}\,\gamma_K^{m}\,|u|_{W^{\ell,p}(K)}$$

which means that the quality of this estimate deteriorates if $m \ge 1$ and $\gamma_K \gg 1$. Let us examine whether the reason for this deterioration lies in the anisotropic element (indicating that anisotropic elements should be avoided) or in a lack of sharpness of the estimates (indicating that the estimates should be improved).


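Example 6. Consider the triangle $K$ with the nodes $(-h, 0)$, $(h, 0)$, and $(0, \varepsilon h)$ and interpolate the function $u(x_1, x_2) = x_1^2$ in the vertices with polynomials of degree one. Then $I_K u = h^2 - \varepsilon^{-1} h x_2$ and

$$\frac{|u - I_K u|_{W^{1,2}(K)}}{|u|_{W^{2,2}(K)}} = \left(\frac{h^4\left(\tfrac{2}{3}\varepsilon + \varepsilon^{-1}\right)}{4 h^2 \varepsilon}\right)^{1/2} = h\left(\frac{1}{6} + \frac{1}{4}\varepsilon^{-2}\right)^{1/2} = c_\varepsilon h$$

with $c_\varepsilon \to \infty$ for $\varepsilon \to 0$ and $c_\varepsilon \ge C\gamma_K$. We find that Estimate (20) can, in general, not be improved essentially and that (22) is not valid. Estimate (20) can be improved only slightly by investigating in more detail the transformation from Lemma 2 (see e.g. Formaggia and Perotto, 2001).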
Example 7. Consider now the triangle with the nodes $(0, 0)$, $(h, 0)$, and $(0, \varepsilon h)$ and interpolate again the function $u(x_1, x_2) = x_1^2$ in $P_1$. We get $I_K u = h x_1$ and

$$\frac{|u - I_K u|_{W^{1,2}(K)}}{|u|_{W^{2,2}(K)}} = \left(\frac{\tfrac{1}{6}\varepsilon h^4}{2\varepsilon h^2}\right)^{1/2} = \frac{h}{\sqrt{12}}$$

where the constant is independent of $\varepsilon$. Estimate (22) is valid, although the element is anisotropic for small $\varepsilon$.
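The two ratios in Examples 6 and 7 can be checked with a few lines of code. The sketch below is an illustration only: the helper names are hypothetical, and it relies on the fact that $|\nabla(u - I_K u)|^2$ is a quadratic polynomial for $u = x_1^2$, so that the edge-midpoint quadrature rule on a triangle integrates it exactly.

```python
import math

def p1_error_ratio(vertices):
    """|u - I_K u|_{W^{1,2}(K)} / |u|_{W^{2,2}(K)} for u(x1, x2) = x1**2
    and the linear interpolant I_K u at the three vertices of K."""
    (x1, y1), (x2, y2), (x3, y3) = vertices
    u = [x1 ** 2, x2 ** 2, x3 ** 2]
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    area = abs(det) / 2.0
    # constant gradient (g1, g2) of the linear interpolant
    g1 = ((u[1] - u[0]) * (y3 - y1) - (u[2] - u[0]) * (y2 - y1)) / det
    g2 = ((x2 - x1) * (u[2] - u[0]) - (x3 - x1) * (u[1] - u[0])) / det
    # |grad(u - I_K u)|^2 = (2*x1 - g1)^2 + g2^2 is quadratic, hence the
    # edge-midpoint rule (weights area/3) is exact
    mid_x = [(x1 + x2) / 2.0, (x2 + x3) / 2.0, (x3 + x1) / 2.0]
    num = area / 3.0 * sum((2.0 * mx - g1) ** 2 + g2 ** 2 for mx in mid_x)
    den = 4.0 * area  # only d^2u/dx1^2 = 2 is nonzero
    return math.sqrt(num / den)

def example6(h, eps):
    """Triangle with nodes (-h, 0), (h, 0), (0, eps*h)."""
    return p1_error_ratio([(-h, 0.0), (h, 0.0), (0.0, eps * h)])

def example7(h, eps):
    """Triangle with nodes (0, 0), (h, 0), (0, eps*h)."""
    return p1_error_ratio([(0.0, 0.0), (h, 0.0), (0.0, eps * h)])
```

For fixed $h$ and decreasing $\varepsilon$, `example6` grows like $h/(2\varepsilon)$, whereas `example7` stays at $h/\sqrt{12}$.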


From the two examples, we can learn that the aspect ratio is not the right quantity to characterize elements that yield an interpolation error estimate of quality (22). Synge (1957) proved for triangles $K$ and $P_K = P_1$ that

$$|u - I_K u|_{W^{1,\infty}(K)} \le C h_K\,|u|_{W^{2,\infty}(K)}$$

with a constant that depends linearly on $(\cos\tfrac{1}{2}\alpha)^{-1}$, where $\alpha$ is the maximal angle in $K$. The maximal angle condition was found (see also the comment in Example 2).
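To see how the maximal angle separates the two example triangles, one can evaluate the quantity $(\cos\tfrac{1}{2}\alpha)^{-1}$ directly. The following sketch uses hypothetical helper names and plain planar geometry; it is meant as an illustration, not as part of Synge's argument.

```python
import math

def max_angle(vertices):
    """Largest interior angle (in radians) of the triangle with the given vertices."""
    angles = []
    for i in range(3):
        p = vertices[i]
        q = vertices[(i + 1) % 3]
        r = vertices[(i + 2) % 3]
        v = (q[0] - p[0], q[1] - p[1])
        w = (r[0] - p[0], r[1] - p[1])
        cross = v[0] * w[1] - v[1] * w[0]
        dot = v[0] * w[0] + v[1] * w[1]
        angles.append(math.atan2(abs(cross), dot))
    return max(angles)

def synge_factor(vertices):
    """The quantity 1 / cos(alpha / 2) for the maximal angle alpha."""
    return 1.0 / math.cos(max_angle(vertices) / 2.0)
```

For the triangle of Example 6 the maximal angle tends to $\pi$ as $\varepsilon \to 0$ and the factor grows like $\varepsilon^{-1}$; for the triangle of Example 7 the maximal angle is $\pi/2$ for every $\varepsilon$ and the factor equals $\sqrt{2}$.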

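We will now elaborate that Estimate (22) cannot be obtained from (21) by just treating the transformation $F_K$ more carefully (compare also Apel, 1999a, Example 2.1). To do so, we consider again the triangle $K$ from Example 7 with $P_K = P_1$. The transformation to the reference element $\hat K$ with nodes $(0, 0)$, $(1, 0)$, and $(0, 1)$ is done via $x_1 = h\hat x_1$, $x_2 = \varepsilon h \hat x_2$. Transforming (21) in the special case $p = q = 2$, $m = 1$, $\ell = 2$ leads to

$$\left(\left\|\frac{\partial(u - I_K u)}{\partial x_1}\right\|^2_{L^2(K)} + \varepsilon^2\left\|\frac{\partial(u - I_K u)}{\partial x_2}\right\|^2_{L^2(K)}\right)^{1/2} \le C h\left(\left\|\frac{\partial^2 u}{\partial x_1^2}\right\|^2_{L^2(K)} + \varepsilon^2\left\|\frac{\partial^2 u}{\partial x_1 \partial x_2}\right\|^2_{L^2(K)} + \varepsilon^4\left\|\frac{\partial^2 u}{\partial x_2^2}\right\|^2_{L^2(K)}\right)^{1/2}$$

which can be simplified to

$$\left\|\frac{\partial(u - I_K u)}{\partial x_1}\right\|_{L^2(K)} \le C h\,|u|_{W^{2,2}(K)}, \qquad \left\|\frac{\partial(u - I_K u)}{\partial x_2}\right\|_{L^2(K)} \le C \varepsilon^{-1} h\,|u|_{W^{2,2}(K)} \tag{25}$$

but the factor $\varepsilon^{-1}$ in (25) cannot be removed in this way. This factor could only be avoided if we proved on the reference element the sharper estimate

$$\left\|\frac{\partial(\hat u - I_{\hat K}\hat u)}{\partial \hat x_2}\right\|_{L^2(\hat K)} \le C\left(\left\|\frac{\partial^2 \hat u}{\partial \hat x_1 \partial \hat x_2}\right\|^2_{L^2(\hat K)} + \left\|\frac{\partial^2 \hat u}{\partial \hat x_2^2}\right\|^2_{L^2(\hat K)}\right)^{1/2}$$

The following lemma from Apel and Dobrowolski (1992) (see also Apel, 1999a, Lemma 2.2) reduces the proof to finding functionals $\sigma_i$ with certain properties.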


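Lemma 3. Let $I_{\hat K}: C^{s}(\hat K) \to P_{\hat K}$ be a linear operator and assume that $P_k \subset P_{\hat K}$. Fix $m, \ell \in \mathbb{N}_0$ and $p, q \in [1, \infty]$ such that $0 \le m \le \ell \le k + 1$ and

$$W^{\ell-m,p}(\hat K) \hookrightarrow L^{q}(\hat K) \tag{26}$$

Consider a multi-index $\gamma$ with $|\gamma| = m$ and define $J := \dim \hat D^{\gamma} P_k$. Assume that there are linear functionals $\sigma_i$, $i = 1, \ldots, J$, such that

$$\sigma_i \in (W^{\ell-m,p}(\hat K))' \quad \forall i = 1, \ldots, J \tag{27}$$

$$\sigma_i(\hat D^{\gamma}(\hat u - I_{\hat K}\hat u)) = 0 \quad \forall i = 1, \ldots, J,\ \forall \hat u \in C^{s}(\hat K) : \hat D^{\gamma}\hat u \in W^{\ell-m,p}(\hat K) \tag{28}$$

$$\hat w \in P_k \text{ and } \sigma_i(\hat D^{\gamma}\hat w) = 0\ \forall i = 1, \ldots, J \ \Longrightarrow\ \hat D^{\gamma}\hat w = 0 \tag{29}$$

Then the error can be estimated for all $\hat u \in C^{s}(\hat K)$ with $\hat D^{\gamma}\hat u \in W^{\ell-m,p}(\hat K)$ by

$$\|\hat D^{\gamma}(\hat u - I_{\hat K}\hat u)\|_{L^{q}(\hat K)} \le C\,|\hat D^{\gamma}\hat u|_{W^{\ell-m,p}(\hat K)} \tag{30}$$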


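Proof. The proof is based on two ingredients. First, we conclude from (12) and the Deny–Lions lemma (Lemma 1) that there is a $\hat w \in P_{\ell-1} \subset P_k$ that satisfies

$$\|\hat D^{\gamma}\hat u - \hat D^{\gamma}\hat w\|_{W^{\ell-m,p}(\hat K)} \le C\,|\hat D^{\gamma}\hat u|_{W^{\ell-m,p}(\hat K)} \tag{31}$$

Second, we see that $\hat D^{\gamma}(\hat w - I_{\hat K}\hat u) \in \hat D^{\gamma} P_k$. Moreover, $\sum_{i=1}^{J} |\sigma_i(\cdot)|$ and $\|\cdot\|_{L^{q}(\hat K)}$ are equivalent norms in $\hat D^{\gamma} P_k$. Therefore, we get with (28) and (27)

$$\|\hat D^{\gamma}(\hat w - I_{\hat K}\hat u)\|_{L^{q}(\hat K)} \le C\sum_{i=1}^{J} |\sigma_i(\hat D^{\gamma}(\hat w - I_{\hat K}\hat u))| = C\sum_{i=1}^{J} |\sigma_i(\hat D^{\gamma}(\hat w - \hat u))| \le C\|\hat D^{\gamma}(\hat w - \hat u)\|_{W^{\ell-m,p}(\hat K)}$$

Consequently, we obtain with (26) and (31)

$$\|\hat D^{\gamma}(\hat u - I_{\hat K}\hat u)\|_{L^{q}(\hat K)} \le \|\hat D^{\gamma}(\hat u - \hat w)\|_{L^{q}(\hat K)} + \|\hat D^{\gamma}(\hat w - I_{\hat K}\hat u)\|_{L^{q}(\hat K)} \le C\|\hat D^{\gamma}(\hat u - \hat w)\|_{W^{\ell-m,p}(\hat K)} \le C\,|\hat D^{\gamma}\hat u|_{W^{\ell-m,p}(\hat K)}$$

□

The creative part is now to find the functionals $\{\sigma_i\}_{i=1}^{J}$ with the properties (27)–(29).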


Example 8. For Lagrangian elements and $m = 1$, the functionals are integrals over some lines (see Figure 3). One can easily check (28) and (29). The critical condition is (27), which is satisfied for $\ell = 2$ only if $d = 2$ or $p > 2$. One can indeed give an example that shows that

$$\|\hat D^{\gamma}(\hat u - I_{\hat K}\hat u)\|_{L^{q}(\hat K)} \le C\,|\hat D^{\gamma}\hat u|_{W^{1,p}(\hat K)}$$

does not hold for $d = 3$, $p \le 2$ (see Apel and Dobrowolski, 1992).


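Let us transform Estimate (30) to the element $K = F_K(\hat K)$. We see easily that if $F_K(\hat x) = A\hat x + a$ with $A = \operatorname{diag}(h_{1,K}, \ldots, h_{d,K})$, we get

$$|K|^{-1/q}\,h_{1,K}^{\gamma_1} \cdots h_{d,K}^{\gamma_d}\,\|D^{\gamma}(u - I_K u)\|_{L^{q}(K)} \le C\,|K|^{-1/p}\,h_{1,K}^{\gamma_1} \cdots h_{d,K}^{\gamma_d} \sum_{|\alpha| = \ell - m} h_{1,K}^{\alpha_1} \cdots h_{d,K}^{\alpha_d}\,\|D^{\alpha+\gamma}u\|_{L^{p}(K)}$$

Dividing by $|K|^{-1/q}\,h_{1,K}^{\gamma_1} \cdots h_{d,K}^{\gamma_d}$ and summing up over all $\gamma$ with $|\gamma| = m$, we obtain

$$|u - I_K u|_{W^{m,q}(K)} \le C\,|K|^{1/q-1/p} \sum_{|\alpha| = \ell - m} h_{1,K}^{\alpha_1} \cdots h_{d,K}^{\alpha_d}\,|D^{\alpha}u|_{W^{m,p}(K)} \tag{32}$$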

Remark 8. A more detailed calculation shows that this result can also be obtained when the off-diagonal entries of $A = [a_{i,j}]_{i,j=1}^{d}$ are not zero but small,

$$|a_{i,j}| \le C \min\{h_{i,K}, h_{j,K}\}, \quad i, j = 1, \ldots, d,\ i \ne j \tag{33}$$

see Apel and Lube (1998). A geometrical description of two- and three-dimensional affine elements that satisfy (33) is given in terms of a maximal angle condition and a coordinate system condition in Apel (1999a).

Figure 3. Functionals for Lagrangian elements.


The situation is more difficult for nonaffine elements. 


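Example 9. Consider the quadrilateral $K$ with nodes $(0, 0)$, $(h_1, 0)$, $(h_1, h_2)$, $(\varepsilon, h_2)$, $0 < \varepsilon \le h_2 \le h_1$, which is an $\varepsilon$-perturbation of the (affine) rectangle $(0, h_1) \times (0, h_2)$. We have

$$F_K(\hat x) = \begin{pmatrix} h_1\hat x_1 + \varepsilon(1 - \hat x_1)\hat x_2 \\ h_2 \hat x_2 \end{pmatrix}$$

and as in the affine theory

$$|u - I_K u|^2_{W^{1,2}(K)} \le C\,|K| \sum_{i=1}^{2} h_i^{-2} \left\|\frac{\partial(\hat u - I_{\hat K}\hat u)}{\partial \hat x_i}\right\|^2_{L^2(\hat K)} \le C\,|K| \sum_{i=1}^{2} h_i^{-2} \left|\frac{\partial \hat u}{\partial \hat x_i}\right|^2_{W^{1,2}(\hat K)}$$

$$\left\|\frac{\partial^2 \hat u}{\partial \hat x_1^2}\right\|_{L^2(\hat K)} \le C\,|K|^{-1/2} h_1^2 \left\|\frac{\partial^2 u}{\partial x_1^2}\right\|_{L^2(K)}, \qquad \left\|\frac{\partial^2 \hat u}{\partial \hat x_2^2}\right\|_{L^2(\hat K)} \le C\,|K|^{-1/2} h_2^2\,|u|_{W^{2,2}(K)}$$

but owing to $\partial^2 x_1/\partial \hat x_1 \partial \hat x_2 = -\varepsilon \ne 0$, we get only

$$\left\|\frac{\partial^2 \hat u}{\partial \hat x_1 \partial \hat x_2}\right\|_{L^2(\hat K)} \le C\,|K|^{-1/2}\left[h_1 h_2 \left|\frac{\partial u}{\partial x_1}\right|_{W^{1,2}(K)} + \varepsilon \left\|\frac{\partial u}{\partial x_1}\right\|_{L^2(K)}\right]$$

and, thus, by using again $\varepsilon \le h_2 \le h_1$,

$$|u - I_K u|_{W^{1,2}(K)} \le C\left(\sum_{i=1}^{2} h_i \left|\frac{\partial u}{\partial x_i}\right|_{W^{1,2}(K)} + \frac{\varepsilon}{h_2}\left\|\frac{\partial u}{\partial x_1}\right\|_{L^2(K)}\right)$$

In Apel (1998), we concluded that we should allow only perturbations with $\varepsilon \le C h_1 h_2$, but later we found in Apel (1999a) a sharper estimate without the latter term: observing that $P_1 \subset P_K$, we get for $w \in P_1$

$$|u - I_K u|_{W^{1,2}(K)} = |(u - w) - I_K(u - w)|_{W^{1,2}(K)} \le C\left(\sum_{i=1}^{2} h_i \left|\frac{\partial u}{\partial x_i}\right|_{W^{1,2}(K)} + \frac{\varepsilon}{h_2}\left\|\frac{\partial(u - w)}{\partial x_1}\right\|_{L^2(K)}\right)$$

By another Deny–Lions argument, comparing (31), we get for appropriate $w$

$$\left\|\frac{\partial(u - w)}{\partial x_1}\right\|^2_{L^2(K)} \le C \sum_{i=1}^{2} h_i^2 \left\|\frac{\partial^2 u}{\partial x_1 \partial x_i}\right\|^2_{L^2(K)}$$

such that

$$|u - I_K u|_{W^{1,2}(K)} \le C \sum_{i=1}^{2} h_i \left|\frac{\partial u}{\partial x_i}\right|_{W^{1,2}(K)}$$

can be proved for $\varepsilon \le C h_2$.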
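The nonaffine character of the element in Example 9 can be made concrete with a small script. The sketch below assumes the bilinear map of the unit square onto the quadrilateral with nodes $(0,0)$, $(h_1,0)$, $(h_1,h_2)$, $(\varepsilon,h_2)$ in the form $F_K(\hat x) = (h_1\hat x_1 + \varepsilon(1-\hat x_1)\hat x_2,\ h_2\hat x_2)$; the function names are hypothetical. It checks the images of the reference vertices and approximates the mixed second derivative of $x_1$ by central differences.

```python
def F_K(xhat, h1, h2, eps):
    """Bilinear map of the unit square onto the perturbed rectangle of Example 9
    (form assumed for this illustration)."""
    x1h, x2h = xhat
    return (h1 * x1h + eps * (1.0 - x1h) * x2h, h2 * x2h)

def mixed_derivative_x1(h1, h2, eps, delta=1e-3):
    """Central finite-difference approximation of d^2 x_1 / (d xhat_1 d xhat_2)
    at the centre of the reference square."""
    c = 0.5
    f = lambda a, b: F_K((a, b), h1, h2, eps)[0]
    return (f(c + delta, c + delta) - f(c + delta, c - delta)
            - f(c - delta, c + delta) + f(c - delta, c - delta)) / (4.0 * delta ** 2)
```

Because $x_1$ is bilinear in $(\hat x_1, \hat x_2)$, the difference quotient reproduces the value $-\varepsilon$ up to rounding; for $\varepsilon = 0$ the map is affine and the mixed derivative vanishes.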


The approach from the example, where a second Deny–Lions argument is used, holds also for more general quadrilateral elements $K$ with straight edges (subparametric elements) and $P_{\hat K} = Q_k$ (see Apel, 1999a).

Summarizing this subsection, we can say that the anisotropic interpolation error estimate (32) can be proved for a large class of affine and subparametric elements (for details we refer to Apel, 1999a). Also, estimates for functions from weighted Sobolev spaces have been proved; see Apel and Nicaise (1996, 1998) and Apel, Nicaise, and Schöberl (2001). The anisotropic interpolation error estimates are suited to compensate large partial derivatives $D^{\alpha}u$ by an appropriate choice of the element sizes $h_{1,K}, \ldots, h_{d,K}$ in order to equilibrate the terms in the sum at the right-hand side of (32). The results can be applied to problems with anisotropic solutions; in particular, flow problems where first results on the resolution of all kinds of layers or shock fronts can be found, for example, in Peraire, Vahdati, Morgan, and Zienkiewicz (1987), Kornhuber and Roitzsch (1990), Zhou and Rannacher (1993), Zienkiewicz and Wu (1994), and Roos, Stynes, and Tobiska (1996).
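The idea of equilibrating the terms on the right-hand side of (32) can be illustrated in the simplest setting $d = 2$, $\ell - m = 1$, where the sum reduces to $h_{1,K} D_1 + h_{2,K} D_2$ with $D_i$ standing for $|\partial u/\partial x_i|_{W^{m,p}(K)}$. The sketch below is an assumption-laden illustration: the element area $h_{1,K} h_{2,K}$ is held fixed, constants are dropped, and the function names are hypothetical.

```python
import math

def equilibrated_sizes(d1, d2, area):
    """Element sizes (h1, h2) with h1 * h2 = area that balance the two terms
    h1 * d1 and h2 * d2 (case d = 2, l - m = 1 of the sum in (32))."""
    h1 = math.sqrt(area * d2 / d1)
    h2 = math.sqrt(area * d1 / d2)
    return h1, h2

def bound(h1, h2, d1, d2):
    """The sum h1 * d1 + h2 * d2 (constant factors omitted)."""
    return h1 * d1 + h2 * d2
```

For a layer-type solution with, say, $D_2 = 100\,D_1$, the balanced element is stretched by a factor of 100 in the $x_1$ direction, and by the arithmetic-geometric mean inequality the resulting sum is never larger than that of an isotropic element of the same area.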


6 LOCAL ERROR ESTIMATES FOR
QUASI-INTERPOLANTS


Recall from Section 3.2 that we restrict ourselves here to Lagrangian elements, that is, $N_{i,K}(u) = u(a^i)$ where $a^i$, $i = 1, \ldots, n$, are the nodes of $K$. The quasi-interpolants can be defined locally by

$$Q_K u = \sum_{i=1}^{\tilde n} N_{i,K}(\Pi_{i,K} u)\,\varphi_{i,K}$$

with projectors $\Pi_{i,K}: L^1(\omega_{i,K}) \to P_{\ell-1}$ and sets $\omega_{i,K}$ that are defined differently by Clément and Scott/Zhang (see Examples 4 and 5, respectively). The local number of degrees of freedom that defines the operator correctly is denoted by $\tilde n$. For the Scott–Zhang operator, we can use $\tilde n = n$. For the Clément operator, we also have $\tilde n = n$ if $K \cap \Gamma_D = \emptyset$, but $\tilde n < n$ if $K$ touches the Dirichlet boundary. Let $\omega_K \subset \Omega$ be the interior of a union of finite elements with $K \subset \bar\omega_K$ and $\omega_{i,K} \subset \bar\omega_K$, $i = 1, \ldots, \tilde n$; typically, one defines

$$\bar\omega_K = \bigcup_{K_i \in \mathcal{T}:\ \bar K_i \cap \bar K \ne \emptyset} \bar K_i$$

We will prove error estimates in a uniform way for both 
operators and for triangles, tetrahedra, quadrilaterals, and 
hexahedra, but we restrict ourselves to affine isotropic 
elements. 

To bound the interpolation error $u - Q_K u$, we need several ingredients. The first one is the inclusion $P_{\ell-1} \subset P_K$ which is satisfied for the affine elements mentioned above if $P_{\hat K} = P_k$ or $P_{\hat K} = Q_k$ and $\ell \le k + 1$. Then we obtain

$$Q_K w = w \quad \forall w \in P_{\ell-1} \tag{34}$$

because of $N_{i,K}(\Pi_{i,K} w) = N_{i,K}(w)$ for $w \in P_{\ell-1}$.
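The reproduction property (34) can be checked on a small one-dimensional model. The sketch below is a simplified illustration, not the operator of the text in full generality: it assumes piecewise linear elements on an interval, takes $\ell = 2$, uses for each node the patch of adjacent elements as $\omega_{i,K}$ (Clément-type choice), ignores Dirichlet conditions, and computes the local $L^2$ projection onto $P_1$ with a composite Gauss rule. All function names are hypothetical.

```python
import math

def _integrate(f, a, b, pieces=20):
    """Composite two-point Gauss rule on [a, b] (exact for cubics)."""
    g = 1.0 / math.sqrt(3.0)
    w = (b - a) / pieces
    total = 0.0
    for k in range(pieces):
        mid = a + (k + 0.5) * w
        total += 0.5 * w * (f(mid - 0.5 * w * g) + f(mid + 0.5 * w * g))
    return total

def clement_nodal_values(u, nodes):
    """Values (Pi_i u)(a^i), where Pi_i is the L2 projection onto P_1 over the
    patch of elements adjacent to node a^i (case l = 2 in one dimension)."""
    values = []
    n = len(nodes)
    for i, xi in enumerate(nodes):
        a = nodes[max(i - 1, 0)]
        b = nodes[min(i + 1, n - 1)]
        c, length = 0.5 * (a + b), b - a
        mean = _integrate(u, a, b) / length
        # coefficient of the basis function (x - c), which is orthogonal to 1
        slope = _integrate(lambda x: u(x) * (x - c), a, b) / (length ** 3 / 12.0)
        values.append(mean + slope * (xi - c))
    return values

def evaluate(values, nodes, x):
    """Evaluate the piecewise linear function sum_i values[i] * phi_i at x."""
    for k in range(len(nodes) - 1):
        if nodes[k] <= x <= nodes[k + 1]:
            t = (x - nodes[k]) / (nodes[k + 1] - nodes[k])
            return (1.0 - t) * values[k] + t * values[k + 1]
    raise ValueError("x lies outside the mesh")
```

Applied to a linear function, the operator returns that function, which is the one-dimensional counterpart of (34); applied to a merely integrable function, it still produces well-defined nodal values, in contrast to nodal interpolation.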
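Since $\Pi_{i,K}v$ is from a finite-dimensional space, we have

$$\|\Pi_{i,K}v\|_{L^{\infty}(\omega_{i,K})} \le C\,|\omega_{i,K}|^{-1/2}\,\|\Pi_{i,K}v\|_{L^{2}(\omega_{i,K})}$$

Moreover, we get from the definition (8) with $\varphi = \Pi_{i,K}v$ that

$$\|\Pi_{i,K}v\|^2_{L^{2}(\omega_{i,K})} = \int_{\omega_{i,K}} v\,\Pi_{i,K}v \le \|v\|_{L^{1}(\omega_{i,K})}\,\|\Pi_{i,K}v\|_{L^{\infty}(\omega_{i,K})}$$

that means

$$|N_{i,K}(\Pi_{i,K}v)| \le \|\Pi_{i,K}v\|_{L^{\infty}(\omega_{i,K})} \le C\,|\omega_{i,K}|^{-1}\,\|v\|_{L^{1}(\omega_{i,K})} \tag{35}$$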

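If $\omega_{i,K}$ is $d$-dimensional, we conclude

$$|N_{i,K}(\Pi_{i,K}v)| \le C\,|K|^{-1}\,\|v\|_{L^{1}(\omega_K)}$$

If $\omega_{i,K}$ is $(d-1)$-dimensional, namely, a face of an element $K_i \subset \omega_K$, we need to apply the trace theorem on the reference element. By transforming to $K_i$, we obtain

$$\|v\|_{L^{1}(\omega_{i,K})} \le C\,|\omega_{i,K}|\,|K_i|^{-1}\left(\|v\|_{L^{1}(K_i)} + h_{K_i}|v|_{W^{1,1}(K_i)}\right)$$

$$|N_{i,K}(\Pi_{i,K}v)| \le C\,|K|^{-1}\left(\|v\|_{L^{1}(\omega_K)} + h_K |v|_{W^{1,1}(\omega_K)}\right)$$

where we used the fact that adjacent isotropic elements are of comparable size: if $\bar K_i \cap \bar K_j \ne \emptyset$ then $h_{K_i} \le C h_{K_j}$ with a constant $C$ of moderate size, for example, $C = 2$. We are now able to bound the norm of $Q_K$ by

$$|Q_K v|_{W^{m,q}(K)} = \Big|\sum_{i=1}^{\tilde n} N_{i,K}(\Pi_{i,K}v)\,\varphi_{i,K}\Big|_{W^{m,q}(K)} \le \sum_{i=1}^{\tilde n} |N_{i,K}(\Pi_{i,K}v)|\,|\varphi_{i,K}|_{W^{m,q}(K)}$$

$$\le C\,|K|^{-1} \sum_{j=0}^{\ell} h_K^{j}\,|v|_{W^{j,1}(\omega_K)}\,|K|^{1/q} h_K^{-m} \le C\,|K|^{1/q-1/p} \sum_{j=0}^{\ell} h_K^{j-m}\,|v|_{W^{j,p}(\omega_K)} \tag{36}$$

with $\ell \ge 0$ for the Clément operator and $\ell \ge 1$ for the Scott–Zhang operator. We have also used the Hölder inequality $\|v\|_{L^{1}(K)} \le \|1\|_{L^{p'}(K)}\|v\|_{L^{p}(K)} = |K|^{1-1/p}\|v\|_{L^{p}(K)}$.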

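By transforming the embedding theorem $W^{\ell,p}(\hat K) \hookrightarrow W^{m,q}(\hat K)$, which we assume to hold, we get

$$|v|_{W^{m,q}(K)} \le C\,|K|^{1/q-1/p} \sum_{j=0}^{\ell} h_K^{j-m}\,|v|_{W^{j,p}(K)} \tag{37}$$

where we used the formulae from Lemma 2 and $h_K \le C\varrho_K$.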

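The next ingredient we need is a version of the Deny–Lions lemma (Lemma 1). By the scaling $\tilde x = h_K^{-1} x \in \mathbb{R}^d$, we transform $\omega_K$ to $\tilde\omega_K$, which is an isotropic domain with diameter of order 1. Thus, for any $\tilde u \in W^{\ell,p}(\tilde\omega_K)$, there is a polynomial $\tilde w \in P_{\ell-1}$ such that

$$\|\tilde u - \tilde w\|_{W^{\ell,p}(\tilde\omega_K)} \le C\,|\tilde u|_{W^{\ell,p}(\tilde\omega_K)}$$

Scaling back, we obtain

$$\sum_{j=0}^{\ell} h_K^{j}\,|u - w|_{W^{j,p}(\omega_K)} \le C h_K^{\ell}\,|u|_{W^{\ell,p}(\omega_K)} \tag{38}$$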

Finally, for Dirichlet nodes $a^i$, we need a sharper estimate for $|N_{i,K}(\Pi_{i,K}u)|$ than (35). Consider an element $K$ with a Dirichlet node $a^i$. Then there is a face $F_i \subset \Gamma_D$ at the Dirichlet part of the boundary with $a^i \in \bar F_i$, and an element $K_i \subset \omega_K$ with $F_i \subset \partial K_i$. By using the inverse inequality, the identity $u|_{F_i} = 0$, and the trace theorem, we get for $\ell \ge 1$

$$|N_{i,K}(\Pi_{i,K}u)| = |(\Pi_{i,K}u)(a^i)| \le \|\Pi_{i,K}u\|_{L^{\infty}(F_i)} \le C\,|F_i|^{-1}\|\Pi_{i,K}u\|_{L^{1}(F_i)} = C\,|F_i|^{-1}\|u - \Pi_{i,K}u\|_{L^{1}(F_i)}$$

$$\le C\,|K_i|^{-1/p} \sum_{j=0}^{\ell} h_K^{j}\,|u - \Pi_{i,K}u|_{W^{j,p}(K_i)}$$

Since (1) $K_i \subset \omega_{i,K}$ and $\omega_{i,K}$ is isotropic, (2) $\Pi_{i,K}$ is bounded in $W^{j,p}(\omega_{i,K})$, and (3) $\Pi_{i,K}$ preserves polynomials $w \in P_{\ell-1}$, we obtain in analogy to (38)

$$\sum_{j=0}^{\ell} h_K^{j}\,|u - \Pi_{i,K}u|_{W^{j,p}(\omega_{i,K})} = \sum_{j=0}^{\ell} h_K^{j}\,|(u - w) - \Pi_{i,K}(u - w)|_{W^{j,p}(\omega_{i,K})} \le C \sum_{j=0}^{\ell} h_K^{j}\,|u - w|_{W^{j,p}(\omega_{i,K})} \le C h_K^{\ell}\,|u|_{W^{\ell,p}(\omega_{i,K})}$$

thus, we have

$$|N_{i,K}(\Pi_{i,K}u)| \le C\,|K|^{-1/p} h_K^{\ell}\,|u|_{W^{\ell,p}(\omega_K)} \tag{39}$$

Note that this estimate holds also for $\ell = 0$ because of (35). With these prerequisites, we obtain the final result.


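Theorem 2. Let $\mathcal{T}$ be an isotropic triangulation. Assume that each element $(K, P_K, N_K)$ is affine equivalent to $(\hat K, P_{\hat K}, N_{\hat K})$ with $P_{\ell-1} \subset P_{\hat K}$ and $N_K = \{N_{i,K}\}_{i=1}^{n}$ where $N_{i,K}(u) = u(a^i)$ and $a^i$ are nodes. Let $u \in W^{\ell,p}(\omega_K)$, $\omega_K$ from (10), $\ell \ge 0$ for the Clément interpolant and $\ell \ge 1$ for the Scott–Zhang interpolant, $p \in [1, \infty]$. The numbers $m \in \{0, \ldots, \ell - 1\}$ and $q \in [1, \infty]$ are chosen such that $W^{\ell,p}(K) \hookrightarrow W^{m,q}(K)$. Then the estimate

$$|u - Q_K u|_{W^{m,q}(K)} \le C\,|K|^{1/q-1/p}\,h_K^{\ell-m}\,|u|_{W^{\ell,p}(\omega_K)}$$

holds.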
Proof. Consider first the case that $Q_K$ is the Scott–Zhang operator or, if $Q_K$ is the Clément operator, that $K$ does not touch the Dirichlet part of the boundary. With (34), (36), (37), we obtain for $w \in P_{\ell-1}$ from (38)

$$|u - Q_K u|_{W^{m,q}(K)} = |(u - w) - Q_K(u - w)|_{W^{m,q}(K)} \le |u - w|_{W^{m,q}(K)} + |Q_K(u - w)|_{W^{m,q}(K)}$$

$$\le C\,|K|^{1/q-1/p} h_K^{-m} \sum_{j=0}^{\ell} h_K^{j}\,|u - w|_{W^{j,p}(\omega_K)}$$

By using (38), we obtain the desired result.

In the remaining case, we consider the Clément operator and an element $K$ with nodes $a^i$, $i = 1, \ldots, n$, where the nodes with $i = \tilde n + 1, \ldots, n$ are at the boundary. Then we write

$$|u - Q_K u|_{W^{m,q}(K)} \le \Big|u - \sum_{i=1}^{n} N_{i,K}(\Pi_{i,K}u)\,\varphi_{i,K}\Big|_{W^{m,q}(K)} + \sum_{i=\tilde n+1}^{n} |N_{i,K}(\Pi_{i,K}u)|\,|\varphi_{i,K}|_{W^{m,q}(K)}$$

The first term at the right-hand side has just been estimated; the remaining terms are bounded by (39) and $|\varphi_{i,K}|_{W^{m,q}(K)} \le C\,|K|^{1/q} h_K^{-m}$. Consequently, the assertion is also proved in this case. □


Remark 9. The proof given above extends to shape-regular quadrilateral elements (see Example 3), where $F_K(\cdot) \in (Q_1)^2$ when $m \le 1$. In this case, the relations (34), (36), (37), and (38) hold as well. In the same way, we can treat the Clément operator for hexahedral elements $K$ with $F_K(\cdot) \in (Q_1)^3$. The Scott–Zhang operator can also be treated if all faces are planar.

For elements with curved faces, one can use a projection operator $\Pi_i$ on a reference configuration $\hat\omega$ (see e.g. Bernardi, 1989).


Remark 10. For error estimates of the quasi-interpolants 
on anisotropic meshes, we refer to Apel (1999b). The main 
results are the following: 


1. In a suitable coordinate system (compare Remark 8), 
an anisotropic version of Theorem 2 holds for m = 0. 
This is obtained by a proper scaling. 

2. An example shows that both quasi-interpolation oper- 
ators are not suited for deriving anisotropic error esti- 
mates in the sense of (32) if m > 1. 

3. Modifications of the Scott–Zhang operators have been
suggested such that error estimates of type (32) can 
be obtained under certain assumptions on the domain 
(‘tensor-product structure’). 


7 EXAMPLE FOR A GLOBAL 
INTERPOLATION ERROR ESTIMATE 


The effectiveness of numerical methods for differential and integral equations depends on the choice of the mesh. Since singularities due to the geometry of the domain are known a priori, it is advantageous to adapt the finite element mesh $\mathcal{T}$ to these singularities. In this section, we define such meshes for a class of singularities and estimate the global interpolation error in the (broken) norm of the Sobolev space $W^{m,p}(\Omega)$. Such an error estimate is used in the estimation of the discretization error of various finite element methods. Note that this generality of the norm includes estimates in $L^{2}(\Omega)$, $L^{\infty}(\Omega)$, and $W^{1,2}(\Omega)$.


Assumption 2. Let $\Omega \subset \mathbb{R}^2$ be a two-dimensional polygonal domain with corners $c_j$, $j = 1, \ldots, J$. The solution $u$ of an elliptic boundary value problem has, in general, singularities near the corners $c_j$, that is, the solution can be represented by

$$u = u_0 + \sum_{j=1}^{J} u_j$$

with a regular part

$$u_0 \in W^{\ell,p}(\Omega) \tag{40}$$

and singular parts (corner singularities) $u_j$ satisfying

$$|D^{\alpha}u_j| \le C r_j^{\lambda_j - |\alpha|} \quad \forall \alpha : |\alpha| \le \ell \tag{41}$$

where $r_j = r_j(x) := \operatorname{dist}(x, c_j)$. The integer $\ell$ and the real numbers $p$ and $\lambda_j$, $j = 1, \ldots, J$, are defined by the data.


This assumption is realistic for a large class of elliptic problems, including those for the Poisson equation, the Lamé system, and the biharmonic equation. The numbers $\lambda_j$ depend on the geometry of $\Omega$ (in particular on the internal angles at $c_j$), the differential operator, and the boundary conditions. For problems with mixed boundary conditions, there are, in general, further singular terms, but since they can also be characterized by (41), this poses no extra difficulty for the forthcoming analysis.

We remark that there can be terms that are best described by

$$|D^{\alpha}u_j| \le C r_j^{\lambda_j - |\alpha|}\,|\ln r_j|^{\beta_j}$$

These terms can be treated either by a slight modification of the forthcoming analysis or by decreasing the exponent in (41) slightly. Note that $|\ln r_j|^{\beta_j} \le C_\varepsilon r_j^{-\varepsilon}$ for all $\varepsilon > 0$.


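Assumption 3. Let $\mathcal{T}$ be a finite element mesh, which is described by parameters $h$ and $\mu_j \in (0, 1]$, $j = 1, \ldots, J$. We assume that the diameter $h_K$ of the element $K \in \mathcal{T}$ relates to the distances $r_{K,j} := \operatorname{dist}(K, c_j)$, $j = 1, \ldots, J$, according to

$$h_K \le C h^{1/\mu_j} \ \text{ if } r_{K,j} = 0, \qquad h_K \le C h\,r_{K,j}^{1-\mu_j} \ \text{ if } r_{K,j} > 0 \qquad \forall j = 1, \ldots, J \tag{42}$$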


This assumption can be satisfied when isotropic elements are used, and when the elements in the neighborhoods $U_j$ of the corners $c_j$ satisfy

$$C_1 h^{1/\mu_j} \le h_K \le C_2 h^{1/\mu_j} \ \text{ if } r_{K,j} = 0, \qquad C_1 h\,r_{K,j}^{1-\mu_j} \le h_K \le C_2 h\,r_{K,j}^{1-\mu_j} \ \text{ if } r_{K,j} > 0 \tag{43}$$

$j = 1, \ldots, J$, and when $C_1 h \le h_K \le C_2 h$ for all other elements. The size of the neighborhoods $U_j$ of $c_j$ should be independent of $h$ but small enough such that $c_i \notin U_j$ for $i \ne j$.

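Let us prove that the number of elements of such meshes is of the order $h^{-2}$. It is sufficient to show that the number of elements $K \subset U_j$ with $c_j \notin \bar K$ is bounded by $C h^{-2}$. By using $\int_K 1 = C_K h_K^2$, and the relations for $h_K$ and $r_{K,j}$, we get

$$\sum_{K \subset U_j,\ c_j \notin \bar K} 1 = \sum_{K \subset U_j,\ c_j \notin \bar K} C_K^{-1} h_K^{-2} \int_K 1 \le C h^{-2} \sum_{K \subset U_j,\ c_j \notin \bar K} \int_K r_{K,j}^{-2(1-\mu_j)} \le C h^{-2} \sum_{K \subset U_j,\ c_j \notin \bar K} \int_K r_j^{-2(1-\mu_j)} \le C h^{-2} \int_{U_j} r_j^{-2(1-\mu_j)} \le C h^{-2}$$

since $\mu_j > 0$.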

Finally, we remark that meshes with property (43) can be created in different ways. If the neighborhood $U_j$ is a circular sector of radius $R_j$, one can just move the nodes of a uniform mesh according to the coordinate transformation

$$\frac{r_j}{R_j} \mapsto \left(\frac{r_j}{R_j}\right)^{1/\mu_j}$$

(see e.g. Raugel, 1978; Oganesyan and Rukhovets, 1979; or Apel and Milde, 1996). A second possibility is to start with a uniform mesh of mesh size $h$ and to split all elements recursively until (43) is satisfied; see Fritzsch (1990) or Apel and Milde (1996).
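The node-moving construction can be tried out in a one-dimensional radial analogue. The sketch below is an illustration under assumptions: it considers only the radial coordinate on $(0, R)$, uses the map $r \mapsto R\,(r/R)^{1/\mu}$ applied to a uniform partition, and takes $h = R/n$ as the global mesh parameter; the function names are hypothetical. It returns the quotients $h_K / (h\,r_K^{1-\mu})$ that appear in (43).

```python
def graded_nodes(n, mu, R=1.0):
    """Radial node positions r_i = R * (i / n)**(1 / mu), i = 0, ..., n, obtained
    by moving the nodes of a uniform mesh; mu = 1 gives the uniform mesh."""
    return [R * (i / n) ** (1.0 / mu) for i in range(n + 1)]

def grading_ratios(n, mu, R=1.0):
    """Ratios h_K / (h * r_K**(1 - mu)) for all elements away from the corner,
    with h = R / n and r_K the distance of the element to the corner r = 0."""
    r = graded_nodes(n, mu, R)
    h = R / n
    return [(r[i] - r[i - 1]) / (h * r[i - 1] ** (1.0 - mu)) for i in range(2, n + 1)]
```

For $\mu = 1/2$ and $R = 1$ the ratios equal $2 + 1/(i-1)$ and thus stay between fixed constants independent of $n$, while the element touching the corner has length $h^{1/\mu}$, in line with the two cases of (43).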


Assumption 4. Let $V_{\mathcal{T}}$ be a finite element space corresponding to the triangulation $\mathcal{T}$. Let $I_{\mathcal{T}}: W^{\ell,p}(\Omega) \to V_{\mathcal{T}}$ (with $(I_{\mathcal{T}}u)|_K = I_K(u|_K)$ for all $K \in \mathcal{T}$) be the corresponding nodal interpolation operator. We assume that it permits the local interpolation error estimate

$$|u - I_K u|_{W^{m,p}(K)} \le C h_K^{\ell-m}\,|u|_{W^{\ell,p}(K)} \tag{44}$$

with $\ell$, $p$ from (40), and some $m \in \{0, \ldots, \ell - 1\}$.


Note that this assumption relates the regularity of $u_0$ to the polynomial degree. The estimate (44) is proved in Theorem 1 only if $P_{\ell-1} \subset P_{\hat K}$. So, if the regularity is low, then a large polynomial degree does not pay; if the polynomial degree is too low, then the regularity (40)–(41) is not fully exploited.
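Theorem 3. Let the function $u$, the mesh $\mathcal{T}$, and the interpolation operator $I_{\mathcal{T}}$ satisfy Assumptions 2, 3, and 4, respectively. Then the error estimate

$$\left(\sum_{K \in \mathcal{T}} |u - I_{\mathcal{T}}u|^p_{W^{m,p}(K)}\right)^{1/p} \le C h^{\ell-m} \tag{45}$$

holds if $m < \ell$,

$$m - \frac{2}{p} < \lambda_j \tag{46}$$

$$\mu_j < \frac{\lambda_j - m + 2/p}{\ell - m} \quad \text{if } \lambda_j \le \ell - \frac{2}{p} \tag{47}$$

$$\mu_j \le 1 \quad \text{if } \lambda_j > \ell - \frac{2}{p} \tag{48}$$

for all $j = 1, \ldots, J$.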


Note that condition (46) restricts $m$ and $p$ such that $u_j \in W^{m,p}(\Omega)$ is ensured,

$$|u_j|^p_{W^{m,p}(\Omega)} = \int_{\Omega} \sum_{|\alpha| = m} |D^{\alpha}u_j|^p \le C \int_{\Omega} r_j^{(\lambda_j - m)p} < \infty$$

if $(\lambda_j - m)p > -2$. With this argument, we see also that $u_j \in W^{\ell,p}(\Omega)$ if $\lambda_j > \ell - (2/p)$, that is, the function $u_j$ is as regular as $u_0$ in this case, and no refinement ($\mu_j = 1$) is necessary in $U_j$.

The left-hand side of (45) is formulated with this broken Sobolev norm in order to cover the case that $I_{\mathcal{T}}u \notin W^{m,p}(\Omega)$. Important applications are discretization error estimates for nonconforming finite element methods. If $I_{\mathcal{T}}u \in W^{m,p}(\Omega)$, then estimate (45) can be written in the form

$$|u - I_{\mathcal{T}}u|_{W^{m,p}(\Omega)} \le C h^{\ell-m}$$


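Proof. Consider the neighborhood $U_j$ of $c_j$ with $j \in \{1, \ldots, J\}$ arbitrary but fixed. By Assumption 2, we have

$$u_i \in W^{\ell,p}(U_j), \quad i = 0, \ldots, J,\ i \ne j$$

and, therefore,

$$\left(\sum_{K \subset U_j} |u_i - I_{\mathcal{T}}u_i|^p_{W^{m,p}(K)}\right)^{1/p} \le \left(\sum_{K \subset U_j} C h_K^{(\ell-m)p}\,|u_i|^p_{W^{\ell,p}(K)}\right)^{1/p} \le C h^{\ell-m}\,|u_i|_{W^{\ell,p}(U_j)} \tag{49}$$

Since the same argument can be applied for $\Omega \setminus \bigcup_{j=1}^{J} U_j$, it remains to be shown that an estimate of the form (49), with right-hand side $C h^{\ell-m}$, holds also for $i = j$. Note that we can assume that $\lambda_j \le \ell - (2/p)$, since, otherwise, $u_j \in W^{\ell,p}(\Omega)$ and no refinement is necessary.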

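If $c_j \in \bar K$, we estimate the interpolant simply by $\|I_K u_j\|_{L^{\infty}(K)} \le C\|u_j\|_{L^{\infty}(K)} \le C h_K^{\lambda_j}$. Using the triangle inequality, a direct computation of $|u_j|_{W^{m,p}(K)}$, and the inverse inequality for $|I_K u_j|_{W^{m,p}(K)}$ (if $m > 0$, the case $m = 0$ is even direct), we get

$$|u_j - I_K u_j|_{W^{m,p}(K)} \le |u_j|_{W^{m,p}(K)} + |I_K u_j|_{W^{m,p}(K)} \le C\left(\int_K r_j^{(\lambda_j - m)p}\right)^{1/p} + C h_K^{-m}\,|K|^{1/p}\,\|I_K u_j\|_{L^{\infty}(K)}$$

$$\le C h_K^{\lambda_j - m}\,|K|^{1/p} \le C h^{(\lambda_j - m + 2/p)/\mu_j} \le C h^{\ell-m}$$

where we have used the inequalities $|K| \le C h_K^2$, (41), (42), and (47). Note that the number of elements $K$ with $c_j \in \bar K$ is bounded by a constant that does not depend on $h$. So we get

$$\left(\sum_{K \subset U_j,\ c_j \in \bar K} |u_j - I_K u_j|^p_{W^{m,p}(K)}\right)^{1/p} \le C h^{\ell-m} \tag{50}$$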

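Consider now an element $K \subset U_j$ with $c_j \notin \bar K$, that is, with $r_{K,j} > 0$. Then, $u_j \in W^{\ell,p}(K)$ and we can use the interpolation error estimate (44). So we get

$$|u_j - I_K u_j|_{W^{m,p}(K)} \le C h_K^{\ell-m}\,|u_j|_{W^{\ell,p}(K)} \le C h^{\ell-m}\,r_{K,j}^{(\ell-m)(1-\mu_j)}\left(\int_K r_j^{(\lambda_j - \ell)p}\right)^{1/p}$$

$$\le C h^{\ell-m}\left(\int_K r_j^{[\lambda_j - \ell + (\ell-m)(1-\mu_j)]p}\right)^{1/p} = C h^{\ell-m}\left(\int_K r_j^{[\lambda_j - m - \mu_j(\ell-m)]p}\right)^{1/p}$$

since $r_{K,j} \le r_j$ in $K$. Hence,

$$\left(\sum_{K \subset U_j,\ c_j \notin \bar K} |u_j - I_K u_j|^p_{W^{m,p}(K)}\right)^{1/p} \le C h^{\ell-m}\left(\int_{U_j} r_j^{[\lambda_j - m - \mu_j(\ell-m)]p}\right)^{1/p}$$

The integral on the right-hand side is finite if $[\lambda_j - m - \mu_j(\ell - m)]p > -2$, which is equivalent to (47). With (49) and (50), we conclude (45). □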


Remark 11. The given proof is an improved version of a proof for a more specific function $u$ in a paper by Fritzsch and Oswald (1988). In this paper, the authors also address the question of the optimal choice of $\mu_j$. They obtain for $\mu_j = [\lambda_j - m + (2/p)]/[\ell - m + (2/p)]$ the equidistribution of the element-wise interpolation error in the sense $|u_j - I_K u_j|_{W^{m,p}(K)} \approx \text{const}$.
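The grading conditions are easy to evaluate for concrete data. The sketch below encodes the integrability condition from the proof, $[\lambda_j - m - \mu_j(\ell - m)]p > -2$, solved for $\mu_j$, together with the choice of Remark 11; the function names are hypothetical. As an illustrative assumption it uses $\lambda = 2/3$, the leading singular exponent of the Poisson problem at a re-entrant corner with interior angle $3\pi/2$, with $p = 2$, $m = 1$, $\ell = 2$.

```python
def mu_upper_bound(lam, m, l, p):
    """Right-hand side of the grading condition mu < (lam - m + 2/p) / (l - m)."""
    return (lam - m + 2.0 / p) / (l - m)

def mu_equidistribution(lam, m, l, p):
    """Grading parameter of Remark 11 (Fritzsch and Oswald, 1988)."""
    return (lam - m + 2.0 / p) / (l - m + 2.0 / p)
```

For the stated data one obtains the bound $\mu < 2/3$ and the equidistributing value $\mu = 1/3$; the latter always lies below the former because its denominator is larger by $2/p$.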


Remark 12. The given proof has the advantage that 
it needs minimal knowledge from functional analysis. A 
more powerful approach is to use weighted Sobolev spaces 
(see Remark 6 for an example). The solutions of elliptic 
boundary value problems are often described by analysts in 
terms of different versions of such spaces; see, for example, 
the monographs by Grisvard (1985), Kufner and Sändig 
(1987), Dauge (1988), Nazarov and Plamenevsky (1994), 
or Kozlov, Maz’ya, and Roßmann (2001). For local and 
global interpolation error estimates for functions of such 
spaces see, for example, Grisvard (1985), Apel, Sändig, 
and Whiteman (1996), Apel and Nicaise (1998), or Apel, 
Nicaise, and Schöberl (2001). The advantage is that this
approach extends to the three-dimensional case, whereas 
Assumption 2 is too simple to cover edge singularities. 


8 RELATED CHAPTERS 


(See also Chapter 4, Chapter 5, Chapter 6, Chapter 9, 
Chapter 17 of this Volume) 
1 INTRODUCTION


The finite element method is one of the most widely 
used techniques in computational mechanics. The math- 
ematical origin of the method can be traced to a paper 
by Courant (1943). We refer the readers to the arti- 
cles by Babuška (1994) and Oden (1991) for the his- 
tory of the finite element method. In this chapter, we 
give a concise account of the h-version of the finite
element method for elliptic boundary value problems in 
the displacement formulation, and refer the readers to 
the theory of Chapter 5 and Chapter 9 of this Vol- 
ume. 

This chapter is organized as follows. The finite element 
method for elliptic boundary value problems is based on the 
Ritz–Galerkin approach, which is discussed in Section 2.
The construction of finite element spaces and the a priori 
error estimates for finite element methods are presented 
in Sections 3 and 4. The a posteriori error estimates for 
finite element methods and their applications to adaptive 
local mesh refinements are discussed in Sections 5 and 6. 
For the ease of presentation, the contents of Sections 3 
and 4 are restricted to symmetric problems on polyhedral 
domains using conforming finite elements. The extension 
of these results to more general situations is outlined in 
Section 7. 

For the classical material in Sections 3, 4, and 7, we are 
content with highlighting the important results and pointing to the key literature. We also concentrate on basic theoretical results and refer the readers to other chapters in this encyclopedia for complications that may arise
in applications. For the recent development of a posteri- 
ori error estimates and adaptive local mesh refinements 
in Sections 5 and 6, we try to provide a more compre- 
hensive treatment. Owing to space limitations many sig- 
nificant topics and references are inevitably absent. For 
in-depth discussions of many of the topics covered in 
this chapter (and the ones that we do not touch upon), 
we refer the readers to the following survey articles 
and books (which are listed in alphabetical order) and 
the references therein (Ainsworth and Oden, 2000; Apel, 
1999; Aziz, 1972; Babuška and Aziz, 1972; Babuška and
Strouboulis, 2001; Bangerth and Rannacher, 2003; Bathe, 
1996; Becker, Carey and Oden, 1981; Becker and Ran- 
nacher, 2001; Braess, 2001; Brenner and Scott, 2002; 
Ciarlet, 1978, 1991; Eriksson et al., 1995; Hughes, 2000; 
Oden and Reddy, 1976; Schatz, Thomée and Wendland, 
1990; Strang and Fix, 1973; Szabó and Babuška, 1991; 
Verfürth, 1996; Wahlbin, 1991, 1995; Zienkiewicz and Taylor, 2000).
2 RITZ-GALERKIN METHODS FOR
LINEAR ELLIPTIC BOUNDARY 
VALUE PROBLEMS 


In this section, we set up the basic mathematical frame- 
work for the analysis of Ritz—Galerkin methods for linear 
elliptic boundary value problems. We will concentrate on 
symmetric problems. Nonsymmetric elliptic boundary value 
problems will be discussed in Section 7.1. 


2.1 Weak problems 


Let $\Omega$ be a bounded connected open subset of the Euclidean space $\mathbb{R}^d$ with a piecewise smooth boundary. For a positive integer $k$, the Sobolev space $H^k(\Omega)$ is the space of square integrable functions whose weak derivatives up to order $k$ are also square integrable, with the norm

$$\|v\|_{H^k(\Omega)} = \left(\sum_{|\alpha| \le k} \left\|\frac{\partial^{\alpha} v}{\partial x^{\alpha}}\right\|^2_{L_2(\Omega)}\right)^{1/2}$$

The seminorm $\left(\sum_{|\alpha| = k} \|\partial^{\alpha} v/\partial x^{\alpha}\|^2_{L_2(\Omega)}\right)^{1/2}$ will be denoted by $|v|_{H^k(\Omega)}$. We refer the readers to Nečas (1967), Adams (1995), Triebel (1978), Grisvard (1985), and Wloka (1987) for the properties of the Sobolev spaces. Here we just point out that $\|\cdot\|_{H^k(\Omega)}$ is a norm induced by an inner product and $H^k(\Omega)$ is complete under this norm, that is, $H^k(\Omega)$ is a Hilbert space. (We assume that the readers are familiar with normed and Hilbert spaces.)

Using the Sobolev spaces we can represent a large class of symmetric elliptic boundary value problems of order $2m$ in the following abstract weak form:

Find $u \in V$, a closed subspace of a Sobolev space $H^m(\Omega)$, such that

$$a(u, v) = F(v) \quad \forall v \in V \tag{1}$$

where $F: V \to \mathbb{R}$ is a bounded linear functional on $V$ and $a(\cdot, \cdot)$ is a symmetric bilinear form that is bounded and $V$-elliptic, that is,

$$|a(v_1, v_2)| \le C_1 \|v_1\|_{H^m(\Omega)} \|v_2\|_{H^m(\Omega)} \quad \forall v_1, v_2 \in V \tag{2}$$

$$a(v, v) \ge C_2 \|v\|^2_{H^m(\Omega)} \quad \forall v \in V \tag{3}$$


Remark 1. We use C, with or without subscript, to 
represent a generic positive constant that can take different 
values at different occurrences. 


Remark 2. Equation (1) is the Euler–Lagrange equation for the variational problem of finding the minimum of the functional $v \mapsto \tfrac{1}{2}a(v, v) - F(v)$ on the space $V$. In mechanics, this functional often represents an energy and its minimization follows from the Dirichlet principle. Furthermore, the corresponding Euler–Lagrange equations (also called first variation) (1) often represent the principle of virtual work.


It follows from conditions (2) and (3) that $a(\cdot, \cdot)$ defines an inner product on $V$ which is equivalent to the inner product of the Sobolev space $H^m(\Omega)$. Therefore the existence and uniqueness of the solution of (1) follow immediately from (2), (3), and the Riesz Representation Theorem (Yosida, 1995; Reddy, 1986; Oden and Demkowicz, 1996).

The following are typical examples from computational 
mechanics. 


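Example 1. Let $a(\cdot, \cdot)$ be defined by

$$a(v_1, v_2) = \int_{\Omega} \nabla v_1 \cdot \nabla v_2\,dx \tag{4}$$

For $f \in L_2(\Omega)$, the weak form of the Poisson problem

$$-\Delta u = f \ \text{ on } \Omega, \qquad u = 0 \ \text{ on } \Gamma, \qquad \frac{\partial u}{\partial n} = 0 \ \text{ on } \partial\Omega \setminus \Gamma \tag{5}$$

where $\Gamma$ is a subset of $\partial\Omega$ with a positive $(d-1)$-dimensional measure, is given by (1) with $V = \{v \in H^1(\Omega): v|_{\Gamma} = 0\}$ and

$$F(v) = \int_{\Omega} f v\,dx \tag{6}$$

For the pure Neumann problem where $\Gamma = \emptyset$, since the gradient vector vanishes for constant functions, an appropriate function space for the weak problem is $V = \{v \in H^1(\Omega): \int_{\Omega} v\,dx = 0\}$.

The boundedness of $F$ and $a(\cdot, \cdot)$ is obvious and the coercivity of $a(\cdot, \cdot)$ follows from the Poincaré–Friedrichs inequalities (Nečas, 1967):

$$\|v\|_{L_2(\Omega)} \le C\left(|v|_{H^1(\Omega)} + \left|\int_{\Gamma} v\,ds\right|\right) \quad \forall v \in H^1(\Omega) \tag{7}$$

$$\|v\|_{L_2(\Omega)} \le C\left(|v|_{H^1(\Omega)} + \left|\int_{\Omega} v\,dx\right|\right) \quad \forall v \in H^1(\Omega) \tag{8}$$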

Example 2. Let $\Omega \subset \mathbb{R}^d$ (d = 2, 3) and
$v \in [H^1(\Omega)]^d$ be the displacement of an elastic body. The
strain tensor $\epsilon(v)$ is given by the d x d matrix with
components

$$\epsilon_{ij}(v) = \frac{1}{2} \left( \frac{\partial v_i}{\partial x_j} + \frac{\partial v_j}{\partial x_i} \right) \tag{9}$$

and the stress tensor $\sigma(v)$ is the d x d matrix defined by

$$\sigma(v) = 2\mu\, \epsilon(v) + \lambda\, (\operatorname{div} v)\, \delta \tag{10}$$

where $\delta$ is the d x d identity matrix and $\mu > 0$ and
$\lambda > 0$ are the Lamé constants.
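Let the bilinear form $a(\cdot,\cdot)$ be defined by

$$a(v_1, v_2) = \int_\Omega \sum_{i,j=1}^{d} \sigma_{ij}(v_1)\, \epsilon_{ij}(v_2) \, dx = \int_\Omega \sigma(v_1) : \epsilon(v_2) \, dx \tag{11}$$

For $f \in [L_2(\Omega)]^d$, the weak form of the linear elasticity
problem (Ciarlet, 1988)

$$-\operatorname{div}[\sigma(u)] = f \ \text{ on } \Omega, \qquad u = 0 \ \text{ on } \Gamma, \qquad [\sigma(u)]\, n = 0 \ \text{ on } \partial\Omega \setminus \Gamma \tag{12}$$

where $\Gamma$ is a subset of $\partial\Omega$ with a positive
(d - 1)-dimensional measure, is given by (1) with
$V = \{v \in [H^1(\Omega)]^d : v|_\Gamma = 0\}$ and

$$F(v) = \int_\Omega f \cdot v \, dx \tag{13}$$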


For the pure traction problem where $\Gamma = \emptyset$, the strain
tensor vanishes for all infinitesimal rigid motions, i.e.,
displacement fields of the form $m = a + \rho x$, where
$a \in \mathbb{R}^d$, $\rho$ is a d x d antisymmetric matrix and
$x = (x_1, \ldots, x_d)^t$ is the position vector. In this case an
appropriate function space for the weak problem is
$V = \{v \in [H^1(\Omega)]^d : \int_\Omega \nabla \times v \, dx = 0 = \int_\Omega v \, dx\}$.

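The boundedness of F and $a(\cdot,\cdot)$ is obvious and the
coercivity of $a(\cdot,\cdot)$ follows from Korn's inequalities
(Friedrichs, 1947; Duvaut and Lions, 1976; Nitsche, 1981) (see
Chapter 2, Volume 2):

$$\|v\|_{H^1(\Omega)} \le C \left( \|\epsilon(v)\|_{L_2(\Omega)} + \left| \int_\Gamma v \, ds \right| \right) \quad \forall\, v \in [H^1(\Omega)]^d \tag{14}$$

$$\|v\|_{H^1(\Omega)} \le C \left( \|\epsilon(v)\|_{L_2(\Omega)} + \left| \int_\Omega \nabla \times v \, dx \right| + \left| \int_\Omega v \, dx \right| \right) \quad \forall\, v \in [H^1(\Omega)]^d \tag{15}$$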


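Example 3. Let $\Omega$ be a domain in $\mathbb{R}^2$ and the bilinear
form $a(\cdot,\cdot)$ be defined by

$$a(v_1, v_2) = \int_\Omega \left[ \Delta v_1\, \Delta v_2 + (1 - \sigma) \left( 2\, \frac{\partial^2 v_1}{\partial x_1 \partial x_2} \frac{\partial^2 v_2}{\partial x_1 \partial x_2} - \frac{\partial^2 v_1}{\partial x_1^2} \frac{\partial^2 v_2}{\partial x_2^2} - \frac{\partial^2 v_1}{\partial x_2^2} \frac{\partial^2 v_2}{\partial x_1^2} \right) \right] dx \tag{16}$$

where $\sigma \in (0, 1/2)$ is the Poisson ratio.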


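For $f \in L_2(\Omega)$, the weak form of the clamped plate
bending problem (Ciarlet, 1997)

$$\Delta^2 u = f \ \text{ on } \Omega, \qquad u = \frac{\partial u}{\partial n} = 0 \ \text{ on } \partial\Omega \tag{17}$$

is given by (1), where
$V = \{v \in H^2(\Omega) : v = \partial v / \partial n = 0 \text{ on } \partial\Omega\} = H^2_0(\Omega)$
and F is defined by (6). For the simply supported plate bending
problem, the function space V is
$\{v \in H^2(\Omega) : v = 0 \text{ on } \partial\Omega\} = H^2(\Omega) \cap H^1_0(\Omega)$.

For these problems, the coercivity of $a(\cdot,\cdot)$ is a
consequence of the following Poincaré–Friedrichs inequality
(Nečas, 1967):

$$\|v\|_{H^1(\Omega)} \le C\, |v|_{H^2(\Omega)} \quad \forall\, v \in H^2(\Omega) \cap H^1_0(\Omega) \tag{18}$$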


Remark 3. The weak formulation of boundary value 
problems for beams and shells can be found in Chapter 8, 
this Volume and Chapter 3, Volume 2. 


2.2 Ritz–Galerkin methods


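In the Ritz–Galerkin approach for (1), a discrete problem
is formulated as follows.
Find $\tilde u \in \tilde V$ such that

$$a(\tilde u, \tilde v) = F(\tilde v) \quad \forall\, \tilde v \in \tilde V \tag{19}$$

where $\tilde V$, the space of trial/test functions, is a
finite-dimensional subspace of V.
The orthogonality relation

$$a(u - \tilde u, \tilde v) = 0 \quad \forall\, \tilde v \in \tilde V \tag{20}$$

follows by subtracting (19) from (1), and hence

$$\|u - \tilde u\|_a = \inf_{\tilde v \in \tilde V} \|u - \tilde v\|_a \tag{21}$$

where $\|\cdot\|_a = (a(\cdot,\cdot))^{1/2}$. Furthermore, (2), (3),
and (21) imply that

$$\|u - \tilde u\|_{H^m(\Omega)} \le \left( \frac{C_1}{C_2} \right)^{1/2} \inf_{\tilde v \in \tilde V} \|u - \tilde v\|_{H^m(\Omega)} \tag{22}$$

that is, the error for the approximate solution $\tilde u$ is
quasi-optimal in the norm of the Sobolev space underlying the
weak problem.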

The abstract estimate (22), called Céa's lemma, reduces
the error estimate for the Ritz–Galerkin method to a problem
in approximation theory, namely, to the determination
of the magnitude of the error of the best approximation of
u by a member of $\tilde V$. The solution of this problem depends
on the regularity (smoothness) of u and the nature of the
space $\tilde V$.
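
The orthogonality relation (20) can be checked on a small model problem. The following is a minimal sketch, not taken from the text: it assumes the one-dimensional analogue of Example 1, namely $-u'' = 1$ on (0, 1) with $u(0) = u(1) = 0$, discretized with continuous piecewise linear functions on a uniform mesh. The function names `solve_p1_poisson` and `galerkin_residual` are hypothetical.

```python
def solve_p1_poisson(n):
    """Ritz-Galerkin solution of -u'' = 1 on (0, 1), u(0) = u(1) = 0,
    using P1 hat functions on n equal cells (n >= 2). Returns nodal values."""
    h = 1.0 / n
    m = n - 1                          # number of interior nodes
    diag = [2.0 / h] * m               # a(phi_i, phi_i)
    off = [-1.0 / h] * (m - 1)         # a(phi_i, phi_{i+1})
    rhs = [h] * m                      # F(phi_i) = integral of a hat function
    # Thomas algorithm for the symmetric tridiagonal system.
    for i in range(1, m):
        w = off[i - 1] / diag[i - 1]
        diag[i] -= w * off[i - 1]
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]


def galerkin_residual(nodal, n):
    """a(u - u_h, phi_i) = F(phi_i) - a(u_h, phi_i) for each interior hat function."""
    h = 1.0 / n
    return [h - (-nodal[i - 1] + 2.0 * nodal[i] - nodal[i + 1]) / h
            for i in range(1, n)]


if __name__ == "__main__":
    n = 8
    u_h = solve_p1_poisson(n)
    print("largest residual:", max(abs(r) for r in galerkin_residual(u_h, n)))
```

For this particular one-dimensional problem the discrete solution also agrees with the exact solution $u(x) = x(1 - x)/2$ at the nodes; that is a special feature of the model problem and does not carry over to higher dimensions.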

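One can also measure $u - \tilde u$ in other norms. For example,
an estimate of $\|u - \tilde u\|_{L_2(\Omega)}$ can be obtained by the
Aubin–Nitsche duality technique as follows. Let $w \in V$ be
the solution of the weak problem

$$a(v, w) = \int_\Omega (u - \tilde u)\, v \, dx \quad \forall\, v \in V \tag{23}$$

Then we have, from (20), (23), and the Cauchy–Schwarz
inequality,

$$\|u - \tilde u\|_{L_2(\Omega)}^2 = a(u - \tilde u, w) = a(u - \tilde u, w - \tilde v) \le C_1 \|u - \tilde u\|_{H^m(\Omega)} \|w - \tilde v\|_{H^m(\Omega)} \quad \forall\, \tilde v \in \tilde V$$

which implies that

$$\|u - \tilde u\|_{L_2(\Omega)} \le C_1 \left( \inf_{\tilde v \in \tilde V} \frac{\|w - \tilde v\|_{H^m(\Omega)}}{\|u - \tilde u\|_{L_2(\Omega)}} \right) \|u - \tilde u\|_{H^m(\Omega)} \tag{24}$$

In general, since w can be approximated by members of $\tilde V$
to high accuracy, the term inside the bracket on the right-hand
side of (24) is small, which shows that the $L_2$ error
is much smaller than the $H^m$ error.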
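
The gap between the $L_2$ error and the energy-norm error can be observed numerically. The sketch below is an illustration under stated assumptions, not part of the original analysis: it uses the model problem $-u'' = 1$ on (0, 1) with homogeneous Dirichlet conditions, P1 elements on a uniform mesh, and 3-point Gauss quadrature (exact for the polynomial integrands that occur). The names `p1_solution` and `errors` are hypothetical. For this specific problem the errors have the closed forms $h/\sqrt{12}$ in the $H^1$ seminorm and $h^2/\sqrt{120}$ in $L_2$, so halving h halves the first and quarters the second.

```python
import math


def p1_solution(n):
    """Nodal values of the P1 Galerkin solution of -u'' = 1, u(0) = u(1) = 0."""
    h = 1.0 / n
    m = n - 1
    diag = [2.0 / h] * m
    off = [-1.0 / h] * (m - 1)
    rhs = [h] * m
    for i in range(1, m):              # forward elimination (Thomas algorithm)
        w = off[i - 1] / diag[i - 1]
        diag[i] -= w * off[i - 1]
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):     # back substitution
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]


def errors(n):
    """Return (L2 error, H1-seminorm error) of the P1 solution on n cells."""
    h = 1.0 / n
    u_h = p1_solution(n)
    points = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]   # Gauss-Legendre nodes
    weights = [5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0]
    l2_sq = h1_sq = 0.0
    for k in range(n):
        left = k * h
        slope = (u_h[k + 1] - u_h[k]) / h
        for p, w in zip(points, weights):
            x = left + 0.5 * h * (p + 1.0)
            e = 0.5 * x * (1.0 - x) - (u_h[k] + slope * (x - left))
            de = (0.5 - x) - slope
            l2_sq += 0.5 * h * w * e * e
            h1_sq += 0.5 * h * w * de * de
    return math.sqrt(l2_sq), math.sqrt(h1_sq)


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, errors(n))
```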

The estimates (22) and (24) provide the basic a priori
error estimates for the Ritz–Galerkin method in an abstract
setting.

On the other hand, the error of the Ritz–Galerkin method
can also be estimated in an a posteriori fashion. Let the
computable linear functional (the residual of the approximate
solution $\tilde u$) $R: V \to \mathbb{R}$ be defined by

$$R(v) = a(u - \tilde u, v) = F(v) - a(\tilde u, v) \tag{25}$$

The global a posteriori error estimate

$$\|u - \tilde u\|_{H^m(\Omega)} \le \frac{1}{C_2} \sup_{v \in V} \frac{|R(v)|}{\|v\|_{H^m(\Omega)}} \tag{26}$$

then follows from (3) and (25).

Let D be a subdomain of $\Omega$ and $H^m_0(D)$ be the subspace
of V whose members vanish identically outside D. It
follows from (25) and the local version of (2) that we also
have a local a posteriori error estimate:

$$\|u - \tilde u\|_{H^m(D)} \ge \frac{1}{C_1} \sup_{v \in H^m_0(D)} \frac{|R(v)|}{\|v\|_{H^m(D)}} \tag{27}$$

The equivalence of the error norm with the dual norm of
the residual will be the point of departure in Section 5.1.2
(cf. (70)).


2.3 Elliptic regularity 


As mentioned above, the magnitude of the error of a
Ritz–Galerkin method for an elliptic boundary value problem
depends on the regularity of the solution. Here we give
a brief description of elliptic regularity for the examples in
Section 2.1.

If the boundary $\partial\Omega$ is smooth and the homogeneous
boundary conditions are also smooth (i.e. the Dirichlet and
Neumann boundary conditions in (5) and the displacement
and traction boundary conditions in (12) are defined on
disjoint components of $\partial\Omega$), then the solutions of the
elliptic boundary value problems in Section 2.1 obey the classical
Shift Theorem (Agmon, 1965; Nečas, 1967; Gilbarg and
Trudinger, 1983; Wloka, 1987). In other words, if the right-hand
side of the equation belongs to the Sobolev space
$H^\ell(\Omega)$, then the solution of a 2m-th order elliptic boundary
value problem belongs to the Sobolev space $H^{2m+\ell}(\Omega)$.

The Shift Theorem does not hold for domains with
piecewise smooth boundary in general. For example, let
$\Omega$ be the L-shaped domain depicted in Figure 1 and

$$u(x) = \phi(r)\, r^{2/3} \sin\left( \frac{2}{3} \left( \theta - \frac{\pi}{2} \right) \right) \tag{28}$$

where $r = (x_1^2 + x_2^2)^{1/2}$ and $\theta = \arctan(x_2/x_1)$ are the
polar coordinates and $\phi$ is a smooth cut-off function that equals
1 for $0 \le r < 1/2$ and 0 for $r > 3/4$. It is easy to check
that $u \in H^1_0(\Omega)$ and $-\Delta u \in C^\infty(\overline{\Omega})$. Let D
be any open neighborhood of the origin in $\Omega$. Then
$u \in H^2(\Omega \setminus D)$ but $u \notin H^2(D)$. In fact u belongs to
the Besov space $B^{5/3}_{2,\infty}(D)$ (Babuška and Osborn, 1991), which
implies that $u \in H^{5/3-\epsilon}(D)$ for any $\epsilon > 0$, but
$u \notin H^{5/3}(D)$ (see Triebel (1978) and Grisvard (1985) for a
discussion of Besov spaces and fractional order Sobolev spaces). A
similar situation occurs when the types of boundary condition change
abruptly, such as the Poisson problem with mixed boundary
conditions depicted on the circular domain in Figure 1,
where the homogeneous Dirichlet boundary condition is
assumed on the upper semicircle and the homogeneous
Neumann boundary condition is assumed on the lower
semicircle.
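
The properties of the singular function in (28) can be probed numerically. The sketch below is an illustration only: it drops the cut-off function, takes the re-entrant corner of the L-shaped domain at the origin with the two adjacent edges on the rays $\theta = \pi/2$ and $\theta = 2\pi$ (an assumption read off the vertex labels of Figure 1), and uses finite differences. It checks that $r^{2/3} \sin(\frac{2}{3}(\theta - \frac{\pi}{2}))$ is harmonic, vanishes on both edges, and has a gradient of size $\frac{2}{3} r^{-1/3}$, which is unbounded as $r \to 0$. All function names are hypothetical.

```python
import math

ALPHA = 2.0 / 3.0


def polar_angle(x1, x2):
    """Polar angle in (0, 2*pi]; the branch cut lies on the positive x1-axis."""
    t = math.atan2(x2, x1)
    return t if t > 0.0 else t + 2.0 * math.pi


def singular_part(x1, x2):
    """r^(2/3) * sin((2/3) * (theta - pi/2)), i.e. (28) without the cut-off."""
    r = math.hypot(x1, x2)
    return r ** ALPHA * math.sin(ALPHA * (polar_angle(x1, x2) - math.pi / 2.0))


def laplacian_fd(f, x1, x2, h=1e-3):
    """Five-point finite difference approximation of the Laplacian."""
    return (f(x1 + h, x2) + f(x1 - h, x2) + f(x1, x2 + h) + f(x1, x2 - h)
            - 4.0 * f(x1, x2)) / (h * h)


def grad_norm_fd(f, x1, x2, h):
    """Euclidean norm of the central-difference gradient."""
    g1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2.0 * h)
    g2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2.0 * h)
    return math.hypot(g1, g2)


if __name__ == "__main__":
    print("Laplacian at (-0.3, 0.2):", laplacian_fd(singular_part, -0.3, 0.2))
    for r in (1e-1, 1e-2, 1e-3):
        print(r, grad_norm_fd(singular_part, -r, 0.0, 1e-4 * r))
```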

Therefore (Dauge, 1988), for the second (respectively
fourth) order model problems in Section 2.1, the solution
in general only belongs to $H^{1+\alpha}(\Omega)$ (respectively
$H^{2+\alpha}(\Omega)$) for some $\alpha \in (0, 1]$ even if the right-hand
side of the equation belongs to $C^\infty(\overline{\Omega})$.


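[Figure 1 panels: an L-shaped domain with vertices (-1,1), (0,1), (0,0), (1,0), (1,-1) and (-1,-1); a circular domain with boundary labels u = 0 and ∂u/∂n = 0.]


Figure 1. Singular points of two-dimensional elliptic boundary
value problems.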


For two-dimensional problems, the vertices of $\Omega$ and the
points where the boundary condition changes type are the
singular points (cf. Figure 1). Away from these singular
points, the Shift Theorem is valid. The behavior of the
solution near the singular points is also well understood.
If the right-hand side function and its derivatives vanish
to sufficiently high order at the singular points, then the
Shift Theorem holds for certain weighted Sobolev spaces
(Nazarov and Plamenevsky, 1994; Kozlov, Maz'ya and
Rossman, 1997, 2001). Alternatively, one can represent the
solution near a singular point as a sum of a regular part
and a singular part (Grisvard, 1985; Dauge, 1988; Nicaise,
1993). For a 2m-th order problem, the regular part of the
solution belongs to the Sobolev space $H^{2m+k}(\Omega)$ if the
right-hand side function belongs to $H^k(\Omega)$, and the singular
part of the solution is a linear combination of special
functions with less regularity, analogous to the function
in (28).

The situation in three dimensions is more complicated 
due to the presence of edge singularities, vertex singular- 
ities, and edge-vertex singularities. The theory of three- 
dimensional singularities remains an active area of research. 


3 FINITE ELEMENT SPACES 


Finite element methods are Ritz–Galerkin methods where
the finite-dimensional trial/test function spaces are
constructed by piecing together polynomial functions defined
on (small) parts of the domain $\Omega$. In this section, we
describe the construction and properties of finite element
spaces. We will concentrate on conforming finite elements
here and leave the discussion of nonconforming finite
elements to Section 7.2.


3.1 The concept of a finite element


A d-dimensional finite element (Ciarlet, 1978; Brenner
and Scott, 2002) is a triple $(K, \mathcal{P}_K, \mathcal{N}_K)$, where K is a
closed bounded subset of $\mathbb{R}^d$ with nonempty interior and
a piecewise smooth boundary, $\mathcal{P}_K$ is a finite-dimensional
vector space of functions defined on K and $\mathcal{N}_K$ is a basis of
the dual space $\mathcal{P}_K'$. The function space $\mathcal{P}_K$ is the space of
the shape functions and the elements of $\mathcal{N}_K$ are the nodal
variables (degrees of freedom).

The following are examples of two-dimensional finite 
elements. 


Example 4 (Triangular Lagrange Elements) Let K be a
triangle, $\mathcal{P}_K$ be the space $P_n$ of polynomials in two variables
of degree $\le n$, and let the set $\mathcal{N}_K$ consist of evaluations of
shape functions at the nodes with barycentric coordinates
$\lambda_1 = i/n$, $\lambda_2 = j/n$ and $\lambda_3 = k/n$, where i, j, k are
nonnegative integers and i + j + k = n. Then $(K, \mathcal{P}_K, \mathcal{N}_K)$
is the two-dimensional $P_n$ Lagrange finite element. The
nodal variables for the $P_1$, $P_2$, and $P_3$ Lagrange elements
are depicted in Figure 2, where • (here and in the
following examples) represents pointwise evaluation of shape
functions.
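
For a triple to be a finite element, the number of nodal variables must equal the dimension of the space of shape functions. The sketch below, an illustration with hypothetical function names, enumerates the barycentric coordinates of the $P_n$ Lagrange nodes described in Example 4 and compares their count with $\dim P_n = (n+1)(n+2)/2$ in two variables. Exact rational arithmetic is used so that the coordinates sum to exactly 1.

```python
from fractions import Fraction


def lagrange_nodes(n):
    """Barycentric coordinates (i/n, j/n, k/n) with i + j + k = n of the
    nodes of the P_n Lagrange element on a triangle."""
    return [(Fraction(i, n), Fraction(j, n), Fraction(n - i - j, n))
            for i in range(n + 1)
            for j in range(n + 1 - i)]


def dim_Pn(n):
    """Dimension of the space of polynomials of degree <= n in two variables."""
    return (n + 1) * (n + 2) // 2


if __name__ == "__main__":
    for n in (1, 2, 3):
        nodes = lagrange_nodes(n)
        print(n, len(nodes), dim_Pn(n))
```

Matching counts are necessary but not sufficient; that the point evaluations actually form a basis of the dual space is a separate unisolvence argument that this sketch does not attempt.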


Example 5 (Triangular Hermite Elements) Let K be a
triangle. The cubic Hermite element is the triple
$(K, P_3, \mathcal{N}_K)$ where $\mathcal{N}_K$ consists of evaluations of shape
functions and their gradients at the vertices and evaluation of
shape functions at the center of K. The nodal variables for the
cubic Hermite element are depicted in the first figure in
Figure 3, where ○ (here and in the following examples)
represents pointwise evaluation of gradients of shape functions.

By removing the nodal variable at the center (cf. the
second figure in Figure 3) and reducing the space of shape
functions to

$$\Big\{ v \in P_3 : 6 v(c) - 2 \sum_{i=1}^{3} v(p_i) + \sum_{i=1}^{3} (\nabla v)(p_i) \cdot (p_i - c) = 0 \Big\} \quad (\supset P_2)$$

where $p_i$ (i = 1, 2, 3) and c are the vertices and center of
K respectively, we obtain the Zienkiewicz element.

The fifth degree Argyris element is the triple $(K, P_5, \mathcal{N}_K)$
where $\mathcal{N}_K$ consists of evaluations of the shape functions and
their derivatives up to order two at the vertices and evaluations
of the normal derivatives at the midpoints of the edges.
The nodal variables for the Argyris element are depicted
in the third figure in Figure 3, where the larger circle and the
arrow (here and in the following examples) represent pointwise
evaluation of second order derivatives and the normal derivative
of the shape functions, respectively.

By removing the nodal variables at the midpoints of the
edges (cf. the fourth figure in Figure 3) and reducing the
space of shape functions to
$\{v \in P_5 : (\partial v/\partial n)|_e \in P_3(e) \text{ for each edge } e\}$,
we obtain the Bell element.


Figure 2. Lagrange elements.

Figure 3. Cubic Hermite element, Zienkiewicz element, fifth
degree Argyris element and Bell element.


Example 6 (Triangular Macro Elements) Let K be
a triangle that is subdivided into three subtriangles by
the center of K, $\mathcal{P}_K$ be the space of piecewise cubic
polynomials with respect to this subdivision that belong
to $C^1(K)$, and let the set $\mathcal{N}_K$ consist of evaluations of
the shape functions and their first-order derivatives at the
vertices of K and evaluations of the normal derivatives of
the shape functions at the midpoints of the edges of K. Then
$(K, \mathcal{P}_K, \mathcal{N}_K)$ is the Hsieh–Clough–Tocher macro element.
The nodal variables for this element are depicted in the first
figure in Figure 4.

By removing the nodal variables at the midpoints of
the edges (cf. the second figure in Figure 4) and reducing
the space of shape functions to $\{v \in C^1(K) : v$ is piecewise
cubic and $(\partial v/\partial n)|_e \in P_1(e)$ for each edge $e\}$, we obtain
the reduced Hsieh–Clough–Tocher macro element.


Example 7 (Rectangular Tensor Product Elements)
Let K be the rectangle $[a_1, b_1] \times [a_2, b_2]$, $\mathcal{P}_K$ be the
space spanned by the monomials $x_1^i x_2^j$ for $0 \le i, j \le n$,
and the set $\mathcal{N}_K$ consist of evaluations of shape functions
at the nodes with coordinates
$(a_1 + i(b_1 - a_1)/n,\ a_2 + j(b_2 - a_2)/n)$ for $0 \le i, j \le n$.
Then $(K, \mathcal{P}_K, \mathcal{N}_K)$ is the two-dimensional $Q_n$ tensor
product element. The nodal variables of the $Q_1$, $Q_2$ and $Q_3$
elements are depicted in Figure 5.


Figure 4. Hsieh–Clough–Tocher element and reduced
Hsieh–Clough–Tocher element.

Figure 5. Tensor product elements.

Figure 6. $Q_n$ quadrilateral elements.


Example 8 (Quadrilateral $Q_n$ Elements) Let K be
a convex quadrilateral; then there exists a bilinear map
$(\hat x_1, \hat x_2) \mapsto B(\hat x_1, \hat x_2) = (a_1 + b_1 \hat x_1 + c_1 \hat x_2 + d_1 \hat x_1 \hat x_2,\ a_2 + b_2 \hat x_1 + c_2 \hat x_2 + d_2 \hat x_1 \hat x_2)$
from the biunit square S with vertices $(\pm 1, \pm 1)$ onto K.
The space of shape functions is defined by $v \in \mathcal{P}_K$ if and
only if $v \circ B \in Q_n$ and $\mathcal{N}_K$ consists of pointwise
evaluations of the shape functions at the nodes of K
corresponding under the map B to the nodes of the $Q_n$ tensor
product element on S. The nodal variables of the $Q_1$, $Q_2$ and
$Q_3$ quadrilateral elements are depicted in Figure 6.


Example 9 (Other Rectangular Elements) Let K be the
rectangle $[a_1, b_1] \times [a_2, b_2]$,

$$\mathcal{P}_K = \Big\{ v \in Q_2 : 4 v(c) + \sum_{i=1}^{4} v(p_i) - 2 \sum_{i=1}^{4} v(m_i) = 0 \Big\} \quad (\supset P_2)$$

where the $p_i$'s are the vertices of K, the $m_i$'s are the
midpoints of the edges of K and c is the center of K,
and $\mathcal{N}_K$ consist of evaluations of the shape functions at the
vertices and the midpoints (cf. the first figure in Figure 7).
Then $(K, \mathcal{P}_K, \mathcal{N}_K)$ is the 8-node serendipity element.

If we take $\mathcal{P}_K$ to be the space of bicubic polynomials
spanned by $x_1^i x_2^j$ for $0 \le i, j \le 3$ and $\mathcal{N}_K$ to be the set
consisting of evaluations at the vertices of K of the shape
functions, their first-order derivatives and their second-order
mixed derivatives, then we have the Bogner–Fox–Schmit
element. The nodal variables for this element are depicted
in the second figure in Figure 7, where the tilted arrows
represent pointwise evaluations of the second-order mixed
derivatives of the shape functions.


Figure 7. Serendipity and Bogner–Fox–Schmit elements.
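
The constraint that defines the shape functions of the 8-node serendipity element in Example 9 can be tested on monomials. The sketch below is an illustration under an assumption: it takes K to be the biunit square with center at the origin, so that all evaluations are integers. It applies the functional $4v(c) + \sum_i v(p_i) - 2\sum_i v(m_i)$ to the nine monomials $x_1^i x_2^j$ ($0 \le i, j \le 2$) that span $Q_2$ and lists those it annihilates. The names are hypothetical.

```python
def serendipity_functional(v):
    """4 v(c) + sum_i v(p_i) - 2 sum_i v(m_i) on the biunit square [-1, 1]^2."""
    vertices = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
    midpoints = [(0, -1), (1, 0), (0, 1), (-1, 0)]
    return (4 * v(0, 0)
            + sum(v(x, y) for x, y in vertices)
            - 2 * sum(v(x, y) for x, y in midpoints))


# The nine monomials x^i y^j, 0 <= i, j <= 2, spanning Q_2.
monomials = {(i, j): (lambda x, y, i=i, j=j: x ** i * y ** j)
             for i in range(3) for j in range(3)}

# Exponent pairs of the monomials satisfying the constraint.
kernel = [ij for ij, v in monomials.items() if serendipity_functional(v) == 0]

if __name__ == "__main__":
    print("monomials in the serendipity space:", kernel)
```

On this reference square, eight of the nine monomials satisfy the constraint, including every monomial of total degree at most two, and only $x_1^2 x_2^2$ is excluded. This is consistent with an 8-dimensional space of shape functions that contains $P_2$.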


Remark 4. The triangular $P_n$ elements and the
quadrilateral $Q_n$ elements, which are suitable for second order
elliptic boundary value problems, can be generalized to
any dimension in a straightforward manner. The Argyris
element, the Bell element, the macro elements, and the
Bogner–Fox–Schmit element are suitable for fourth-order
problems in two space dimensions.


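3.2 Triangulations and finite element spaces


We restrict $\Omega \subset \mathbb{R}^d$ (d = 1, 2, 3) to be a polyhedral domain
in this and the following sections. The case of curved
domains will be discussed in Section 7.4.

A partition of $\Omega$ is a collection $\mathcal{P}$ of polyhedral
subdomains of $\Omega$ such that

$$\overline{\Omega} = \bigcup_{D \in \mathcal{P}} \overline{D} \quad \text{and} \quad D \cap D' = \emptyset \ \text{ if } D, D' \in \mathcal{P},\ D \ne D'$$

where we use $\overline{\Omega}$ and $\overline{D}$ to represent the closures of
$\Omega$ and D.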

A triangulation of Q is a partition where the intersection 
of the closures of two distinct subdomains is either empty, 
a common vertex, a common edge or a common face. 
For d = 1, every partition is a triangulation. But the two
concepts are different when $d \ge 2$. A partition that is not a
triangulation is depicted in the first figure in Figure 8, where
the other three figures represent triangulations. Below, we
will concentrate on triangulations consisting of triangles or 
convex quadrilaterals in two dimensions and tetrahedrons 
or convex hexahedrons in three dimensions. 

The shape regularity of a triangle (or tetrahedron) D can
be measured by the parameter

$$\gamma(D) = \frac{\operatorname{diam} D}{\text{diameter of the largest ball in } D} \tag{29}$$

which will be referred to as the aspect ratio of the triangle
(tetrahedron). We say that a family of triangulations of
triangles (or tetrahedrons) $\{\mathcal{T}_i : i \in I\}$ is regular (or
nondegenerate) if the aspect ratios of all the triangles
(tetrahedrons) in the triangulations are bounded, that is, there
exists a positive constant C such that

$$\gamma(D) \le C \quad \text{for all } D \in \mathcal{T}_i \text{ and } i \in I$$
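
For a triangle the parameter in (29) is easy to evaluate: the diameter is the longest edge and the largest inscribed ball is the incircle, whose radius equals the area divided by the semi-perimeter. The sketch below is an illustration; the function name `aspect_ratio` is hypothetical.

```python
import math


def aspect_ratio(p1, p2, p3):
    """gamma(D) from (29) for the triangle with vertices p1, p2, p3:
    longest edge divided by the diameter of the incircle."""
    a = math.dist(p2, p3)
    b = math.dist(p1, p3)
    c = math.dist(p1, p2)
    semi_perimeter = 0.5 * (a + b + c)
    area = 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p3[0] - p1[0]) * (p2[1] - p1[1]))
    inradius = area / semi_perimeter
    return max(a, b, c) / (2.0 * inradius)


if __name__ == "__main__":
    print("equilateral:", aspect_ratio((0, 0), (1, 0), (0.5, math.sqrt(3) / 2)))
    print("right isosceles:", aspect_ratio((0, 0), (1, 0), (0, 1)))
    print("thin sliver:", aspect_ratio((0, 0), (1, 0), (0.5, 0.01)))
```

The equilateral triangle gives $\sqrt{3}$ and the right isosceles triangle gives $1 + \sqrt{2}$, while the value for the sliver grows without bound as its height shrinks. A regular family excludes such degenerating shapes.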


The shape regularity of a convex quadrilateral (or
hexahedron) D can be measured by the parameter $\gamma(D)$ defined
in (29) and the parameter

$$\sigma(D) = \max \left\{ \frac{|e_1|}{|e_2|} : e_1 \text{ and } e_2 \text{ are any two edges of } D \right\} \tag{30}$$

We will refer to the number $\max(\gamma(D), \sigma(D))$ as the aspect
ratio of the convex quadrilateral (hexahedron). We say
that a family of triangulations of convex quadrilaterals (or
hexahedrons) $\{\mathcal{T}_i : i \in I\}$ is regular if the aspect ratios of all
the quadrilaterals in the triangulations are bounded, that is,
there exists a positive constant C such that

$$\gamma(D),\ \sigma(D) \le C \quad \text{for all } D \in \mathcal{T}_i \text{ and } i \in I$$


A family of triangulations is quasi-uniform if it is regular
and there exists a positive constant C such that

$$h_i \le C \operatorname{diam} D \quad \forall\, D \in \mathcal{T}_i,\ i \in I \tag{31}$$

where $h_i$ is the maximum of the diameters of the
subdomains in $\mathcal{T}_i$.


Remark 5. For a triangle or a tetrahedron D, a lower
bound for the angles of D can lead to an upper bound for
$\gamma(D)$ (and vice versa). Therefore, the regularity of a family
of simplicial triangulations (i.e. triangulations consisting of
triangles or tetrahedrons) is equivalent to the following
minimum angle condition: There exists $\theta_0 > 0$ such that
the angles of the simplexes in all the triangulations $\mathcal{T}_i$ are
bounded below by $\theta_0$.


Remark 6. A family of triangulations obtained by
successive uniform subdivisions of an initial triangulation is
quasi-uniform. A family of triangulations generated by a
local refinement strategy is usually regular but not
quasi-uniform.


Let $\mathcal{T}$ be a triangulation of $\Omega$, and a finite element
$(\overline{D}, \mathcal{P}_D, \mathcal{N}_D)$ be associated with each subdomain
$D \in \mathcal{T}$. We define the corresponding finite element space to be

$$FE_{\mathcal{T}} = \{ v \in L_2(\Omega) : v_D = v|_D \in \mathcal{P}_D \ \forall\, D \in \mathcal{T}, \text{ and } v_D, v_{D'} \text{ share the same nodal values on } \overline{D} \cap \overline{D'} \} \tag{32}$$


Figure 8. Partitions and triangulations.


We say that $FE_{\mathcal{T}}$ is a $C^r$ finite element space if
$FE_{\mathcal{T}} \subset C^r(\overline{\Omega})$. For example, the finite element spaces
constructed from the Lagrange finite elements (Example 4), the
tensor product elements (Example 7), the cubic Hermite element
(Example 5), the Zienkiewicz element (Example 5) and
the serendipity element (Example 9) are $C^0$ finite element
spaces, and those constructed from the quintic Argyris element
(Example 5), the Bell element (Example 5), the macro
elements (Example 6) and the Bogner–Fox–Schmit element
(Example 9) are $C^1$ finite element spaces.

Note that a $C^r$ finite element space is automatically
a subspace of the Sobolev space $H^{r+1}(\Omega)$ and therefore
appropriate for elliptic boundary value problems of order
2(r + 1).


3.3 Element nodal interpolation operators and 
interpolation error estimates 


Let $(K, \mathcal{P}_K, \mathcal{N}_K)$ be a finite element. Denote the nodal
variables in $\mathcal{N}_K$ by $N_1, \ldots, N_n$ ($n = \dim \mathcal{P}_K$) and the dual
basis of $\mathcal{P}_K$ by $\phi_1, \ldots, \phi_n$, that is,

$$N_i(\phi_j) = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$

Assume that $\zeta \mapsto N_i(\zeta)$ is well-defined for $\zeta \in H^s(K)$
(where s is a sufficiently large positive number), then
we can define the element nodal interpolation operator
$\Pi_K : H^s(K) \to \mathcal{P}_K$ by

$$\Pi_K \zeta = \sum_{j=1}^{n} N_j(\zeta)\, \phi_j \tag{33}$$

Note that (33) implies

$$\Pi_K v = v \quad \forall\, v \in \mathcal{P}_K \tag{34}$$

For example, by the Sobolev embedding theorem
(Adams, 1995; Nečas, 1967; Wloka, 1987; Gilbarg
and Trudinger, 1983), the nodal interpolation operators
associated with the Lagrange finite elements (Example 4),
the tensor product finite elements (Example 7), and
the serendipity element (Example 9) are well-defined on
$H^s(K)$ for s > 1 if $K \subset \mathbb{R}^2$ and for s > 3/2 if $K \subset \mathbb{R}^3$.
On the other hand the nodal interpolation operators associated
with the Zienkiewicz element (Example 5) and the macro
elements (Example 6) are well-defined on $H^s(K)$ for s > 2,
while the interpolation operators for the quintic Argyris
element (Example 5), the Bell element (Example 5) or
the Bogner–Fox–Schmit element (Example 9) are well-defined on
$H^s(K)$ for s > 3.

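The error of the element nodal interpolation operator for a
triangular (tetrahedral) or convex quadrilateral (hexahedral)
element $(K, \mathcal{P}_K, \mathcal{N}_K)$ can be controlled in terms of the
shape regularity of K. Let $\hat K$ be the image of K under
the scaling map

$$x \mapsto H(x) = (\operatorname{diam} K)^{-1} x \tag{35}$$

Then $\hat K$ is a domain of unit diameter and we can define a
finite element $(\hat K, \mathcal{P}_{\hat K}, \mathcal{N}_{\hat K})$ as follows:
(i) $\hat v \in \mathcal{P}_{\hat K}$ if and only if $\hat v \circ H \in \mathcal{P}_K$, and
(ii) $\hat N \in \mathcal{N}_{\hat K}$ if and only if the linear functional
$v \mapsto \hat N(v \circ H^{-1})$ on $\mathcal{P}_K$ belongs to $\mathcal{N}_K$.
It follows that the dual basis $\hat\phi_1, \ldots, \hat\phi_n$ of $\mathcal{P}_{\hat K}$ is
related to the dual basis $\phi_1, \ldots, \phi_n$ of $\mathcal{P}_K$ through the
relation $\hat\phi_i \circ H = \phi_i$, and (33) implies that

$$(\Pi_K \zeta) \circ H^{-1} = \Pi_{\hat K}(\zeta \circ H^{-1}) \tag{36}$$

for all sufficiently smooth functions $\zeta$ defined on K.
Moreover, for the functions $\hat\zeta$ and $\zeta$ related by
$\zeta(x) = \hat\zeta(H(x))$, we have

$$|\hat\zeta|_{H^s(\hat K)}^2 = (\operatorname{diam} K)^{2s-d}\, |\zeta|_{H^s(K)}^2 \tag{37}$$

where d is the spatial dimension.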
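Assuming that $\mathcal{P}_{\hat K} \supseteq P_m$ (equivalently $\mathcal{P}_K \supseteq P_m$), we
have, by (34),

$$\|\hat\zeta - \Pi_{\hat K} \hat\zeta\|_{H^m(\hat K)} = \|(\hat\zeta - p) - \Pi_{\hat K}(\hat\zeta - p)\|_{H^m(\hat K)} \le 2 \|\Pi_{\hat K}\|_{m,s}\, \|\hat\zeta - p\|_{H^s(\hat K)} \quad \forall\, p \in P_m$$

where $\|\Pi_{\hat K}\|_{m,s}$ is the norm of the operator
$\Pi_{\hat K} : H^s(\hat K) \to H^m(\hat K)$, and hence

$$\|\hat\zeta - \Pi_{\hat K} \hat\zeta\|_{H^m(\hat K)} \le 2 \|\Pi_{\hat K}\|_{m,s} \inf_{p \in P_m} \|\hat\zeta - p\|_{H^s(\hat K)} \tag{38}$$

Since $\hat K$ is convex, the following estimate (Verfürth,
1999) holds provided m is the largest integer strictly less
than s:

$$\inf_{p \in P_m} \|\hat\zeta - p\|_{H^s(\hat K)} \le C_{s,d}\, |\hat\zeta|_{H^s(\hat K)} \quad \forall\, \hat\zeta \in H^s(\hat K) \tag{39}$$

where the positive constant $C_{s,d}$ depends only on s and d.

Combining (38) and (39) we find

$$\|\hat\zeta - \Pi_{\hat K} \hat\zeta\|_{H^m(\hat K)} \le 2 C_{s,d}\, \|\Pi_{\hat K}\|_{m,s}\, |\hat\zeta|_{H^s(\hat K)} \quad \forall\, \hat\zeta \in H^s(\hat K) \tag{40}$$

We have therefore reduced the error estimate for the
element nodal interpolation operator to an estimate of
$\|\Pi_{\hat K}\|_{m,s}$. Since $\operatorname{diam} \hat K = 1$, the norm
$\|\Pi_{\hat K}\|_{m,s}$ is a constant depending only on the shape of
$\hat K$ (equivalently of K), if we consider s and m to be fixed for
a given type of element.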
For triangular elements, we can use the concept of
affine-interpolation-equivalent elements to obtain a more concrete
description of the dependence of $\|\Pi_{\hat K}\|_{m,s}$ on the shape
of $\hat K$. A d-dimensional nondegenerate affine map is a
map of the form $x \mapsto Ax + b$ where A is a nonsingular
d x d matrix and $b \in \mathbb{R}^d$. We say that two finite elements
$(K_1, \mathcal{P}_{K_1}, \mathcal{N}_{K_1})$ and $(K_2, \mathcal{P}_{K_2}, \mathcal{N}_{K_2})$ are
affine-interpolation-equivalent if
(i) there exists a nondegenerate affine map $\Phi$ that maps $K_1$
onto $K_2$, (ii) $v \in \mathcal{P}_{K_2}$ if and only if $v \circ \Phi \in \mathcal{P}_{K_1}$
and (iii)

$$(\Pi_{K_2} \zeta) \circ \Phi = \Pi_{K_1}(\zeta \circ \Phi) \tag{41}$$

for all sufficiently smooth functions $\zeta$ defined on $K_2$. For
example, any triangular elements in one of the families
(except the Bell element and the reduced Hsieh–Clough–Tocher
element) described in Section 3.1 are
affine-interpolation-equivalent to the corresponding element on
the standard simplex S with vertices (0, 0), (1, 0) and (0, 1).

Assuming $(\hat K, \mathcal{P}_{\hat K}, \mathcal{N}_{\hat K})$ (or equivalently
$(K, \mathcal{P}_K, \mathcal{N}_K)$) is affine-interpolation-equivalent to the
element $(S, \mathcal{P}_S, \mathcal{N}_S)$ on the standard simplex, it follows
from (41) and the chain rule that

$$\|\Pi_{\hat K}\|_{m,s} \le C\, \|\Pi_S\|_{m,s} \tag{42}$$

where the positive constant C depends only on the Jacobian
matrix of the affine map $\hat\Phi : S \to \hat K$ and thus depends only
on an upper bound of the parameter $\gamma(\hat K)$ (cf. (29)) which
is identical with $\gamma(K)$.

Combining (36), (37), (40) and (42), we find

$$\sum_{k=0}^{m} (\operatorname{diam} K)^k\, |\zeta - \Pi_K \zeta|_{H^k(K)} \le C\, (\operatorname{diam} K)^s\, |\zeta|_{H^s(K)} \quad \forall\, \zeta \in H^s(K) \tag{43}$$

where the positive constant C depends only on s and
an upper bound of the parameter $\gamma(K)$ (the aspect ratio
of K), provided that (i) the element nodal interpolation
operator is well-defined on $H^s(K)$, (ii) the triangular
element $(K, \mathcal{P}_K, \mathcal{N}_K)$ is affine-interpolation-equivalent to a
reference element $(S, \mathcal{P}_S, \mathcal{N}_S)$ on the standard simplex,
(iii) $\mathcal{P}_K \supseteq P_m$, and (iv) m is the largest integer < s.

For convex quadrilateral elements, we can similarly
obtain a concrete description of the dependence of
$\|\Pi_{\hat K}\|_{m,s}$ on the shape of $\hat K$ by assuming that there is a
reference element $(S, \mathcal{P}_S, \mathcal{N}_S)$ defined on the biunit square S
with vertices $(\pm 1, \pm 1)$ and a bilinear homeomorphism $\hat\Phi$
from S onto $\hat K$ with the following properties:
$\hat v \in \mathcal{P}_{\hat K}$ if and only if $\hat v \circ \hat\Phi \in \mathcal{P}_S$ and
$(\Pi_{\hat K} \hat\zeta) \circ \hat\Phi = \Pi_S(\hat\zeta \circ \hat\Phi)$ for all
sufficiently smooth functions $\hat\zeta$ defined on $\hat K$. Note that
because of (36) this is equivalent to the existence of a bilinear
homeomorphism $\Phi$ from S onto K such that

$$v \in \mathcal{P}_K \iff v \circ \Phi \in \mathcal{P}_S \quad \text{and} \quad (\Pi_K \zeta) \circ \Phi = \Pi_S(\zeta \circ \Phi) \tag{44}$$

for all sufficiently smooth functions $\zeta$ defined on K. The
estimate (42) holds again by the chain rule, where the
positive constant C depends only on the Jacobian matrix
of $\hat\Phi$ and thus depends only on upper bounds for the
parameters $\gamma(\hat K)$ and $\sigma(\hat K)$ (cf. (30)), which are identical
with $\gamma(K)$ and $\sigma(K)$. We conclude that the estimate (43)
also holds for convex quadrilateral elements where the
positive constant C depends on upper bounds of $\gamma(K)$ and
$\sigma(K)$ (equivalently an upper bound of the aspect ratio of
K) provided condition (ii) is replaced by (44). For example,
the estimate (43) is valid for the quadrilateral $Q_n$ element
in Example 8.


Remark 7. The general estimate (40) can be refined
to yield anisotropic error estimates for certain reference
elements. For example, in two dimensions, the following
estimates (Apel and Dobrowolski, 1992; Apel, 1999) hold
for the $P_1$ Lagrange elements on the reference simplex
S and the $Q_1$ tensor product elements on the reference
square S:

$$\left\| \frac{\partial}{\partial x_j} (\zeta - \Pi_S \zeta) \right\|_{L_2(S)} \le C \left( \left\| \frac{\partial^2 \zeta}{\partial x_1 \partial x_j} \right\|_{L_2(S)} + \left\| \frac{\partial^2 \zeta}{\partial x_2 \partial x_j} \right\|_{L_2(S)} \right) \tag{45}$$

for j = 1, 2 and for all $\zeta \in H^2(S)$. We refer the readers to
Chapter 3, this Volume, for more details.


Remark 8. The analysis of the quadrilateral serendipity 
elements is more subtle. A detailed discussion can be found 
in Arnold, Boffi and Falk (2002). 


Remark 9. The estimate (43) can be generalized naturally
to 3-D tetrahedral $P_n$ elements and hexahedral $Q_n$ elements.


Remark 10. Let m be a nonnegative integer and
$m < s \le m + 1$. The estimate

$$\inf_{p \in P_m} \|\zeta - p\|_{H^s(\Omega)} \le C_{\Omega,s}\, |\zeta|_{H^s(\Omega)} \quad \forall\, \zeta \in H^s(\Omega) \tag{46}$$

for general $\Omega$ follows from generalized Poincaré–Friedrichs
inequalities (Nečas, 1967). In the case where $\Omega$ is convex,
the constant $C_{\Omega,s}$ depends only on s and the dimension
of $\Omega$, but not on the shape of $\Omega$, as indicated by the
estimate (39). For nonconvex domains, the constant $C_{\Omega,s}$
does depend on the shape of $\Omega$ (Dupont and Scott, 1980;
Verfürth, 1999).

Let F be a bounded linear functional on $H^s(\Omega)$ with
norm $\|F\|$ such that F(p) = 0 for all $p \in P_m$. It follows
from (46) that

$$|F(\zeta)| = \inf_{p \in P_m} |F(\zeta - p)| \le \|F\| \inf_{p \in P_m} \|\zeta - p\|_{H^s(\Omega)} \le C_{\Omega,s}\, \|F\|\, |\zeta|_{H^s(\Omega)} \tag{47}$$

for all $\zeta \in H^s(\Omega)$. The estimate (47), known as the
Bramble–Hilbert lemma (Bramble and Hilbert, 1970), is useful
for deriving various error estimates.


3.4 Some discrete estimates 


The finite element spaces in Section 3.2 are designed to
be subspaces of Sobolev spaces so that they can serve
as the trial/test spaces for Ritz–Galerkin methods. On the
other hand, since finite element spaces are constructed by
piecing together finite-dimensional function spaces, there
are discrete estimates valid on the finite element spaces but
not the Sobolev spaces.

Let $(K, \mathcal{P}_K, \mathcal{N}_K)$ be a finite element such that
$\mathcal{P}_K \subset H^k(K)$ for a nonnegative integer k. Since any seminorm
on a finite-dimensional space is continuous with respect to
a norm, we have, by scaling, the following inverse estimate:

$$|v|_{H^k(K)} \le C\, (\operatorname{diam} K)^{\ell - k}\, \|v\|_{H^\ell(K)} \quad \forall\, v \in \mathcal{P}_K,\ 0 \le \ell \le k \tag{48}$$

where the positive constant C depends on the domain $\hat K$
(the image of K under the scaling map H defined by (35))
and the space $\mathcal{P}_{\hat K}$.
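
A one-dimensional instance shows how the negative power of the element size arises in an inverse estimate. The sketch below is an illustration and not the setting of the text: it assumes a linear function on an interval of length h with end values a and b, for which $|v|_{H^1}^2 = (b - a)^2 / h$ and $\|v\|_{L_2}^2 = h (a^2 + ab + b^2)/3$. Since $(b - a)^2 \le 4(a^2 + ab + b^2)$ is equivalent to $3(a + b)^2 \ge 0$, one obtains $|v|_{H^1} \le \sqrt{12}\, h^{-1} \|v\|_{L_2}$, with equality when a = -b. The function names are hypothetical.

```python
import math
import random


def h1_seminorm(a, b, h):
    """|v|_{H^1} of the linear function with end values a, b on an interval of length h."""
    return math.sqrt((b - a) ** 2 / h)


def l2_norm(a, b, h):
    """||v||_{L_2} of the same linear function."""
    return math.sqrt(max(0.0, h * (a * a + a * b + b * b) / 3.0))


def inverse_estimate_holds(a, b, h, tol=1e-12):
    """Check |v|_{H^1} <= sqrt(12) / h * ||v||_{L_2} up to rounding."""
    bound = math.sqrt(12.0) / h * l2_norm(a, b, h)
    return h1_seminorm(a, b, h) <= bound * (1.0 + tol) + tol


if __name__ == "__main__":
    random.seed(0)
    samples = [(random.uniform(-1, 1), random.uniform(-1, 1), random.uniform(0.01, 1))
               for _ in range(1000)]
    print(all(inverse_estimate_holds(a, b, h) for a, b, h in samples))
```

No such bound can hold on all of $H^1$, because oscillatory functions make the ratio of the two norms arbitrarily large. The estimate relies on the finite dimensionality of the space of shape functions.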

For finite elements whose shape functions can be pulled
back to a fixed finite-dimensional function space on a
reference element, the constant C depends only on the shape
regularity of the element domain K and global versions
of (48) can be easily derived. For example, for a
quasi-uniform family $\{\mathcal{T}_i : i \in I\}$ of simplicial or quadrilateral
triangulations of a polygonal domain $\Omega$, we have

$$|v|_{H^1(\Omega)} \le C\, h_i^{-1}\, \|v\|_{L_2(\Omega)} \quad \forall\, v \in V_i \text{ and } i \in I \tag{49}$$

where $V_i \subset H^1(\Omega)$ is either the $P_n$ triangular finite element
space or the $Q_n$ quadrilateral finite element space associated
with $\mathcal{T}_i$. Note that $V_i \subset H^s(\Omega)$ for any s < 3/2 and a bit
more work shows that the following inverse estimate (Ben
Belgacem and Brenner, 2001) also holds:

$$\|v\|_{H^s(\Omega)} \le C_s\, h_i^{1-s}\, \|v\|_{H^1(\Omega)} \quad \forall\, v \in V_i,\ i \in I \text{ and } 1 \le s < 3/2 \tag{50}$$

where the positive constant $C_s$ can be uniformly bounded
for s in a compact subset of [1, 3/2).

It is well known that in two dimensions the Sobolev
space $H^1(\Omega)$ is not a subspace of $C(\overline{\Omega})$. However, the
$P_n$ triangular finite element space and the $Q_n$ quadrilateral
finite element space do belong to $C(\overline{\Omega})$ and it is possible
to bound the $L_\infty$ norm of the finite element function by
its $H^1$ norm. Indeed, it follows from Fourier transform
and extension theorems (Adams, 1995; Wloka, 1987) that,
for $\epsilon > 0$,

$$\|v\|_{L_\infty(\Omega)} \le C\, \epsilon^{-1/2}\, \|v\|_{H^{1+\epsilon}(\Omega)} \quad \forall\, v \in H^{1+\epsilon}(\Omega) \tag{51}$$

By taking $\epsilon = (1 + |\ln h_i|)^{-1}$ in (51) and applying (50), we
arrive at the following discrete Sobolev inequality:

$$\|v\|_{L_\infty(\Omega)} \le C\, (1 + |\ln h_i|)^{1/2}\, \|v\|_{H^1(\Omega)} \quad \forall\, v \in V_i \text{ and } i \in I \tag{52}$$

where the positive constant C is independent of $i \in I$.
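
The choice of $\epsilon$ that leads from (51) to (52) can be examined numerically. The sketch below is an illustration with a hypothetical helper `epsilon`. With $\epsilon = (1 + |\ln h|)^{-1}$, the factor $h^{-\epsilon}$ that an inverse estimate of the type (50) contributes for $s = 1 + \epsilon$ stays below e for every h in (0, 1), while $\epsilon^{-1/2}$ from (51) equals $(1 + |\ln h|)^{1/2}$, which is the logarithmic factor in (52).

```python
import math


def epsilon(h):
    """The choice eps = 1 / (1 + |ln h|) used to pass from (51) to (52)."""
    return 1.0 / (1.0 + abs(math.log(h)))


if __name__ == "__main__":
    for k in range(1, 9):
        h = 10.0 ** (-k)
        eps = epsilon(h)
        # h**(-eps) = exp(|ln h| / (1 + |ln h|)) is bounded by e for h < 1.
        print(f"h = {h:.0e}  h^(-eps) = {h ** (-eps):.4f}  eps^(-1/2) = {eps ** -0.5:.4f}")
```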

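The discrete Sobolev inequality and the Poincaré–Friedrichs
inequality (8) imply immediately the following discrete
Poincaré inequality:

$$\|v\|_{L_2(\Omega)} \le \|v - \bar v\|_{L_2(\Omega)} + \|\bar v\|_{L_2(\Omega)} \le C\, \|v - \bar v\|_{L_\infty(\Omega)} \le C\, (1 + |\ln h_i|)^{1/2}\, \|v - \bar v\|_{H^1(\Omega)} \le C\, (1 + |\ln h_i|)^{1/2}\, |v|_{H^1(\Omega)} \tag{53}$$

for all $v \in V_i$ that vanish at a given point in $\Omega$ and with
mean $\bar v = \int_\Omega v \, dx / |\Omega|$.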


Remark 11. The discrete Sobolev inequality can also be 
established directly using calculus and inverse estimates 
(Bramble, Pasciak and Schatz, 1986; Brenner and Scott, 
2002), and both (52) and (53) are sharp (Brenner and Sung, 
2000). 


4 A PRIORI ERROR ESTIMATES FOR 
FINITE ELEMENT METHODS 


Let $\mathcal{T}$ be a triangulation of $\Omega$ and a finite element
$(\overline{D}, \mathcal{P}_D, \mathcal{N}_D)$ be associated with each subdomain
$D \in \mathcal{T}$ so that the resulting finite element space $FE_{\mathcal{T}}$
(cf. (32)) is a subspace of $C^{m-1}(\overline{\Omega}) \subset H^m(\Omega)$. By imposing
appropriate boundary conditions, we can obtain a subspace
$V_{\mathcal{T}}$ of $FE_{\mathcal{T}}$ such that $V_{\mathcal{T}} \subset V$, the subspace of
$H^m(\Omega)$, where the weak problem (1) is formulated. The
corresponding finite element method for (1) is:

Find $u_{\mathcal{T}} \in V_{\mathcal{T}}$ such that

$$a(u_{\mathcal{T}}, v) = F(v) \quad \forall\, v \in V_{\mathcal{T}} \tag{54}$$

In this section, we consider a priori estimates for the
discretization error $u - u_{\mathcal{T}}$. We will discuss the second-order
and fourth-order cases separately. We use the letter C
to denote a generic positive constant that can take different
values at different appearances.

Let us also point out that the asymptotic error analysis
carried out in this section is not sufficient for
parameter-dependent problems (e.g. thin structures and nearly
incompressible elasticity) that can experience locking (Babuška
and Suri, 1992). We refer the readers to other chapters
in this encyclopedia that are devoted to such problems
for the discussion of the techniques that can overcome
locking.


4.1 Second-order problems


We will devote most of our discussion to the case where
$\Omega \subset \mathbb{R}^2$ and only comment briefly on the 3-D case. For
preciseness, we also assume the right-hand side of the elliptic
boundary value problem to be square integrable. We first
consider the case where $V \subset H^1(\Omega)$ is defined by homogeneous
Dirichlet boundary conditions (cf. Section 2.1) on
$\Gamma \subset \partial\Omega$. Such problems can be discretized by triangular
$P_n$ elements (Example 4) or quadrilateral $Q_n$ elements
(Example 8).

Let $\mathcal{T}$ be a triangulation of $\Omega$ by triangles (convex
quadrilaterals) and each triangle (quadrilateral) in $\mathcal{T}$ be equipped
with the $P_n$ ($n \ge 1$) Lagrange element ($Q_n$ quadrilateral
element). The resulting finite element space $FE_{\mathcal{T}}$ is a
subspace of $C^0(\overline{\Omega}) \subset H^1(\Omega)$. We assume that $\Gamma$ is the union
of the edges of the triangles (quadrilaterals) in $\mathcal{T}$ and take
$V_{\mathcal{T}} = V \cap FE_{\mathcal{T}}$, the subspace defined by the homogeneous
Dirichlet boundary condition on $\Gamma$.

We know from the discussion in Section 2.3 that
$u \in H^{1+\alpha(D)}(D)$ for each $D \in \mathcal{T}$, where the number
$\alpha(D) \in (0, 1]$ and $\alpha(D) = 1$ for $D$ away from the singular
points. Hence, the element nodal interpolation operator $\Pi_D$ is
well-defined on $u$ for all $D \in \mathcal{T}$. We can therefore piece
together a global nodal interpolant
$\Pi^N_{\mathcal T} u \in V_{\mathcal T}$ by the formula

\[
(\Pi^N_{\mathcal T} u)|_D = \Pi_D (u|_D) \tag{55}
\]

From the discussion in Section 3.3, we know that (43) is
valid for both the triangular $P_n$ element and the quadrilateral
$Q_n$ element. We deduce from (43) and (55) that

\[
\|u - \Pi^N_{\mathcal T} u\|^2_{H^1(\Omega)}
  \le C \sum_{D \in \mathcal{T}} (\operatorname{diam} D)^{2\alpha(D)}\,
  |u|^2_{H^{1+\alpha(D)}(D)} \tag{56}
\]

where C depends only on the maximum of the aspect ratios
of the element domains in $\mathcal{T}$. Combining (22) and (56) we
have the a priori discretization error estimate

\[
\|u - u_{\mathcal T}\|_{H^1(\Omega)}
  \le C \Big( \sum_{D \in \mathcal{T}} (\operatorname{diam} D)^{2\alpha(D)}\,
  |u|^2_{H^{1+\alpha(D)}(D)} \Big)^{1/2} \tag{57}
\]

where C depends only on the constants in (2) and (3) and
the maximum of the aspect ratios of the element domains
in $\mathcal{T}$.

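Hence, if $\{\mathcal{T}_i : i \in I\}$ is a regular family of
triangulations, and the solution $u$ of (1) belongs to the Sobolev
space $H^{1+\alpha}(\Omega)$ for some $\alpha \in (0, 1]$, then we can
deduce from (57) that

\[
\|u - u_{\mathcal{T}_i}\|_{H^1(\Omega)} \le C\, h_i^{\alpha}\, |u|_{H^{1+\alpha}(\Omega)} \tag{58}
\]

where $h_i = \max_{D \in \mathcal{T}_i} \operatorname{diam} D$ is the mesh
size of $\mathcal{T}_i$ and C is independent of $i \in I$. Note that the
solution $w$ of (23) with $\tilde u$ replaced by $u_{\mathcal{T}_i}$ also
belongs to $H^{1+\alpha}(\Omega)$ and satisfies the elliptic regularity
estimate

\[
\|w\|_{H^{1+\alpha}(\Omega)} \le C\, \|u - u_{\mathcal{T}_i}\|_{L_2(\Omega)}
\]

Therefore, we have

\[
\inf_{v \in V_{\mathcal{T}_i}} \|w - v\|_{H^1(\Omega)}
  \le \|w - \Pi^N_{\mathcal{T}_i} w\|_{H^1(\Omega)}
  \le C\, h_i^{\alpha}\, |w|_{H^{1+\alpha}(\Omega)}
  \le C\, h_i^{\alpha}\, \|u - u_{\mathcal{T}_i}\|_{L_2(\Omega)} \tag{59}
\]

The abstract estimate (24) with $\tilde u$ replaced by
$u_{\mathcal{T}_i}$ and (59) yield the following $L_2$ estimate:

\[
\|u - u_{\mathcal{T}_i}\|_{L_2(\Omega)} \le C\, h_i^{2\alpha}\, |u|_{H^{1+\alpha}(\Omega)} \tag{60}
\]

where C is also independent of $i \in I$.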
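The rates in (58) and (60) are commonly checked by comparing errors on successive meshes. The following minimal sketch computes the observed order $\log(e_i/e_{i+1}) / \log(h_i/h_{i+1})$, which should approach $\alpha$ in the $H^1$-norm and $2\alpha$ in the $L_2$-norm. The helper name `observed_orders` and the synthetic error values are illustrative assumptions, not data from this chapter.

```python
import math

def observed_orders(mesh_sizes, errors):
    """Observed convergence orders between consecutive meshes."""
    orders = []
    for i in range(len(errors) - 1):
        ratio_e = errors[i] / errors[i + 1]
        ratio_h = mesh_sizes[i] / mesh_sizes[i + 1]
        orders.append(math.log(ratio_e) / math.log(ratio_h))
    return orders

# Synthetic errors that obey error = C * h**rate exactly (made-up constants),
# mimicking a solution in H^{1+alpha} with alpha = 2/3.
alpha = 2.0 / 3.0
hs = [0.5 ** k for k in range(1, 6)]
h1_errors = [3.0 * h ** alpha for h in hs]          # behaves like (58)
l2_errors = [1.5 * h ** (2.0 * alpha) for h in hs]  # behaves like (60)

h1_orders = observed_orders(hs, h1_errors)
l2_orders = observed_orders(hs, l2_errors)
```

With measured errors from an actual computation the observed orders only approach these values asymptotically, and a reduced order is a typical sign of a singularity that limits $\alpha$.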


Remark 12. In the case where $\alpha = 1$ (for example, when
$\Gamma = \partial\Omega$ in Example 1 and $\Omega$ is convex), the estimate
(58) is optimal and it is appropriate to use a quasi-uniform family
of triangulations. In the case where $\alpha(D) < 1$ for $D$ next
to singular points, the estimate (57) allows the possibility
of improvement by graded meshes (cf. Section 6).


Remark 13. In the derivations of (58) and (60) above
for the triangular $P_n$ elements, we have used the minimum
angle condition (cf. Remark 5). In view of the anisotropic
estimates (45), these estimates also hold for triangular $P_n$
elements under the maximum angle condition (Babuška and
Aziz, 1976; Jamet, 1976; Ženišek, 1995; Apel, 1999): there
exists $\theta_* < \pi$ such that all the angles in the family of
triangulations are $\le \theta_*$. The estimates (58) and (60) are
also valid for $Q_n$ elements on parallelograms satisfying the
maximum angle condition. They can also be established for
certain thin quadrilateral elements (Apel, 1999).


The 2-D results above also hold for 3-D tetrahedral $P_n$
elements and 3-D hexahedral $Q_n$ elements if the solution
$u$ of (1) belongs to $H^{1+\alpha}(\Omega)$ where $1/2 < \alpha \le 1$,
since the nodal interpolation operators are then well-defined by
the Sobolev embedding theorem. This is the case, for
example, if $\Gamma = \partial\Omega$ in Example 1. However, new
interpolation operators that require less regularity are needed if
$0 < \alpha \le 1/2$. Below, we construct an interpolation operator
$\Pi^A_{\mathcal T} : H^1(\Omega) \to V_{\mathcal T}$ using the local
averaging technique of Scott and Zhang (1990).

For simplicity, we take $V_{\mathcal T}$ to be a tetrahedral $P_1$ finite
element space. Therefore, we only need to specify the
value of $\Pi^A_{\mathcal T}\zeta$ at the vertices of $\mathcal{T}$ for a
given function $\zeta \in H^1(\Omega)$. Let $p$ be a vertex. We choose a
face (or edge in 2-D) $F$ of a subdomain in $\mathcal{T}$ such that
$p \in \bar F$. The choice of $F$ is of course not unique. But we always
choose $F \subset \partial\Omega$ if $p \in \partial\Omega$ so that the
resulting interpolant will satisfy the appropriate Dirichlet boundary
condition. Let $\{\psi_j\} \subset P_1(F)$ be biorthogonal to the nodal
basis $\{\phi_j\} \subset P_1(F)$ with respect to the $L_2(F)$ inner
product. In other words, $\phi_j$ equals 1 at the $j$th vertex of $F$ and
vanishes at the other vertices, and

\[
\int_F \psi_i\, \phi_j \, ds = \delta_{ij} \tag{61}
\]

Suppose $p$ corresponds to the $j$th vertex of $F$. We then
define

\[
(\Pi^A_{\mathcal T}\zeta)(p) = \int_F \psi_j\, \zeta \, ds \tag{62}
\]

where the integral is well-defined because of the trace
theorem.
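The biorthogonality (61) and the averaging formula (62) can be made concrete in the 2-D situation, where $F$ is an edge. The sketch below assumes an edge of length `length` parametrized by arc length $s$, so that $\phi_1 = 1 - s/L$ and $\phi_2 = s/L$; the dual basis is obtained from the inverse of the $2 \times 2$ edge mass matrix $\frac{L}{6}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$. All function names are hypothetical.

```python
import math

# 3-point Gauss-Legendre rule on [-1, 1], exact for polynomials up to degree 5
_GAUSS = [(-math.sqrt(0.6), 5.0 / 9.0), (0.0, 8.0 / 9.0), (math.sqrt(0.6), 5.0 / 9.0)]

def integrate(g, length):
    """Approximate the integral of g over the edge [0, length]."""
    half = 0.5 * length
    return sum(w * half * g(half * (x + 1.0)) for x, w in _GAUSS)

def nodal_basis(length):
    """P1 nodal basis on the edge: phi[j] is 1 at vertex j and 0 at the other."""
    return [lambda s: 1.0 - s / length, lambda s: s / length]

def dual_basis(length):
    """Basis psi of P1(F) with int_F psi[i] * phi[j] ds = delta_ij, cf. (61)."""
    phi = nodal_basis(length)
    c = 2.0 / length  # inverse edge mass matrix is c * [[2, -1], [-1, 2]]
    return [lambda s: c * (2.0 * phi[0](s) - phi[1](s)),
            lambda s: c * (2.0 * phi[1](s) - phi[0](s))]

def averaged_value(zeta, length, j):
    """Value assigned by (62) to the vertex that is the j-th vertex of the edge."""
    psi = dual_basis(length)
    return integrate(lambda s: psi[j](s) * zeta(s), length)

L = 0.3
phi, psi = nodal_basis(L), dual_basis(L)
gram = [[integrate(lambda s: psi[i](s) * phi[j](s), L) for j in range(2)]
        for i in range(2)]
zeta = lambda s: 2.0 - 5.0 * s          # a linear function on the edge
vertex_values = [averaged_value(zeta, L, j) for j in range(2)]
```

For a linear `zeta` the averaged values coincide with the point values at the two vertices, which is the mechanism behind the projection property of the averaging operator on the finite element space.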

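It is clear in view of (61) and (62) that
$\Pi^A_{\mathcal T} v = v$ for all $v \in FE_{\mathcal T}$ and
$\Pi^A_{\mathcal T}\zeta = 0$ on $\Gamma$ if $\zeta = 0$ on $\Gamma$. Note
also that $\Pi^A_{\mathcal T}$ is not a local operator, i.e.,
$(\Pi^A_{\mathcal T}\zeta)|_D$ is in general determined by
$\zeta|_{S(D)}$, where $S(D)$ is the polyhedral domain
formed by the subdomains in $\mathcal{T}$ sharing (at least) a vertex
with $D$ (cf. Figure 9 for a 2-D example). It follows that
the interpolation error estimate for $\Pi^A_{\mathcal T}$ takes the
following form:

\[
\|\zeta - \Pi^A_{\mathcal T}\zeta\|^2_{L_2(D)}
  + (\operatorname{diam} D)^2\, |\zeta - \Pi^A_{\mathcal T}\zeta|^2_{H^1(D)}
  \le C\, (\operatorname{diam} D)^{2(1+\alpha(S(D)))}\,
  |\zeta|^2_{H^{1+\alpha(S(D))}(S(D))} \tag{63}
\]

where C depends on the shape regularity of $\mathcal{T}$, provided
that $\zeta \in H^{1+\alpha(S(D))}(S(D))$ for some
$\alpha(S(D)) \in (0, 1]$. The estimates (58) and (60) for tetrahedral
$P_1$ elements can be derived for general $\alpha \in (0, 1]$ and regular
triangulations using the estimate (63).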


Figure 9. A two-dimensional example of S(D). 


Remark 14. The interpolation operator $\Pi^A_{\mathcal T}$ can be defined
for general finite elements (Scott and Zhang, 1990; Girault
and Scott, 2002) and anisotropic estimates can be obtained
for $\Pi^A_{\mathcal T}$ for certain triangulations (Apel, 1999). There
also exist interpolation operators for less regular functions
(Clément, 1975; Bernardi and Girault, 1998).


Next, we consider the case where $V$ is a closed subspace
of $H^1(\Omega)$ with finite codimension $n < \infty$, as in the case
of the Poisson problem with pure homogeneous Neumann
boundary condition (where $n = 1$) or the elasticity problem
with pure homogeneous traction boundary condition (where
$n = 1$ when $d = 1$, and $n = 3(d - 1)$ when $d = 2$ or 3).
The key assumption here is that there exists a bounded
linear projection $Q$ from $H^1(\Omega)$ onto an $n$-dimensional
subspace of $FE_{\mathcal T}$ such that $\zeta \in H^1(\Omega)$ belongs to
$V$ if and only if $Q\zeta = 0$. We can then define an interpolation
operator $\tilde\Pi_{\mathcal T}$ from appropriate Sobolev spaces onto
$V_{\mathcal T}$ by

\[
\tilde\Pi_{\mathcal T} = (I - Q)\,\Pi_{\mathcal T},
\]

where $\Pi_{\mathcal T}$ is either the nodal interpolation operator
$\Pi^N_{\mathcal T}$ or the Scott-Zhang averaging interpolation operator
$\Pi^A_{\mathcal T}$ introduced earlier. Observe that, since the weak
solution $u$ belongs to $V$,

\[
u - \tilde\Pi_{\mathcal T} u = u - Qu - (I - Q)\Pi_{\mathcal T} u
  = (I - Q)(u - \Pi_{\mathcal T} u)
\]

and the interpolation error of $\tilde\Pi_{\mathcal T}$ can be estimated
in terms of the norm of $Q : H^1(\Omega) \to H^1(\Omega)$ and the
interpolation error of $\Pi_{\mathcal T}$. Therefore, the a priori
discretization error estimates for Dirichlet or Dirichlet/Neumann
boundary value problems also hold for this second type of (pure
Neumann) boundary value problems.
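As an illustration of the composition $(I - Q)\,\Pi_{\mathcal T}$, the following sketch treats a 1-D pure Neumann setting with piecewise linear elements, where $Q$ is taken to be the mean value over the interval. The 1-D setting, the function names and the sample function are assumptions made for this example only.

```python
def p1_mean(nodes, values):
    """Mean over the interval of the piecewise linear function with these nodal values."""
    area = sum(0.5 * (b - a) * (va + vb)
               for a, b, va, vb in zip(nodes, nodes[1:], values, values[1:]))
    return area / (nodes[-1] - nodes[0])

def neumann_interpolant(nodes, u):
    """Nodal values of (I - Q) Pi u, with Pi the P1 nodal interpolant and Q the mean."""
    values = [u(x) for x in nodes]      # Pi u, stored by its nodal values
    mean = p1_mean(nodes, values)       # Q(Pi u), a constant finite element function
    return [v - mean for v in values]   # (I - Q) Pi u

nodes = [0.0, 0.1, 0.3, 0.6, 1.0]
shifted = neumann_interpolant(nodes, lambda x: x * x)
unchanged = neumann_interpolant(nodes, lambda x: x - 0.5)  # already has mean zero
```

The result always has mean zero, so it lies in the constrained discrete space, and a piecewise linear function that already has mean zero is reproduced.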

For the Poisson problem with homogeneous Neumann
boundary condition, we can take

\[
Q\zeta = \frac{1}{|\Omega|} \int_\Omega \zeta \, dx
\]

the mean of $\zeta$ over $\Omega$. For the elasticity problem with pure
homogeneous traction boundary condition, the operator $Q$
from $[H^1(\Omega)]^d$ onto the space of infinitesimal rigid motions
is defined by

\[
\int_\Omega Q\zeta \, dx = \int_\Omega \zeta \, dx
\quad\text{and}\quad
\int_\Omega \nabla \times Q\zeta \, dx = \int_\Omega \nabla \times \zeta \, dx
\qquad \forall\, \zeta \in [H^1(\Omega)]^d
\]

In both cases, the norm of $Q$ is bounded by a constant $C_\Omega$.


Remark 15. In the case where $f \in H^k(\Omega)$ for $k > 0$, the
solution $u$ is $H^{2+k}$-regular away from the geometric
or boundary data singularities and, in particular, away
from $\partial\Omega$. Therefore, it is advantageous to use
higher-order elements in certain parts of $\Omega$, or even globally
(with curved elements near $\partial\Omega$) if singularities are not
present. In the case where $f \in L_2(\Omega)$, the error estimate
(57) indicates that the order of the discretization error for
the triangular $P_n$ element or the quadrilateral $Q_n$ element
is independent of $n \ge 1$. However, the convergence of the
finite element solutions to a particular solution as $h \downarrow 0$
can be improved by using higher-order elements because
of the existence of nonuniform error estimates (Babuška
and Kellogg, 1975).


4.2 Fourth-order problems 


We restrict the discussion of fourth-order problems to the 
two-dimensional plate bending problem of Example 3. 

Let $\mathcal{T}$ be a triangulation of $\Omega$ by triangles and let each
triangle in $\mathcal{T}$ be equipped with the Hsieh-Clough-Tocher
macro element (cf. Example 6). The finite element space
$FE_{\mathcal T}$ defined by (32) is a subspace of
$C^1(\bar\Omega) \subset H^2(\Omega)$. We take $V_{\mathcal T}$ to be
$V \cap FE_{\mathcal T}$, where $V = H^2_0(\Omega)$ for the clamped
plate and $V = H^1_0(\Omega) \cap H^2(\Omega)$ for the simply supported
plate.

The solution $u$ of the plate-bending problem belongs
to $H^{2+\alpha(D)}(D)$ for each $D \in \mathcal{T}$, where
$\alpha(D) \in (0, 2]$ and $\alpha(D) = 2$ for $D$ away from the corners
of $\Omega$. The elemental nodal interpolation operator $\Pi_D$ is
well-defined on $u$ for all $D \in \mathcal{T}$. We can therefore define a
global nodal interpolation operator $\Pi^N_{\mathcal T}$ by the formula
(55). Since the Hsieh-Clough-Tocher macro element is
affine-interpolation-equivalent to the reference element on the
standard simplex, we deduce from (55) and (43) that

\[
\|u - \Pi^N_{\mathcal T} u\|^2_{H^2(\Omega)}
  \le C \sum_{D \in \mathcal{T}} (\operatorname{diam} D)^{2\alpha(D)}\,
  |u|^2_{H^{2+\alpha(D)}(D)} \tag{64}
\]

where C depends only on the maximum of the aspect ratios
of the triangles in $\mathcal{T}$ (or equivalently the minimum angle of
$\mathcal{T}$). From (22) and (64), we have

\[
\|u - u_{\mathcal T}\|_{H^2(\Omega)}
  \le C \Big( \sum_{D \in \mathcal{T}} (\operatorname{diam} D)^{2\alpha(D)}\,
  |u|^2_{H^{2+\alpha(D)}(D)} \Big)^{1/2} \tag{65}
\]

where C depends only on the constants in (2) and (3) and
the minimum angle of $\mathcal{T}$.

Hence, if $\{\mathcal{T}_i : i \in I\}$ is a regular family of
triangulations, and the solution $u$ of the plate bending problem
belongs to the Sobolev space $H^{2+\alpha}(\Omega)$ for some
$\alpha \in (0, 2]$, we can deduce from (65) that

\[
\|u - u_{\mathcal{T}_i}\|_{H^2(\Omega)} \le C\, h_i^{\alpha}\, |u|_{H^{2+\alpha}(\Omega)} \tag{66}
\]

where $h_i = \max_{D \in \mathcal{T}_i} \operatorname{diam} D$ is the mesh
size of $\mathcal{T}_i$ and C is independent of $i \in I$. Since the
solution $w$ of (23) also belongs to $H^{2+\alpha}(\Omega)$, the abstract
estimate (24) combined with an error estimate for $w$ in the
$H^2$-norm analogous to (66) yields the following $L_2$ estimate:

\[
\|u - u_{\mathcal{T}_i}\|_{L_2(\Omega)} \le C\, h_i^{2\alpha}\, |u|_{H^{2+\alpha}(\Omega)} \tag{67}
\]

where C is also independent of $i \in I$.


Remark 16. The analysis of general triangular and
quadrilateral $C^1$ macro elements can be found in Douglas et al.
(1979).


The plate-bending problem can also be discretized by
the Argyris element (cf. Example 5). If $\alpha(D) > 1$ for all
$D \in \mathcal{T}$, then the nodal interpolation operator
$\Pi^N_{\mathcal T}$ is well-defined for the Argyris finite element
space. If $\alpha(D) \le 1$ for some $D \in \mathcal{T}$, then the nodal
interpolation operator $\Pi^N_{\mathcal T}$ must be replaced by an
interpolation operator constructed by the technique of local
averaging. In either case, the estimates (65)-(67) remain valid for
the Argyris finite element solution.


5 A POSTERIORI ERROR ESTIMATES 
AND ANALYSIS 


In this section, we review explicit and implicit estimators as 
well as averaging and multilevel estimators for a posteriori 
finite element error control. 

Throughout this section, we adopt the notation of
Sections 2.1 and 2.2 and recall that $u$ denotes the (unknown)
exact solution of (1) while $\tilde u \in \tilde V$ denotes the discrete
and given solution of (19). It is the aim of Sections 5.1-5.6
to estimate the error $e := u - \tilde u \in V$ in the energy norm
$\|\cdot\|_a = (a(\cdot,\cdot))^{1/2}$ in terms of computable quantities
while Section 5.7 concerns other error norms or goal functionals.
Throughout this section, we assume $0 < \|e\|_a$ to exclude
the exceptional situation $u = \tilde u$.


5.1 Aims and concepts in a posteriori finite 
element error control 


The following five sections introduce the notation, the 
concepts of efficiency and reliability, the definitions of 
residual and error, a posteriori error control and adaptive 
algorithms, and comment on some relevant literature. 


5.1.1 Error estimators, efficiency, reliability, 
asymptotic exactness 


Regarded as an approximation to the (unknown) error norm
$\|e\|_a$, a (computable) quantity $\eta$ is called an a posteriori error
estimator, or estimator for brevity, if it is a function of the
known domain $\Omega$ and its boundary $\Gamma$, the quantities of the
right-hand side $F$, cf. (6) and (13), as well as of the (given)
discrete solution $\tilde u$, or the underlying triangulation.

An estimator $\eta$ is called reliable if

\[
\|e\|_a \le C_{\rm rel}\, \eta + \text{h.o.t.}_{\rm rel} \tag{68}
\]

An estimator $\eta$ is called efficient if

\[
\eta \le C_{\rm eff}\, \|e\|_a + \text{h.o.t.}_{\rm eff} \tag{69}
\]

An estimator is called asymptotically exact if it is reliable
and efficient in the sense of (68)-(69) with
$C_{\rm rel} = C_{\rm eff}^{-1}$.

Here, $C_{\rm rel}$ and $C_{\rm eff}$ are multiplicative constants that do
not depend on the mesh size of an underlying finite element
mesh $\mathcal{T}$ for the computation of $\tilde u$, and h.o.t. denotes
higher-order terms. The latter are generically much smaller than
$\eta$ or $\|e\|_a$, but usually, this depends on the (unknown)
smoothness of the exact solution or the (known) smoothness
of the given data. The readers are warned that, in general,
h.o.t. may not be neglected; in case of high oscillations they
may even dominate (68) or (69).


5.1.2 Error and residual 


Abstract examples for estimators are (26) and (27), which
involve dual norms of the residual (25). Notice carefully
that $R := F - a(\tilde u, \cdot)$ is a bounded linear functional in $V$,
written $R \in V^*$, and hence the dual norm

\[
\|R\|_{V^*} := \sup_{v \in V \setminus \{0\}} \frac{R(v)}{\|v\|_a}
  = \sup_{v \in V \setminus \{0\}} \frac{a(e, v)}{\|v\|_a}
  = \|e\|_a < \infty \tag{70}
\]

The second equality immediately follows from (25). A
Cauchy inequality in (70) with respect to the scalar product
$a$ results in $\|R\|_{V^*} \le \|e\|_a$ while $v = e$ in (70) yields
finally the equality $\|R\|_{V^*} = \|e\|_a$.

That is, the error (estimation) in the energy norm is
equivalent to the (computation of the) dual norm of the given
residual. Furthermore, it is even of comparable computational
effort to compute an optimal $v = e$ in (70) or to
compute $e$. The proof of (70) yields even a stability
estimate: The relative error of $R(v)$ as an approximation to
$\|e\|_a$ equals

\[
\frac{\|e\|_a - R(v)}{\|e\|_a}
  = \frac{1}{2} \left\| v - \frac{e}{\|e\|_a} \right\|_a^2
  \qquad \text{for all } v \in V \text{ with } \|v\|_a = 1 \tag{71}
\]

In fact, given any $v \in V$ with $\|v\|_a = 1$, the identity (71)
follows from

\[
1 - a\Big(\frac{e}{\|e\|_a}, v\Big)
  = \frac{1}{2}\, a\Big(\frac{e}{\|e\|_a}, \frac{e}{\|e\|_a}\Big)
  - a\Big(\frac{e}{\|e\|_a}, v\Big) + \frac{1}{2}\, a(v, v)
  = \frac{1}{2} \left\| v - \frac{e}{\|e\|_a} \right\|_a^2
\]

The error estimate (71) implies that the maximizing $v$ in
(70) (i.e. $v \in V$ with maximal $R(v)$ subject to
$\|v\|_a \le 1$) is unique and equals $e/\|e\|_a$. As a consequence, the
computation of the maximizing $v$ in (70) is equivalent to
and indeed equally expensive as the computation of the
unknown $e/\|e\|_a$ and so (since $\tilde u$ is known) of the exact
solution $u$. Therefore, a posteriori error analysis aims to
compute lower and upper bounds of $\|R\|_{V^*}$ rather than its
exact value.
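The identity (71) is purely algebraic and can be checked in any finite-dimensional model. The sketch below assumes $a(u, v) = u^{\mathsf T} A v$ with a small symmetric positive definite matrix $A$ and a made-up vector `e` standing in for the unknown error; both are illustrative choices, not part of the chapter.

```python
import math

# Symmetric positive definite matrix defining the model energy scalar product
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]

def a(u, v):
    """Energy scalar product a(u, v) = u^T A v."""
    return sum(u[i] * A[i][j] * v[j] for i in range(3) for j in range(3))

def norm_a(v):
    return math.sqrt(a(v, v))

def normalize(v):
    n = norm_a(v)
    return [x / n for x in v]

e = [1.0, -0.5, 0.25]      # stands in for the (unknown) error
e_hat = normalize(e)       # the unique maximizer e / ||e||_a

def relative_defect(v):
    """Left-hand side of (71), with R(v) = a(e, v)."""
    return (norm_a(e) - a(e, v)) / norm_a(e)

def half_distance(v):
    """Right-hand side of (71)."""
    d = [vi - ei for vi, ei in zip(v, e_hat)]
    return 0.5 * a(d, d)

trial = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.3, -0.2, 0.9], [-1.0, 2.0, 1.0])]
gaps = [abs(relative_defect(v) - half_distance(v)) for v in trial]
```

Every normalized trial vector gives $R(v) \le \|e\|_a$, with equality only for `e_hat`, which mirrors the uniqueness statement above.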


5.1.3 Error estimators and error control 


For an idealized termination procedure, one is given a
tolerance Tol > 0 and interested in a stopping criterion (of
successively adapted mesh refinements)

\[
\|e\|_a \le \text{Tol}
\]

Since the error $\|e\|_a$ is unknown, it is replaced by its upper
bound (68) and then leads to

\[
C_{\rm rel}\, \eta + \text{h.o.t.}_{\rm rel} \le \text{Tol} \tag{72}
\]

For a verification of (72), in practice, one requires not only
$\eta$ but also $C_{\rm rel}$ and h.o.t.$_{\rm rel}$. The latter quantity
cannot be dropped; it is not sufficient to know that
h.o.t.$_{\rm rel}$ is (possibly) negligible for sufficiently small
mesh-sizes.

Section 5.6 presents numerical examples and further
discussions of this aspect.


5.1.4 Adaptive mesh-refining algorithms 


Error estimators are used in adaptive mesh-refining algo- 
rithms to motivate a refinement rule, which determines 
whether an element or edge and so on shall be refined or 
coarsened. This will be discussed in Section 6 
below. 

At this stage two remarks are in order. First, one should 
be precise in the language and distinguish between error 
estimators, which are usually global and fully involve 
constants and higher-order terms and (local) refinement 
indicators used in refinement rules. Second, constants and 
higher-order terms might be seen as less important and are 
often omitted in the usage as refinement indicators for the 
step MARKING in Section 6.2.


5.1.5 Literature 


Amongst the most influential pioneering publications on
a posteriori error control are Babuška and Rheinboldt
(1978), Ladeveze and Leguillon (1983), Bank and Weiser
(1985), Babuška and Miller (1987), Eriksson and Johnson
(1991), followed by many others. The readers may find it
rewarding to study the survey articles of Eriksson et al.
(1995), Becker and Rannacher (2001) and the books of
Verfürth (1996), Ainsworth and Oden (2000), Babuška and
Strouboulis (2001), Bangerth and Rannacher (2003) for a
first insight and further references.


5.2 Explicit residual-based error estimators 


The most frequently considered and possibly easiest class of
error estimators consists of local norms of explicitly given
volume and jump residuals multiplied by mesh-dependent
weights.

To derive them for a general class of abstract problems
from Section 2.1, let $u \in V$ be an exact solution of the
problem (1) and let $\tilde u \in \tilde V$ be its Galerkin approximation
from (19) with residual $R(v)$ from (25). Moreover, as in
Example 1 or 2, it is supposed throughout this chapter that
the strong form of the equilibration associated with the
weak form (19) is of the form

\[
-\operatorname{div} p = f \quad \text{for some flux or stress }
p \in L_2(\Omega; \mathbb{R}^{m \times n})
\]

The discrete analog $\tilde p$ is piecewise smooth but, in general,
discontinuous; at several places below, it is a $\mathcal{T}$-piecewise
constant $m \times n$ matrix as it is proportional to the gradient
of some (piecewise) $P_1$ FE function $\tilde u$. The description
of the residuals is based on the weak form of
$f + \operatorname{div} p = 0$.


5.2.1 Residual representation formula


It is the aim of this section to recast the residual in the form

\[
R(v) = \sum_{T \in \mathcal{T}} \int_T r_T \cdot v \, dx
     - \sum_{E \in \mathcal{E}} \int_E r_E \cdot v \, ds \tag{73}
\]

of a sum of integrals over all element domains $T \in \mathcal{T}$
plus a sum of integrals over all edges or faces $E \in \mathcal{E}$ and
to identify the explicit volume residual $r_T$ and the jump
residual $r_E$.

The boundary $\partial T$ of each finite element domain
$T \in \mathcal{T}$ is a union of edges or faces, which form the set
$\mathcal{E}(T)$, written $\partial T = \cup\, \mathcal{E}(T)$. Each edge or
face $E \in \mathcal{E}$ in the set of all possible edges or faces
$\mathcal{E} = \cup \{\mathcal{E}(T) : T \in \mathcal{T}\}$ is associated
with a unit normal vector $\nu_E$, which is unique up to an
orientation $\pm\nu_E$, which is globally fixed. By convention,
the unit normal $\nu$ on the domain $\Omega$ or on an element $T$
points outwards.

For ease of exposition, suppose that the underlying
boundary value problem allows the bilinear form $a(\tilde u, v)$ to
equal the sum over all $\int_\Omega \tilde p_{jk}\, D_j v_k \, dx$ with
given fluxes or stresses $\tilde p_{jk}$. Moreover, Neumann data are
excluded from the description in this section and hence only interior
edges contribute with a jump residual. An integration by parts on
$T$ with outer unit normal $\nu_T$ yields

\[
\int_T \tilde p_{jk}\, D_j v_k \, dx
  = \int_{\partial T} \tilde p_{jk}\, v_k\, \nu_{T,j} \, ds
  - \int_T v_k\, D_j \tilde p_{jk} \, dx
\]

which, with the divergence operator div and proper evaluation
of $\tilde p \nu$, reads

\[
a(\tilde u, v) + \sum_{T \in \mathcal{T}} \int_T v \cdot \operatorname{div} \tilde p \, dx
  = \sum_{T \in \mathcal{T}} \int_{\partial T} (\tilde p \nu) \cdot v \, ds
\]

Each boundary $\partial T$ is rewritten as a sum of edges or faces.
Each such edge or face $E$ belongs either to the boundary
$\partial\Omega$, written $E \in \mathcal{E}_{\partial\Omega}$, or is an
interior edge, written $E \in \mathcal{E}_\Omega$.
For $E \in \mathcal{E}_{\partial\Omega}$ there exists exactly one element
$T$ with $E \in \mathcal{E}(T)$ and one defines $T_+ = T$,
$T_- = E \subset \partial\Omega$, $\omega_E = \operatorname{int}(T)$
and $\nu_E := \nu_T = \nu_\Omega$. Any $E \in \mathcal{E}_\Omega$ is the
intersection of exactly two elements, which we name $T_+$ and $T_-$ and
which essentially determine the patch
$\omega_E := \operatorname{int}(T_+ \cup T_-)$ of $E$.
This description of $T_\pm$ is unique up to the order that is fixed
in the sequel by the convention that $\nu_E = \nu_{T_+}$ is exterior to
$T_+$. Then,

\[
\sum_{T \in \mathcal{T}} \int_{\partial T} (\tilde p \nu) \cdot v \, ds
  = \sum_{E \in \mathcal{E}_\Omega} \int_E [\tilde p \nu_E] \cdot v \, ds
\]

where $[\tilde p \nu_E] := (\tilde p|_{T_+} - \tilde p|_{T_-})\, \nu_E$ for
$E = \partial T_+ \cap \partial T_- \in \mathcal{E}_\Omega$
and $[\tilde p \nu_E] := 0$ for
$E \in \mathcal{E}(T) \cap \mathcal{E}_{\partial\Omega}$. Altogether, one
obtains the residual representation formula (73) with the

volume residuals $r_T := f + \operatorname{div} \tilde p$ in $T \in \mathcal{T}$

jump residuals $r_E := [\tilde p \nu_E]$ along $E \in \mathcal{E}_\Omega$
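The sign conventions in (73) are easy to get wrong, so a small consistency check can help. The sketch below assumes the 1-D analogue $a(u, v) = \int u' v' \, dx$ with a piecewise linear function `uh` (so $\tilde p = \tilde u'$, $\operatorname{div} \tilde p = 0$ on each element, $r_T = f$), takes $T_+$ as the left neighbour of each interior node with $\nu_E = +1$, and compares the direct evaluation of $R(v)$ with the representation formula. The mesh, nodal values and function names are made up for illustration.

```python
import math

GAUSS = [(-math.sqrt(0.6), 5.0 / 9.0), (0.0, 8.0 / 9.0), (math.sqrt(0.6), 5.0 / 9.0)]

def integrate(g, a, b):
    """3-point Gauss-Legendre quadrature on [a, b] (exact up to degree 5)."""
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    return sum(w * half * g(mid + half * x) for x, w in GAUSS)

def residual_direct(nodes, uh, f, v, dv):
    """R(v) = int f v dx - int uh' v' dx, evaluated element by element."""
    total = 0.0
    for a, b, ua, ub in zip(nodes, nodes[1:], uh, uh[1:]):
        slope = (ub - ua) / (b - a)
        total += integrate(lambda x: f(x) * v(x) - slope * dv(x), a, b)
    return total

def residual_representation(nodes, uh, f, v):
    """1-D form of (73): volume residuals r_T = f on each element minus
    jump residuals r_E = uh'(z-) - uh'(z+) at the interior nodes z."""
    slopes = [(ub - ua) / (b - a)
              for a, b, ua, ub in zip(nodes, nodes[1:], uh, uh[1:])]
    volume = sum(integrate(lambda x: f(x) * v(x), a, b)
                 for a, b in zip(nodes, nodes[1:]))
    jumps = sum((left - right) * v(z)
                for left, right, z in zip(slopes, slopes[1:], nodes[1:-1]))
    return volume - jumps

nodes = [0.0, 0.2, 0.35, 0.6, 0.8, 1.0]
uh = [0.0, 0.3, 0.1, 0.4, 0.2, 0.0]   # any piecewise linear function, zero at both ends
f = lambda x: 1.0 + x * x
v = lambda x: x * (1.0 - x)           # test function vanishing on the boundary
dv = lambda x: 1.0 - 2.0 * x
direct = residual_direct(nodes, uh, f, v, dv)
represented = residual_representation(nodes, uh, f, v)
```

Note that `uh` need not be a Galerkin solution here: the representation formula only uses integration by parts and holds for every piecewise smooth discrete flux.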


5.2.2 Weak approximation operators 


In terms of the residual $R$, the orthogonality condition (20)
is rewritten as $R(\tilde v) = 0$ for all $\tilde v \in \tilde V$. Hence,
given any $v \in V$ with norm $\|v\|_a = 1$, there holds
$R(v) = R(v - \tilde v)$.

Explicit error estimators rely on the design of
$\tilde v := \Pi^A_{\mathcal T}(v)$ as a function of $v$;
$\Pi^A_{\mathcal T}$ is called approximation operator as in (61)-(63)
and discussed further in Section 4.
See also (Carstensen, 1999; Carstensen and Funken, 2000;
Nochetto and Wahlbin, 2002). For the understanding of this
section, it suffices to know that there are several choices of
$\tilde v \in \tilde V$ that satisfy first-order approximation and
stability properties in the sense of

\[
\sum_{T \in \mathcal{T}} \|h_T^{-1} (v - \tilde v)\|^2_{L_2(T)}
  + \sum_{E \in \mathcal{E}} \|h_E^{-1/2} (v - \tilde v)\|^2_{L_2(E)}
  + |v - \tilde v|^2_{H^1(\Omega)} \le C\, |v|^2_{H^1(\Omega)} \tag{74}
\]

Here, $h_T$ and $h_E$ denote the diameter of an element
$T \in \mathcal{T}$ and an edge $E \in \mathcal{E}$, respectively. The
multiplicative constant C is independent of the mesh-sizes $h_T$ or
$h_E$, but depends on the shape of the element domains through their
minimal angle condition (for simplices) or aspect ratio (for
tensor product elements).


5.2.3 Reliability 


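Given the explicit volume and jump residuals $r_T$ and $r_E$ in
(73), one defines the explicit residual-based estimator $\eta_{R,R}$

\[
\eta_{R,R}^2 := \sum_{T \in \mathcal{T}} h_T^2\, \|r_T\|^2_{L_2(T)}
  + \sum_{E \in \mathcal{E}_\Omega} h_E\, \|r_E\|^2_{L_2(E)} \tag{75}
\]

which is reliable, that is

\[
\|e\|_a \le C\, \eta_{R,R} \tag{76}
\]

The proof of (76) follows from (73)-(75) and Cauchy
inequalities:

\[
\begin{aligned}
R(v) = R(v - \tilde v)
  &= \sum_{T \in \mathcal{T}} \int_T r_T \cdot (v - \tilde v) \, dx
   - \sum_{E \in \mathcal{E}_\Omega} \int_E r_E \cdot (v - \tilde v) \, ds \\
  &\le \sum_{T \in \mathcal{T}} \big( h_T \|r_T\|_{L_2(T)} \big)
       \big( h_T^{-1} \|v - \tilde v\|_{L_2(T)} \big)
   + \sum_{E \in \mathcal{E}_\Omega} \big( h_E^{1/2} \|r_E\|_{L_2(E)} \big)
       \big( h_E^{-1/2} \|v - \tilde v\|_{L_2(E)} \big) \\
  &\le \Big( \sum_{T \in \mathcal{T}} h_T^2 \|r_T\|^2_{L_2(T)} \Big)^{1/2}
       \Big( \sum_{T \in \mathcal{T}} h_T^{-2} \|v - \tilde v\|^2_{L_2(T)} \Big)^{1/2} \\
  &\quad + \Big( \sum_{E \in \mathcal{E}_\Omega} h_E \|r_E\|^2_{L_2(E)} \Big)^{1/2}
       \Big( \sum_{E \in \mathcal{E}_\Omega} h_E^{-1} \|v - \tilde v\|^2_{L_2(E)} \Big)^{1/2} \\
  &\le C\, \eta_{R,R}\, |v|_{H^1(\Omega)}
\end{aligned}
\]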


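For first-order finite element methods in the situation of
Example 1 or 2, the volume term $r_T = f$ can be substituted
by the higher-order term of oscillations, that is

\[
\|e\|_a^2 \le C \Big( \operatorname{osc}(f)^2
  + \sum_{E \in \mathcal{E}_\Omega} h_E\, \|r_E\|^2_{L_2(E)} \Big) \tag{77}
\]

For each node $z \in \mathcal{N}$ with nodal basis function $\varphi_z$
and patch $\omega_z := \{x \in \Omega : \varphi_z(x) \ne 0\}$ of diameter
$h_z$ and the source term $f \in L_2(\Omega)^m$ with integral mean
$f_z := |\omega_z|^{-1} \int_{\omega_z} f(x) \, dx \in \mathbb{R}^m$, the
oscillations of $f$ are defined by

\[
\operatorname{osc}(f) := \Big( \sum_{z \in \mathcal{N}} h_z^2\,
  \|f - f_z\|^2_{L_2(\omega_z)} \Big)^{1/2}
\]

Notice that for $f \in H^1(\Omega)^m$ and the mesh size
$h_{\mathcal T} \in P_0(\mathcal{T})$ there holds

\[
\operatorname{osc}(f) \le C\, \|h_{\mathcal T}^2\, D f\|_{L_2(\Omega)}
\]

and so $\operatorname{osc}(f)$ is of quadratic and hence of higher order.
We refer to Carstensen and Verfürth (1999), Nochetto (1993),
Becker and Rannacher (1996), and Rodriguez (1994b) for
further details on and proofs of (77).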
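To see reliability at work, the following sketch treats the 1-D model problem $-u'' = f$ on $(0,1)$ with homogeneous Dirichlet data and $P_1$ elements. Two simplifying assumptions are specific to this 1-D illustration and are not taken from the chapter: $\tilde v$ is chosen as the nodal interpolant, so $v - \tilde v$ vanishes at every node and only the volume terms $h_T^2 \|f\|^2_{L_2(T)}$ of (75) remain, and the one-dimensional Friedrichs inequality then gives the explicit constant $C = 1/\pi$ in (76). The data $f = \pi^2 \sin(\pi x)$ and all function names are illustrative.

```python
import math

GAUSS = [(-math.sqrt(0.6), 5.0 / 9.0), (0.0, 8.0 / 9.0), (math.sqrt(0.6), 5.0 / 9.0)]

def integrate(g, a, b):
    """3-point Gauss-Legendre quadrature on [a, b]."""
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    return sum(w * half * g(mid + half * x) for x, w in GAUSS)

def solve_p1(nodes, f):
    """Galerkin P1 solution of -u'' = f with u = 0 at both end points."""
    n = len(nodes) - 2                      # number of interior nodes
    h = [b - a for a, b in zip(nodes, nodes[1:])]
    diag = [1.0 / h[i] + 1.0 / h[i + 1] for i in range(n)]
    off = [-1.0 / h[i + 1] for i in range(n - 1)]
    rhs = []
    for i in range(n):
        xl, xm, xr = nodes[i], nodes[i + 1], nodes[i + 2]
        rhs.append(integrate(lambda x: f(x) * (x - xl) / h[i], xl, xm)
                   + integrate(lambda x: f(x) * (xr - x) / h[i + 1], xm, xr))
    for i in range(1, n):                   # forward elimination (Thomas algorithm)
        m = off[i - 1] / diag[i - 1]
        diag[i] -= m * off[i - 1]
        rhs[i] -= m * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):          # back substitution
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]

def volume_estimator(nodes, f):
    """Square root of the sum of h_T^2 ||f||^2_{L2(T)} over all elements."""
    return math.sqrt(sum((b - a) ** 2 * integrate(lambda x: f(x) ** 2, a, b)
                         for a, b in zip(nodes, nodes[1:])))

def energy_error(nodes, uh, du):
    """Energy norm ||u' - uh'||_{L2(0,1)} of the error, by quadrature."""
    total = 0.0
    for a, b, ua, ub in zip(nodes, nodes[1:], uh, uh[1:]):
        slope = (ub - ua) / (b - a)
        total += integrate(lambda x: (du(x) - slope) ** 2, a, b)
    return math.sqrt(total)

f = lambda x: math.pi ** 2 * math.sin(math.pi * x)   # exact solution u = sin(pi x)
du = lambda x: math.pi * math.cos(math.pi * x)

results = []                                         # pairs (error, estimator)
for n in (8, 16, 32):
    nodes = [i / n for i in range(n + 1)]
    uh = solve_p1(nodes, f)
    results.append((energy_error(nodes, uh, du), volume_estimator(nodes, f)))
```

In two or three dimensions the jump terms of (75) do not drop out and the constant in (76) is in general not known this explicitly, which is exactly the difficulty discussed in Section 5.1.3.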


5.2.4 Efficiency 


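Following a technique with inverse estimates due to
Verfürth (1996), this section investigates the proof of
efficiency of $\eta_{R,R}$ in a local form, namely,

\[
h_T\, \|r_T\|_{L_2(T)} \le C \big( \|e\|_{H^1(T)} + \operatorname{osc}(f, T) \big) \tag{78}
\]
\[
h_E^{1/2}\, \|r_E\|_{L_2(E)} \le C \big( \|e\|_{H^1(\omega_E)} + \operatorname{osc}(f, \omega_E) \big) \tag{79}
\]

where $\tilde f$ denotes an elementwise polynomial (best-)
approximation of $f$ and

\[
\operatorname{osc}(f, T) := h_T\, \|f - \tilde f\|_{L_2(T)}
\quad\text{and}\quad
\operatorname{osc}(f, \omega_E) := h_E\, \|f - \tilde f\|_{L_2(\omega_E)}
\]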
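The local oscillation terms are computable from the data alone. The sketch below evaluates $\operatorname{osc}(f, T) = h_T \|f - \tilde f\|_{L_2(T)}$ on 1-D elements, assuming that $\tilde f$ is the elementwise mean value (the $L_2$-best approximation by constants), and records how the accumulated oscillation behaves under uniform refinement. The choice $f = \exp$ and the function names are illustrative assumptions.

```python
import math

GAUSS = [(-math.sqrt(0.6), 5.0 / 9.0), (0.0, 8.0 / 9.0), (math.sqrt(0.6), 5.0 / 9.0)]

def integrate(g, a, b):
    """3-point Gauss-Legendre quadrature on [a, b]."""
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    return sum(w * half * g(mid + half * x) for x, w in GAUSS)

def local_oscillation(f, a, b):
    """osc(f, T) = h_T * ||f - mean_T(f)||_{L2(T)} on the element T = (a, b)."""
    h = b - a
    mean = integrate(f, a, b) / h
    return h * math.sqrt(integrate(lambda x: (f(x) - mean) ** 2, a, b))

def total_oscillation(f, nodes):
    """Square root of the sum of the squared local oscillations."""
    return math.sqrt(sum(local_oscillation(f, a, b) ** 2
                         for a, b in zip(nodes, nodes[1:])))

f = math.exp
osc_values = [total_oscillation(f, [i / n for i in range(n + 1)])
              for n in (8, 16, 32)]
ratios = [osc_values[i] / osc_values[i + 1] for i in range(2)]
```

For smooth data each halving of the mesh size reduces the accumulated oscillation by a factor close to four, that is, the oscillation is of second order, whereas the energy error of a first-order method only halves.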


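The main tools in the proof of (79) and (78) are bubble
functions $b_E$ and $b_T$ based on an edge or face
$E \in \mathcal{E}$ and an element $T \in \mathcal{T}$ with nodes
$\mathcal{N}(E)$ and $\mathcal{N}(T)$, respectively.
Given a nodal basis $(\varphi_z : z \in \mathcal{N})$ of a first-order
finite element method with respect to $\mathcal{T}$ define, for any
$T \in \mathcal{T}$ and $E \in \mathcal{E}_\Omega$, the element- and
edge-bubble functions

\[
b_T := \prod_{z \in \mathcal{N}(T)} \varphi_z \in H^1_0(T)
\quad\text{and}\quad
b_E := \prod_{z \in \mathcal{N}(E)} \varphi_z \in H^1_0(\omega_E) \tag{80}
\]

$b_E$ and $b_T$ are nonnegative and continuous piecewise
polynomials $\le 1$ with support
$\operatorname{supp} b_E = \bar\omega_E = T_+ \cup T_-$
(for $T_\pm \in \mathcal{T}$ with $E = T_+ \cap T_-$) and
$\operatorname{supp} b_T = T$.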
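On a triangle, the first-order nodal basis functions restricted to the element are the barycentric coordinates, so the bubble functions (80) are products of these. The following sketch evaluates them on a single triangle; the function names and the sample triangle are hypothetical, and only the restriction of the edge bubble to one of its two neighbouring elements is computed.

```python
def barycentric(p, tri):
    """Barycentric coordinates of the point p with respect to the triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    l2 = ((p[0] - x1) * (y3 - y1) - (x3 - x1) * (p[1] - y1)) / det
    l3 = ((x2 - x1) * (p[1] - y1) - (p[0] - x1) * (y2 - y1)) / det
    return (1.0 - l2 - l3, l2, l3)

def element_bubble(p, tri):
    """b_T: product of the three nodal basis functions of the element."""
    l1, l2, l3 = barycentric(p, tri)
    return l1 * l2 * l3

def edge_bubble(p, tri, i, j):
    """Restriction to tri of b_E for the edge joining vertices i and j of tri."""
    lam = barycentric(p, tri)
    return lam[i] * lam[j]

tri = [(0.0, 0.0), (2.0, 0.0), (0.5, 1.5)]
centroid = (sum(x for x, _ in tri) / 3.0, sum(y for _, y in tri) / 3.0)
edge_mid = (1.0, 0.0)     # midpoint of the edge joining vertices 0 and 1
```

The element bubble attains its maximum $1/27$ at the centroid and vanishes on the element boundary, while the edge bubble attains $1/4$ at the midpoint of its edge and vanishes on the two other edges.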

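Utilizing the bubble functions (80), the proof of (78)-(79)
essentially consists in the design of test functions
$w_T \in H^1_0(T)$, $T \in \mathcal{T}$, and $w_E \in H^1_0(\omega_E)$,
$E \in \mathcal{E}_\Omega$, with the properties

\[
|w_T|_{H^1(T)} \le C\, h_T\, \|r_T\|_{L_2(T)}
\quad\text{and}\quad
|w_E|_{H^1(\omega_E)} \le C\, h_E^{1/2}\, \|r_E\|_{L_2(E)} \tag{81}
\]
\[
h_T^2\, \|r_T\|^2_{L_2(T)} \le C_1\, R(w_T) + C_2\, \operatorname{osc}(f, T)^2 \tag{82}
\]
\[
h_E\, \|r_E\|^2_{L_2(E)} \le C_1\, R(w_E) + C_2\, \operatorname{osc}(f, \omega_E)^2 \tag{83}
\]

In fact, (81)-(83), the definition of the residual $R = a(e, \cdot)$,
and Cauchy inequalities with respect to the scalar product
$a$ prove (78)-(79).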

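To construct the test function $w_T$, $T \in \mathcal{T}$, recall
$\operatorname{div} p + f = 0$ and $r_T = f + \operatorname{div} \tilde p$
and set $\tilde r_T := \tilde f + \operatorname{div} \tilde p$ for
some polynomial $\tilde f$ on $T$ such that $\tilde r_T$ is a best
approximation of $r_T$ in some finite-dimensional (polynomial) space
with respect to $L_2(T)$. Since

\[
h_T\, \|\tilde r_T\|_{L_2(T)} \le h_T\, \|r_T\|_{L_2(T)}
  \le h_T\, \|\tilde r_T\|_{L_2(T)} + h_T\, \|f - \tilde f\|_{L_2(T)}
\]

it remains to bound $\tilde r_T$, which belongs to a
finite-dimensional space and hence satisfies an inverse inequality

\[
h_T\, \|\tilde r_T\|_{L_2(T)} \le C\, h_T\, \|b_T^{1/2}\, \tilde r_T\|_{L_2(T)}
\]

This motivates the estimation of

\[
\begin{aligned}
\|b_T^{1/2}\, \tilde r_T\|^2_{L_2(T)}
  &= \int_T b_T\, \tilde r_T \cdot (\tilde r_T - r_T) \, dx
   + \int_T b_T\, \tilde r_T \cdot r_T \, dx \\
  &\le \|b_T^{1/2}\, \tilde r_T\|_{L_2(T)}\, \|f - \tilde f\|_{L_2(T)}
   + \int_T b_T\, \tilde r_T \cdot \operatorname{div}(\tilde p - p) \, dx
\end{aligned}
\]

The combination of the preceding estimates results in

\[
h_T^2\, \|r_T\|^2_{L_2(T)}
  \le C_1 \int_T (h_T^2\, b_T\, \tilde r_T) \cdot \operatorname{div}(\tilde p - p) \, dx
  + C_2\, \operatorname{osc}(f, T)^2
\]

An integration by parts concludes the proof of (82) for

\[
w_T := h_T^2\, b_T\, \tilde r_T \tag{84}
\]

the proof of (81) for this $w_T$ is immediate.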

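Given an interior edge $E = T_+ \cap T_- \in \mathcal{E}_\Omega$ with its
neighboring elements $T_+$ and $T_-$, simultaneously addressed as
$T_\pm \in \mathcal{T}$, extend the edge residual $r_E$ from the edge $E$
to its patch $\omega_E = \operatorname{int}(T_+ \cup T_-)$ such that

\[
\|b_E\, r_E\|_{L_2(\omega_E)} + h_E\, |b_E\, r_E|_{H^1(\omega_E)}
  \le C_1\, h_E^{1/2}\, \|r_E\|_{L_2(E)}
  \le C_2\, h_E^{1/2}\, \|b_E^{1/2}\, r_E\|_{L_2(E)} \tag{85}
\]

(with an inverse inequality at the end). The choice of the
two real constants

\[
\alpha_\pm = \frac{\int_{T_\pm} h_E\, b_E\, \tilde r_{T_\pm} \cdot r_E \, dx}
                  {\int_{T_\pm} w_{T_\pm} \cdot \tilde r_{T_\pm} \, dx}
\]

in the definition

\[
w_E := \alpha_+\, w_{T_+} + \alpha_-\, w_{T_-} - h_E\, b_E\, r_E \tag{86}
\]

yields $\int_{T_\pm} w_E \cdot \tilde r_{T_\pm} \, dx = 0$. Since
$\int_{T_\pm} w_{T_\pm} \cdot \tilde r_{T_\pm} \, dx
= h_{T_\pm}^2\, \|b_{T_\pm}^{1/2}\, \tilde r_{T_\pm}\|^2_{L_2(T_\pm)}$,
one eventually deduces
$|\alpha_\pm|\, |w_{T_\pm}|_{H^1(T_\pm)} \le C\, h_E^{1/2}\, \|r_E\|_{L_2(E)}$
and then concludes (81). An integration by parts shows

\[
\begin{aligned}
C^{-1}\, h_E\, \|r_E\|^2_{L_2(E)}
  &\le h_E\, \|b_E^{1/2}\, r_E\|^2_{L_2(E)} \\
  &= -\int_E w_E \cdot r_E \, ds
   = -\int_E w_E \cdot [(\tilde p - p)\, \nu_E] \, ds \\
  &= -\int_{\omega_E} (\tilde p - p) : D w_E \, dx
     - \int_{\omega_E} w_E \cdot \operatorname{div}_{\mathcal T} (\tilde p - p) \, dx \\
  &= R(w_E) - \int_{\omega_E} w_E \cdot (f + \operatorname{div}_{\mathcal T} \tilde p) \, dx \\
  &= R(w_E) - \int_{\omega_E} w_E \cdot (f - \tilde f) \, dx
\end{aligned}
\]

(with $\int_{T_\pm} w_E \cdot \tilde r_{T_\pm} \, dx = 0$ in the last step).
A Friedrichs inequality
$\|w_E\|_{L_2(\omega_E)} \le C\, h_E\, |w_E|_{H^1(\omega_E)}$ and (81) then
conclude the proof of (83).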


5.3 Implicit error estimators 


Implicit error estimators are based on a local norm of a 
solution of a localized problem of a similar type with 
the residual terms on the right-hand side. This section
introduces two different versions based on a partition of
unity and based on an equilibration technique.


5.3.1 Localization by partition of unity 


Given a nodal basis $(\varphi_z : z \in \mathcal{N})$ of a first-order
finite element method with respect to $\mathcal{T}$, there holds the
partition of unity property

\[
\sum_{z \in \mathcal{N}} \varphi_z = 1 \quad \text{in } \Omega
\]

Given the residual $R = F - a(\tilde u, \cdot) \in V^*$, we observe that
$R_z(v) := R(\varphi_z v)$ defines a bounded linear functional $R_z$ on
a localized space called $V_z$ and specified below.

The bilinear form $a$ is an integral over $\Omega$ on some
integrand. The latter may be weighted with $\varphi_z$ to define some
(localized) bilinear form $a_z : V_z \times V_z \to \mathbb{R}$.
Supposing that $a_z$ is $V_z$-elliptic one defines the norm
$\|\cdot\|_{a_z}$ on $V_z$ and considers

\[
\eta_z := \sup_{v \in V_z \setminus \{0\}} \frac{R_z(v)}{\|v\|_{a_z}} \tag{87}
\]

The dual norm is as in (70)-(71) and hence equivalent to
the computation of the norm $\|e_z\|_{a_z}$ of a local solution

\[
e_z \in V_z \quad\text{with}\quad a_z(e_z, \cdot) = R_z \in V_z^* \tag{88}
\]

(The proof of $\|e_z\|_{a_z} = \eta_z$ follows the arguments of
Section 5.1.2 and hence is omitted.)


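Example 10. Adopt notation from Example 1 and let
$(\varphi_z : z \in \mathcal{N})$ be the first-order finite element nodal
basis functions. Then define $R_z$ and $a_z$ by

\[
R_z(v) := \int_\Omega \varphi_z\, f\, v \, dx
  - \int_\Omega \nabla \tilde u \cdot \nabla(\varphi_z v) \, dx
  \qquad \forall\, v \in V
\]
\[
a_z(v_1, v_2) := \int_\Omega \varphi_z\, \nabla v_1 \cdot \nabla v_2 \, dx
  \qquad \forall\, v_1, v_2 \in V
\]

Let $V_z$ denote the completion of $V$ under the norm given
by the scalar product $a_z$ when $\varphi_z \not\equiv 0$ on $\Gamma$ or
otherwise its quotient space with $\mathbb{R}$, i.e.

\[
V_z = \begin{cases}
\{ v \in H^1_{\rm loc}(\omega_z) : a_z(v, v) < \infty,\
   \varphi_z v = 0 \text{ on } \Gamma \cap \partial\omega_z \}
   & \text{if } \varphi_z \not\equiv 0 \text{ on } \Gamma \\
\{ v \in H^1_{\rm loc}(\omega_z) : a_z(v, v) < \infty,\
   \int_{\omega_z} \varphi_z\, v \, dx = 0 \}
   & \text{if } \varphi_z \equiv 0 \text{ on } \Gamma
\end{cases}
\]

Notice that $R_z(1) = 0$ for a free node $z$ so that (88) has
a unique solution and hence $\eta_z < \infty$.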


In practical applications, the solution $e_z$ of (88) has to
be approximated by some finite element approximation $\tilde e_z$
on a discrete space $\tilde V_z$ based on a finer mesh or of higher
order. (Arguing as in the stability estimate (71) leads to an
error estimate for an approximation $\|\tilde e_z\|_{a_z}$ of $\eta_z$.)
Suppose that $\eta_z$ is known exactly (or computed with high
and controlled accuracy) and that the bilinear form $a$ is
localized through the partition of unity such that (e.g. in
Example 10)

\[
a(u, v) = \sum_{z \in \mathcal{N}} a_z(u, v) \qquad \forall\, u, v \in V \tag{89}
\]

Then the implicit error estimator $\eta_L$ is reliable with
$C_{\rm rel} = 1$ and h.o.t.$_{\rm rel} = 0$,

\[
\|e\|_a \le \eta_L := \Big( \sum_{z \in \mathcal{N}} \eta_z^2 \Big)^{1/2} \tag{90}
\]

The proof of (90) follows from the definition of $R_z$, $\eta_z$, and
$e_z$ and Cauchy inequalities:

\[
\|e\|_a^2 = R(e) = \sum_{z \in \mathcal{N}} R_z(e)
  \le \sum_{z \in \mathcal{N}} \eta_z\, \|e\|_{a_z}
  \le \Big( \sum_{z \in \mathcal{N}} \eta_z^2 \Big)^{1/2}
      \Big( \sum_{z \in \mathcal{N}} \|e\|^2_{a_z} \Big)^{1/2}
  = \eta_L\, \|e\|_a
\]

Notice that $\|\tilde e_z\|_{a_z} =: \tilde\eta_z \le \eta_z$ for any
approximated local solution

\[
\tilde e_z \in \tilde V_z \quad\text{with}\quad
a_z(\tilde e_z, \cdot) = R_z \in \tilde V_z^* \tag{91}
\]

and all of them are efficient estimators. The proof of
efficiency is based on a weighted Poincaré or Friedrichs
inequality which reads

\[
\|\varphi_z v\|_a \le C\, \|v\|_{a_z} \qquad \forall\, v \in V_z \tag{92}
\]

In fact, in Examples 1, 2, and 3, one obtains efficiency in a
more local form than indicated in

\[
\eta_z \le C\, \|e\|_a \quad\text{with}\quad \text{h.o.t.}_{\rm eff} = 0 \tag{93}
\]

(This follows immediately from (92):
$R_z(v) = R(\varphi_z v) = a(e, \varphi_z v) \le \|e\|_a\, \|\varphi_z v\|_a
\le C\, \|e\|_a\, \|v\|_{a_z}$.)

In the situation of Example 10, the estimator $\eta_L$ dates back
to Babuška and Miller (1987); the use of weights was
established in Carstensen and Funken (1999/00). A reliable
computable estimator $\tilde\eta_L$ is introduced in Morin, Nochetto
and Siebert (2003a) based on a proper finite-dimensional
space $\tilde V_z$ of some piecewise quadratic polynomials on
$\omega_z$.


5.3.2 Equilibration estimators 


The nonoverlapping domain decomposition schemes employ
artificial unknowns $g_T \in L_2(\partial T)^m$ for each
$T \in \mathcal{T}$ at the interfaces, which allow a representation of
the form

\[
R(v) = \sum_{T \in \mathcal{T}} R_T(v) \quad\text{where}\quad
R_T(v) := \int_T f \cdot v \, dx - \int_T \tilde p : Dv \, dx
  + \int_{\partial T} g_T \cdot v \, ds \tag{94}
\]

Adopting the notation from Section 5.2.1, the new quantities
$g_T$ satisfy

\[
g_{T_+} + g_{T_-} = 0 \quad \text{along }
E = \partial T_+ \cap \partial T_- \in \mathcal{E}_\Omega
\]

(where $T_+$ and $T_-$ denote neighboring element domains)
to guarantee (94). (There are non-displayed modifications
on any Neumann boundary edge $E \subset \partial\Omega$.) Moreover, the
bilinear form $a$ is expanded in an elementwise form

\[
a(u, v) = \sum_{T \in \mathcal{T}} a_T(u, v) \qquad \forall\, u, v \in V \tag{95}
\]

Under the equilibration condition $R_T(c) = 0$ for all kernel
functions $c$ (namely, the constant functions for the Laplace
model problem), the resulting local problem reads

\[
e_T \in V_T \quad\text{with}\quad a_T(e_T, \cdot) = R_T \in V_T^* \tag{96}
\]

and is equivalent to the computation of

\[
\eta_T := \sup_{v \in V_T \setminus \{0\}} \frac{R_T(v)}{\|v\|_{a_T}}
  = \|e_T\|_{a_T} \tag{97}
\]

The sum of all local contributions defines the reliable
equilibration error estimator $\eta_{EQ}$,

\[
\|e\|_a \le \eta_{EQ} := \Big( \sum_{T \in \mathcal{T}} \eta_T^2 \Big)^{1/2} \tag{98}
\]

(The immediate proof of (98) is analogous to that of (90)
and hence is omitted.)


Example 11. In the situation of Example 10 there holds
$\eta_T < \infty$ if and only if either $\Gamma \cap \partial T$ has
positive surface measure (with
$V_T = \{v \in H^1(T) : v = 0 \text{ on } \Gamma \cap \partial T\}$) or
otherwise $R_T(1) = 0$ (with
$V_T = \{v \in H^1(T) : \int_T v \, dx = 0\}$). Ladeveze and Leguillon
(1983) suggested a certain choice of the interface corrections to
guarantee this and even higher-order equilibrations are established.
Details on the implementation are given in Ainsworth and Oden
(2000); a detailed error analysis with higher-order equilibrations
and the perturbation by a finite element simulation
of the local problems with corrections can be found in
Ainsworth and Oden (2000) and Babuška and Strouboulis
(2001).


The error estimator $\eta = \eta_{EQ}$ is efficient in the sense of
(69) with higher-order terms h.o.t.$(T)$ on $T$ that depend on
the given data provided

\[
h_E^{1/2}\, \|g_T - \tilde p \nu\|_{L_2(E)} \le C\, \|e\|_{a_T} + \text{h.o.t.}(T)
  \qquad \text{for all } E \in \mathcal{E}(T) \tag{99}
\]

(Recall that $\mathcal{E}(T)$ denotes the set of edges or faces of $T$.)
This stability property depends on the design of $g_T$; a
positive example is given in Theorem 6.2 of Ainsworth
and Oden (2000) for Example 1. Given Inequality (99),
the efficiency of $\eta_T$ follows with standard arguments, for
example, an integration by parts, a trace and Poincaré
or Friedrichs inequality
$h_T^{-1} \|v\|_{L_2(T)} + h_T^{-1/2} \|v\|_{L_2(\partial T)} \le C\, \|v\|_{a_T}$
for $v \in V_T$:

\[
R_T(v) = \int_T r_T \cdot v \, dx + \int_{\partial T} (g_T - \tilde p \nu) \cdot v \, ds
  \le C \big( h_T\, \|r_T\|_{L_2(T)}
  + h_T^{1/2}\, \|g_T - \tilde p \nu\|_{L_2(\partial T)} \big)\, \|v\|_{a_T}
\]

followed by (79) and (99).


5.4 Multilevel error estimators 


While the preceding estimators evaluate or estimate the
residual of one finite element solution $u_H$, multilevel
estimators concern at least two meshes $\mathcal{T}_H$ and
$\mathcal{T}_h$ with associated discrete spaces
$V_H \subset V_h \subset V$ and two discrete
solutions $u_H = \tilde u$ and $u_h$. The interpretation is that $p_h$ is
computed on a finer mesh (e.g. $\mathcal{T}_h$ is a refinement of
$\mathcal{T}_H$) or that $p_h$ is computed with higher polynomial order
than $p_H = \tilde p$.


5.4.1 Error-reduction property and multilevel error 
estimator 


Let $V_H \subset V_h \subset V$ denote two nested finite element spaces
in $V$ with coarse and fine finite element solution $u_H = \tilde u \in V_H = \tilde V$
and $u_h \in V_h$ of the discrete problem (19),
respectively, and with the exact solution $u$. Let $p_H = \tilde p$,
$p_h$, and $p$ denote the respective fluxes and let $\|\cdot\|$
be a norm associated to the energy norm, for example,
a norm with $\|p - \tilde p\| = \|u - \tilde u\|_a$ and $\|p - p_h\| = \|u - u_h\|_a$.
Then, the multilevel error estimator

$$\eta_{ML} := \|p_h - p_H\| = \|u_h - u_H\|_a \tag{100}$$


simply is the norm of the difference of the two discrete
solutions. The interpretation is that the error $\|p - p_h\|$ of
the finer discrete solution is systematically smaller than the
error $\|e\|_a = \|p - p_H\|$ of the coarser discrete solution in
the sense of an error-reduction property: For some constant
$\varrho < 1$, there holds

$$\|p - p_h\| \le \varrho\, \|p - p_H\| \tag{101}$$

Notice the bound $\varrho \le 1$ for Galerkin errors in the energy
norm (because of the best-approximation property). The
point is that $\varrho < 1$ in (101) is bounded away from one.
Then, the error-reduction property (101) immediately
implies reliability and efficiency of $\eta_{ML}$:

$$(1 - \varrho)\, \|p - p_H\| \le \eta_{ML} = \|p_h - p_H\| \le (1 + \varrho)\, \|p - p_H\| \tag{102}$$
(The immediate proof of (102) utilizes (101) and a simple 
triangle inequality.) 
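
For completeness, the two triangle inequalities behind (102) can be spelled out as follows. This is only a sketch that uses nothing beyond (100) and (101), written with $\varrho$ for the error-reduction constant. The lower bound (reliability) reads

$$\|p - p_H\| \le \|p - p_h\| + \|p_h - p_H\| \le \varrho\, \|p - p_H\| + \eta_{ML} \quad\Longrightarrow\quad (1 - \varrho)\, \|p - p_H\| \le \eta_{ML},$$

and the upper bound (efficiency) reads

$$\eta_{ML} = \|p_h - p_H\| \le \|p_h - p\| + \|p - p_H\| \le (1 + \varrho)\, \|p - p_H\|.$$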

Four remarks on the error-reduction conclude this section:
Efficiency of $\eta_{ML}$ in the sense of (69) is robust
in $\varrho \to 1$, but reliability is not: The reliability constant
$C_{\mathrm{rel}} = (1 - \varrho)^{-1}$ in (68) tends to infinity as $\varrho$ approaches 1.

Higher-order terms are invisible in (102): h.o.t.$_{\mathrm{rel}}$ = 0 =
h.o.t.$_{\mathrm{eff}}$. This is unexpected when compared to all the other
error estimators and hence indicates that (101) should fail
to hold for heavily oscillating right-hand sides.

The error-reduction property (101) is often observed in
practice for fine meshes and can be monitored during the
calculation. For coarse meshes and in the preasymptotic
range, (101) may fail to hold.

The error-reduction property (101) is often called the saturation
assumption in the literature and frequently has the status
of an unproven hypothesis.


5.4.2 Counterexample for error-reduction 


The error-reduction property (101) may fail to hold even
if $f$ shows no oscillations: Figure 10 displays two
triangulations, $\mathcal{T}_H$ and its refinement $\mathcal{T}_h$, with one and five free
nodes, respectively. If the right-hand side is constant and if
the problem has homogeneous Dirichlet conditions for the
Poisson problem

$$1 + \Delta u = 0 \text{ in } \Omega := (0, 1)^2 \quad\text{and}\quad u = 0 \text{ on } \partial\Omega$$

then the corresponding $P_1$ finite element solutions coincide:
$u_H = u_h$. A direct proof is based on the nodal basis function
$\Phi_1$ of the free node in $V_H := FE_{\mathcal{T}_H}$ (the first-order finite
element space with respect to the coarse mesh $\mathcal{T}_H$) and the
nodal basis functions $\varphi_2, \ldots, \varphi_5 \in V_h := FE_{\mathcal{T}_h}$ of the new
free nodes in $\mathcal{T}_h$. Then,

$$\varphi_1 := \Phi_1 - (\varphi_2 + \cdots + \varphi_5) \in V_h = FE_{\mathcal{T}_h}$$

satisfies (since $\int_E \varphi_1\,ds = 0$ for all edges $E$ in $\mathcal{T}_H$ and
$\int_\Omega \varphi_1\,dx = 0$)

$$R(\varphi_1) = 0$$

Thus $u_H$ is the finite element solution in $W_h := \operatorname{span}\{\Phi_1, \varphi_1\} \subset V_h$.
Since, by symmetry, the finite element solution
$u_h$ in $V_h$ belongs to $W_h$, there holds $u_H = u_h$.

Figure 10. Two triangulations $\mathcal{T}_H$ and $\mathcal{T}_h$ with equal discrete $P_1$
finite element solutions $u_H = u_h$ for a Poisson problem with
right-hand side $f = 1$. The refinement $\mathcal{T}_h$ of $\mathcal{T}_H$ is generated by two
(newest-vertex) bisections per element (initially with the interior
node in $\mathcal{T}_H$ as newest vertex).
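
The coincidence of the two discrete solutions can also be checked numerically. The following is a minimal sketch, assuming the coarse mesh is the unit square cut along both diagonals and the fine mesh arises from two bisections per element as in the caption of Figure 10; the helper names (`assemble_p1`, `solve_dirichlet`, `build`, `mid`) are hypothetical and not taken from the text.

```python
import numpy as np

def assemble_p1(nodes, triangles):
    """Assemble the P1 stiffness matrix and the load vector for f = 1."""
    n = len(nodes)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for tri in triangles:
        idx = list(tri)
        B = np.hstack([np.ones((3, 1)), nodes[idx]])   # rows (1, x_i, y_i)
        area = 0.5 * abs(np.linalg.det(B))
        grads = np.linalg.inv(B)[1:, :].T              # row i = gradient of hat function i
        A[np.ix_(idx, idx)] += area * grads @ grads.T
        b[idx] += area / 3.0                           # integral of each hat function over T
    return A, b

def solve_dirichlet(nodes, triangles):
    """Solve 1 + Laplace(u) = 0 on (0,1)^2 with u = 0 on the boundary."""
    A, b = assemble_p1(nodes, triangles)
    x, y = nodes[:, 0], nodes[:, 1]
    free = np.where((x > 0) & (x < 1) & (y > 0) & (y < 1))[0]
    u = np.zeros(len(nodes))
    u[free] = np.linalg.solve(A[np.ix_(free, free)], b[free])
    return u

def build(tris_xy):
    """Turn triangles given by vertex coordinates into (nodes, index triples)."""
    index, tris = {}, []
    for tri in tris_xy:
        tris.append(tuple(index.setdefault(p, len(index)) for p in tri))
    return np.array(list(index)), tris

def mid(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
center = (0.5, 0.5)

# Coarse mesh: four triangles sharing the interior node (one free node).
coarse_xy = [(center, corners[i], corners[(i + 1) % 4]) for i in range(4)]

# Fine mesh: bisect the boundary edge first, then the two half-diagonals.
fine_xy = []
for c, a, d in coarse_xy:
    m, ma, md = mid(a, d), mid(a, c), mid(d, c)
    fine_xy += [(m, c, ma), (m, ma, a), (m, md, c), (m, d, md)]

nodes_H, tris_H = build(coarse_xy)
nodes_h, tris_h = build(fine_xy)
u_H = solve_dirichlet(nodes_H, tris_H)
u_h = solve_dirichlet(nodes_h, tris_h)

# u_H = alpha * Phi_1 with Phi_1(x, y) = 1 - 2 * max(|x - 1/2|, |y - 1/2|);
# evaluate it at the fine nodes and compare with u_h.
alpha = u_H.max()
dist = np.max(np.abs(nodes_h - 0.5), axis=1)
u_H_on_fine = alpha * (1.0 - 2.0 * dist)
print(np.allclose(u_h, u_H_on_fine))
```

In this setting the multilevel estimator (100) vanishes although the error does not, which is exactly the failure of (101) described above.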


5.4.3 Affirmative example for error-reduction 


Adopt notation from Section 5.2.4 with a coarse discrete
space $V_H = \tilde V$ and consider the fine space $V_h := V_H \oplus W_h$
for

$$W_h := \operatorname{span}\{r_T b_T : T \in \mathcal{T}\} \oplus \operatorname{span}\{r_E b_E : E \in \mathcal{E}_\Omega\} \subset V \tag{103}$$

Then there holds the error-reduction property up to higher-order
terms

$$\operatorname{osc}(f) := \Big( \sum_{T \in \mathcal{T}} h_T^2 \|f - f_T\|_{L_2(T)}^2 \Big)^{1/2}$$

namely

$$\|p - p_h\|^2 \le \varrho\, \|p - p_H\|^2 + \operatorname{osc}(f)^2 \tag{104}$$

The constant $\varrho$ in (104) is uniformly smaller than one,
independent of the mesh size, and depends on the shape
of the elements and the type of ansatz functions through
constants in (81)-(83).
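The proof of (104) is based on the test functions $w_T$ and
$w_E$ in (84) and (86) of Section 5.2.4 and

$$w_h := \sum_{T \in \mathcal{T}} w_T + \sum_{E \in \mathcal{E}_\Omega} w_E \in W_h \subset V_h$$

Utilizing (81)-(83) one can prove

$$\|w_h\|_a^2 \le C \Big( \sum_{T \in \mathcal{T}} h_T^2 \|r_T\|_{L_2(T)}^2 + \sum_{E \in \mathcal{E}_\Omega} h_E \|r_E\|_{L_2(E)}^2 \Big) \le C \Big( \sum_{T \in \mathcal{T}} R(w_T) + \sum_{E \in \mathcal{E}_\Omega} R(w_E) + \operatorname{osc}(f)^2 \Big) = C \big( R(w_h) + \operatorname{osc}(f)^2 \big)$$

Since $w_h$ belongs to $V_h$ and $u_h$ is the finite element
solution with respect to $V_h$, there holds

$$R(w_h) = a(u_h - u_H, w_h) \le \|u_h - u_H\|_a\, \|w_h\|_a$$

The combination of the preceding inequalities yields the
key inequality

$$\eta_{R,R}^2 := \sum_{T \in \mathcal{T}} h_T^2 \|r_T\|_{L_2(T)}^2 + \sum_{E \in \mathcal{E}_\Omega} h_E \|r_E\|_{L_2(E)}^2 \le C \big( \|u_h - u_H\|_a^2 + \operatorname{osc}(f)^2 \big)$$

This, the Galerkin orthogonality $\|u - u_H\|_a^2 = \|u - u_h\|_a^2 + \|u_h - u_H\|_a^2$,
and the reliability of $\eta_{R,R}$ show

$$\|u - u_H\|_a^2 \le C\, C_{\mathrm{rel}}^2 \big( \|u - u_H\|_a^2 - \|u - u_h\|_a^2 + \operatorname{osc}(f)^2 \big)$$

and so imply (104) with $\varrho = 1 - C^{-1} C_{\mathrm{rel}}^{-2} < 1$.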
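
The last step is elementary algebra; a short sketch, writing $C_{\mathrm{rel}}$ for the reliability constant of the explicit residual estimator and $C$ for the constant of the key inequality, may help. Reliability, the key inequality, and the Galerkin orthogonality give

$$\|u - u_H\|_a^2 \le C_{\mathrm{rel}}^2\, \eta_{R,R}^2 \le C\, C_{\mathrm{rel}}^2 \big( \|u_h - u_H\|_a^2 + \operatorname{osc}(f)^2 \big) = C\, C_{\mathrm{rel}}^2 \big( \|u - u_H\|_a^2 - \|u - u_h\|_a^2 + \operatorname{osc}(f)^2 \big).$$

Dividing by $C\, C_{\mathrm{rel}}^2$ and rearranging yields

$$\|u - u_h\|_a^2 \le \big( 1 - C^{-1} C_{\mathrm{rel}}^{-2} \big) \|u - u_H\|_a^2 + \operatorname{osc}(f)^2,$$

which is (104) once the flux norms are identified with the energy norms as in Section 5.4.1.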


Example 12. In the Poisson problem with $P_1$ finite
elements and discrete space $V_H$, let $W_h$ consist of the quadratic
and cubic bubble functions (80). Then (104) holds; cf. also
Example 14 below.


Example 13. Other affirmative examples for the Poisson
problem consist of the $P_1$ and $P_2$ finite element spaces $V_H$
and $V_h$ over one regular triangulation $\mathcal{T}$ or of the $P_1$ finite
element spaces with respect to a regular triangulation $\mathcal{T}_H$
and its red-refinement $\mathcal{T}_h$. The observation that the
element-bubble functions are in fact redundant is due to Dörfler and
Nochetto (2002).


5.4.4 Hierarchical error estimator 


Given a smooth right-hand side $f$ and based on the example
of the previous section, the multilevel estimator (100)
is reliable and efficient up to higher-order terms. The
costly calculation of $u_h$, however, exclusively allows for
an accurate error control of $u_H$ (and no reasonable error
control for $u_h$). Instead of (100), cheaper versions are
favored where $u_h$ is replaced by some quantity computed
by a localized problem. One reliable and efficient version
is the hierarchical error estimator

$$\eta_H := \Big( \sum_{T \in \mathcal{T}} \eta_T^2 + \sum_{E \in \mathcal{E}} \eta_E^2 \Big)^{1/2} \tag{105}$$

where, for each $T \in \mathcal{T}$ and $E \in \mathcal{E}$ and their test functions
(84) and (86),

$$\eta_T := \frac{R(w_T)}{\|w_T\|_a} \quad\text{and}\quad \eta_E := \frac{R(w_E)}{\|w_E\|_a} \tag{106}$$

(The proof of reliability and efficiency follows from (81)-(83)
by the arguments from Section 5.2.4.)


Example 14. In the Poisson problem with $P_1$ finite
elements and discrete space $V_H$, let $W_h$ consist of the quadratic
and cubic bubble functions (80). Then,

$$\eta_H := \Big( \sum_{T \in \mathcal{T}} \frac{R(b_T)^2}{\|b_T\|_a^2} + \sum_{E \in \mathcal{E}} \frac{R(b_E)^2}{\|b_E\|_a^2} \Big)^{1/2} \tag{107}$$

is a reliable and efficient hierarchical error estimator. With
the error-reduction property of $P_1$ and $P_2$ finite elements
due to Dörfler and Nochetto (2002),

$$\eta_E := \Big( \sum_{E \in \mathcal{E}} \frac{R(b_E)^2}{\|b_E\|_a^2} \Big)^{1/2} \tag{108}$$

is reliable and efficient as well. The same is true if, for
each edge $E \in \mathcal{E}$, $b_E$ defines a hat function of the midpoint
of $E$ with respect to a red-refinement $\mathcal{T}_h$ of $\mathcal{T}_H$ (that is,
each edge is halved and each triangle is divided into four
congruent subtriangles; cf. Figure 15, left).


5.5 Averaging error estimators 


Averaging techniques, also called (gradient) recovery
estimators, focus on one mesh and one known low-order flux
approximation $\tilde p$ and the difference to a piecewise
polynomial $\tilde q$ in a finite-dimensional subspace $\mathcal{Q} \subset L_2(\Omega; \mathbb{R}^{m \times n})$
of higher polynomial degrees and more restrictive
continuity conditions than those generally satisfied by $\tilde p$. Averaging
techniques are universal in the sense that there is no need
for any residual or partial differential equation in order to
apply them.
5.5.1 Definition of averaging error estimators


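The procedure consists of taking a piecewise smooth $\tilde p$ and
approximating it by some globally continuous piecewise
polynomial $A\tilde p$ of higher degree (in a space denoted by $\mathcal{Q}$). A simple
example, frequently named after Zienkiewicz and Zhu and
sometimes even called the ZZ estimator, reads as follows:
For each node $z \in \mathcal{N}$ and its patch $\omega_z$, let

$$(A\tilde p)(z) := \frac{\int_{\omega_z} \tilde p\,dx}{\int_{\omega_z} 1\,dx} \in \mathbb{R}^{m \times n} \tag{109}$$

be the integral mean of $\tilde p$ over $\omega_z$. Then, define $A\tilde p$ by
interpolation with (conforming, i.e. globally continuous) hat
functions $\varphi_z$, for $z \in \mathcal{N}$,

$$A\tilde p := \sum_{z \in \mathcal{N}} (A\tilde p)(z)\, \varphi_z \in \mathcal{Q}$$

Let $\mathcal{Q} = \operatorname{span}\{\varphi_z : z \in \mathcal{N}\}$ denote the (conforming)
first-order finite element space and let $\|\cdot\|$ be the norm for
the fluxes. Then the averaging estimator is defined by

$$\eta_A := \|\tilde p - A\tilde p\| \tag{110}$$

Notice that there is a minimal version

$$\eta_M := \min_{\tilde q \in \mathcal{Q}} \|\tilde p - \tilde q\| \le \eta_A \tag{111}$$

The efficiency of $\eta_M$ follows from a triangle inequality,
namely

$$\eta_M \le \|p - \tilde p\| + \|p - \tilde q\| \quad\text{for all } \tilde q \in \mathcal{Q} \tag{112}$$

and the fact that $\|p - \tilde p\| = O(h)$ while (in all the examples
of this chapter)

$$\min_{\tilde q \in \mathcal{Q}} \|p - \tilde q\| = \text{h.o.t.}(p) =: \text{h.o.t.}_{\mathrm{eff}}$$

This is of higher order for smooth $p$ and efficiency follows
for $\eta = \eta_M$ and $C_{\mathrm{eff}} = 1$.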

It turns out that $\eta_A$ and $\eta_M$ are very close and accurate
estimators in many numerical examples; cf. Section 5.6.4
below. This and the fact that the calculation of $\eta_A$ is an
easy postprocessing made $\eta_A$ extremely popular.
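
To illustrate how little postprocessing (109)-(110) requires, here is a minimal sketch for a scalar $P_1$ function on a triangulation in 2-D, assuming the flux is the piecewise constant gradient and the flux norm is the $L_2$ norm. The function name `zz_estimator` and the array layout are hypothetical choices for this example; boundary nodes receive no special treatment.

```python
import numpy as np

def zz_estimator(nodes, triangles, u):
    """Averaging (ZZ) estimator for the P1 function with nodal values u.

    Returns (eta_A, elementwise contributions, nodal values of A p)."""
    n = len(nodes)
    grad_sum = np.zeros((n, 2))      # sum over the patch of |T| * p_T
    area_sum = np.zeros(n)           # patch area |omega_z|
    elements = []
    for tri in triangles:
        idx = list(tri)
        B = np.hstack([np.ones((3, 1)), nodes[idx]])
        area = 0.5 * abs(np.linalg.det(B))
        p_T = np.linalg.inv(B)[1:, :] @ u[idx]   # constant gradient on T
        grad_sum[idx] += area * p_T
        area_sum[idx] += area
        elements.append((idx, area, p_T))
    Ap = grad_sum / area_sum[:, None]            # integral means as in (109)

    eta_sq = np.zeros(len(triangles))
    for k, (idx, area, p_T) in enumerate(elements):
        d = p_T - Ap[idx]                        # nodal values of p - A p on T
        # exact L2 norm of a linear function via the P1 element mass matrix
        eta_sq[k] = area / 12.0 * (np.sum(d * d) + np.sum(d.sum(axis=0) ** 2))
    return np.sqrt(eta_sq.sum()), np.sqrt(eta_sq), Ap

# Usage on a criss-cross mesh of the unit square (node 4 is the midpoint).
nodes = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]], dtype=float)
triangles = [(4, 0, 1), (4, 1, 2), (4, 2, 3), (4, 3, 0)]
eta_lin, _, Ap_lin = zz_estimator(nodes, triangles, 2 * nodes[:, 0] + 3 * nodes[:, 1])
eta_hat, local_hat, Ap_hat = zz_estimator(nodes, triangles, np.array([0, 0, 0, 0, 1.0]))
```

For a globally affine function the gradient is already continuous, so the estimator vanishes; for the hat function of the midpoint the elementwise gradients differ and the estimator is positive. The elementwise contributions can serve as refinement indicators.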

For proper treatment of Neumann boundary conditions,
we refer to Carstensen and Bartels (2002), for
applications in solid and fluid mechanics to Alberty and
Carstensen (2003) and Carstensen and Funken (2001a,b),
and for higher-order FEM to Bartels and Carstensen (2002).


Multigrid smoothing steps may be successfully employed 
as averaging procedures as proposed in Bank and Xu 
(2003). 


5.5.2 All averaging error estimators are reliable 


The first proof of reliability dates back to Rodriguez 
(1994a,b) and we refer to Carstensen (2004) for an 
overview. A simplified reliability proof for ny and hence 
for all averaging techniques (Carstensen, Bartels and Klose, 
2001) is outlined in the sequel. 

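First let $\Pi$ be the $L_2$ projection onto the first-order finite
element space $\tilde V$ and let $\tilde q$ be arbitrary in $\mathcal{Q}$, that is, each
of the $m \times n$ components of $\tilde q$ is a first-order finite element
function in $FE_{\mathcal{T}}$. The Galerkin orthogonality shows for the
error $e := u - \tilde u$ and $\tilde p := \nabla \tilde u$ in the situation of Example 1
that

$$\|e\|_a^2 = \int_\Omega (\nabla u - \tilde q) \cdot \nabla (e - \Pi e)\,dx + \int_\Omega (\tilde q - \tilde p) \cdot \nabla (e - \Pi e)\,dx$$

A Cauchy inequality in the latter term is combined with the
$H^1$-stability of $\Pi$, namely,

$$\|\nabla (e - \Pi e)\|_{L_2(\Omega)} \le C_{\mathrm{stab}} \|\nabla e\|_{L_2(\Omega)}$$

(For sufficient conditions for this we refer to Crouzeix and
Thomée (1987), Bramble, Pasciak and Steinbach (2002),
Carstensen (2002, 2003b), and Carstensen and Verfürth
(1999).) The $H^1$-stability in the second term and an
integration by parts in the first term on the right-hand side
show

$$\|e\|_a^2 \le \int_\Omega f \cdot (e - \Pi e)\,dx + \int_\Omega (e - \Pi e) \cdot \operatorname{div} \tilde q\,dx + C_{\mathrm{stab}} \|\nabla e\|_{L_2(\Omega)} \|\tilde p - \tilde q\|_{L_2(\Omega)}$$

Since $e - \Pi e$ is $L_2$-orthogonal onto $f_h := \Pi f \in \tilde V$,

$$\int_\Omega f \cdot (e - \Pi e)\,dx = \int_\Omega (f - f_h) \cdot (e - \Pi e)\,dx \le \|h_{\mathcal{T}}^{-1} (e - \Pi e)\|_{L_2(\Omega)} \|h_{\mathcal{T}} (f - f_h)\|_{L_2(\Omega)}$$

Notice that, despite possible boundary layers, $\|h_{\mathcal{T}} (f - \Pi f)\|_{L_2(\Omega)}$ = h.o.t.
is of higher order. The first-order approximation
property of the $L_2$ projection,

$$\|h_{\mathcal{T}}^{-1} (e - \Pi e)\|_{L_2(\Omega)} \le C_{\mathrm{approx}} \|\nabla e\|_{L_2(\Omega)}$$

follows from the $H^1$-stability (cf. e.g. Carstensen and
Verfürth (1999) for a proof). Similar arguments for the
remaining term $\operatorname{div} \tilde q$ and the $\mathcal{T}$-piecewise divergence
operator $\operatorname{div}_{\mathcal{T}}$ with $\operatorname{div} \tilde q = \operatorname{div}_{\mathcal{T}} \tilde q = \operatorname{div}_{\mathcal{T}} (\tilde q - \tilde p)$ (recall that $\tilde u$
is of first-order and hence $\Delta \tilde u = 0$ on each element) lead to

$$\int_\Omega (e - \Pi e) \cdot \operatorname{div} \tilde q\,dx \le C_{\mathrm{approx}} \|\nabla e\|_{L_2(\Omega)} \|h_{\mathcal{T}} \operatorname{div}_{\mathcal{T}} (\tilde q - \tilde p)\|_{L_2(\Omega)}$$

An inverse inequality $h_T \|\operatorname{div}_{\mathcal{T}} (\tilde q - \tilde p)\|_{L_2(T)} \le C_{\mathrm{inv}} \|\tilde q - \tilde p\|_{L_2(T)}$
(cf. Section 3.4) shows

$$\int_\Omega (e - \Pi e) \cdot \operatorname{div} \tilde q\,dx \le C_{\mathrm{approx}} C_{\mathrm{inv}} \|\nabla e\|_{L_2(\Omega)} \|\tilde q - \tilde p\|_{L_2(\Omega)}$$

The combination of all established estimates plus a division
by $\|e\|_a = \|\nabla e\|_{L_2(\Omega)}$ yields the announced reliability result

$$\|e\|_a \le (C_{\mathrm{stab}} + C_{\mathrm{approx}} C_{\mathrm{inv}}) \|\tilde p - \tilde q\|_{L_2(\Omega)} + \text{h.o.t.}$$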


In the second step, one designs a more local approximation 
operator J to substitute II as in Carstensen and Bartels 
(2002); the essential properties are the H!-stability, the 
first-order approximation property, and a local form of the 
orthogonality condition; we omit the details. 


5.5.3 Averaging error estimators and edge 
contributions 


There is a local equivalence of the estimators $\eta_A$ from a
local averaging process (109) and the edge estimator

$$\eta_E := \Big( \sum_{E \in \mathcal{E}_\Omega} h_E \|[\tilde p\,\nu_E]\|_{L_2(E)}^2 \Big)^{1/2}$$

The observation that, with some mesh-size-independent
constant $C$,

$$C^{-1} \eta_E \le \eta_A \le C\, \eta_E \tag{113}$$

dates back to Rodriguez (1994a) and can be found in
Verfürth (1996). The proof of (113) for piecewise linears
in $\tilde V$ is based on the equivalence of the two seminorms

$$\varrho_1(\tilde q) := \min_{r \in \mathbb{R}^{m \times n}} \|\tilde q - r\|_{L_2(\omega_z)}
\quad\text{and}\quad
\varrho_2(\tilde q) := \Big( \sum_{E \in \mathcal{E}_z} h_E \|[\tilde q\,\nu_E]\|_{L_2(E)}^2 \Big)^{1/2}$$

for piecewise constant vector-valued functions $\tilde q$ in $P_z$, the
set of possible fluxes $\tilde q$ restricted on the patch $\omega_z$ of a
node $z$, and with the set of edges $\mathcal{E}_z := \{E \in \mathcal{E} : z \in E\}$.
The main observation is that $\varrho_1$ and $\varrho_2$ vanish exactly for
constant functions in $\mathbb{R}^{m \times n}$ and hence they are norms on the
quotient space $P_z / \mathbb{R}^{m \times n}$. By the equivalence of norms on
any finite-dimensional space $P_z / \mathbb{R}^{m \times n}$, there holds

$$C^{-1} \varrho_1(\tilde q) \le \varrho_2(\tilde q) \le C\, \varrho_1(\tilde q) \quad\text{for all } \tilde q \in P_z$$

This is a local version of (113), which eventually implies
(113) by localization and composition; we omit the details.
For triangles and tetrahedra and piecewise linear finite
element functions, it is proved in Carstensen (2004) that

$$\eta_M \le \eta_A \le C_d\, \eta_M \tag{114}$$

with universal constants $C_2 = \sqrt{10}$ and $C_3 = \sqrt{15}$ for 2-D
and 3-D, respectively. This equivalence holds for a larger
class of elements and (first-order) averaging operators and
then proves efficiency for $\eta_A$ whereas efficiency of $\eta_M$
follows from a triangle inequality in (112).


5.6 Comparison of error bounds in benchmark 
example 


This section is devoted to a numerical comparison of energy
errors and their a posteriori error estimators for an elliptic
model problem.


5.6.1 Benchmark example 


The numerical comparisons are computed for the Poisson
problem

$$1 + \Delta u = 0 \text{ in } \Omega \quad\text{and}\quad u = 0 \text{ on } \partial\Omega \tag{115}$$

on the L-shaped domain $\Omega = (-1, +1)^2 \setminus ([0, 1] \times [-1, 0])$
and its boundary $\partial\Omega$. The first mesh $\mathcal{T}_1$ consists of 17
free nodes and 48 elements and is obtained by, first, a
decomposition of $\Omega$ in 12 congruent squares of size 1/2
and, second, a decomposition of each of the boxes along
its two diagonals into 4 congruent triangles. The subsequent
meshes are successively red-refined (i.e. each triangle is
partitioned into 4 congruent subtriangles of Figure 15, left).
This defines the (conforming) $P_1$ finite element spaces
$V_1 \subset V_2 \subset V_3 \subset \cdots \subset V := H^1_0(\Omega)$. The error $e := u - u_j$ of
the finite element solution $u_j = \tilde u$ in $V_j = \tilde V$ of dimension
$N = \dim(V_j)$ is measured in the energy norm (the Sobolev
seminorm in $H^1(\Omega)$)

$$|e|_{1,2} := |e|_{H^1(\Omega)} := \Big( \int_\Omega |De|^2\,dx \Big)^{1/2} = a(u - \tilde u, u + \tilde u)^{1/2} = \big( |u|_{1,2}^2 - |\tilde u|_{1,2}^2 \big)^{1/2}$$

(by the Galerkin orthogonality) computed with the
approximation $|u|_{1,2}^2 \approx 0.21407315683398$.

Figure 11. Experimental results for the benchmark problem (115)
with meshes $\mathcal{T}_1, \ldots, \mathcal{T}_6$. The relative error $|e|_{1,2}/|u|_{1,2}$ and
various estimators $\eta/|u|_{1,2}$ are plotted as functions of the number
of degrees of freedom $N$. Both axes are in a logarithmic
scaling such that an algebraic curve $N^{-\alpha}$ is visible as a straight
line with slope $-\alpha$. A triangle with slope $-0.33$ is displayed
for comparison. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm

Figure 11 displays the computed values of $|e|_{1,2}$ for a
sequence of uniformly refined meshes $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_6$ as a
function of the number of degrees of freedom $N$ = 17, 81,
353, 1473, 6017, 24321, 97793. It is this curve that will be
estimated by computable upper and lower bounds explained
in the subsequent sections where we use the fact that each
element is a right isosceles triangle.

Notice that the two axes in Figure 11 scale logarithmically
such that any algebraic curve of growth $\alpha$ is mapped
into a straight line of slope $-\alpha$. The experimental
convergence rate is 2/3 with respect to the mesh size $h \propto N^{-1/2}$
(that is, slope $-1/3$ with respect to $N$), in agreement with the
(generic) singularity of the domain and resulting theoretical
predictions. More details can be found in Carstensen, Bartels and Klose
(2001).
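
The listed numbers of degrees of freedom can be reproduced by simple counting, which also makes the relation between the slope in $N$ and the rate in $h$ explicit. The sketch below assumes that the first mesh has 16 boundary edges (the boundary of the L-shaped domain has length 8 and the squares have size 1/2); this count is derived here and not stated in the text, and the function name is hypothetical.

```python
def free_nodes_under_red_refinement(n_free=17, n_elements=48,
                                    n_boundary_edges=16, levels=6):
    """Count free (interior) nodes of successive red-refinements.

    Red-refinement creates one new node per edge; only midpoints of
    interior edges are free nodes under Dirichlet boundary conditions."""
    counts = [n_free]
    for _ in range(levels):
        # every triangle has 3 edges; interior edges are shared by two triangles
        interior_edges = (3 * n_elements - n_boundary_edges) // 2
        n_free += interior_edges
        n_elements *= 4          # each triangle is split into 4
        n_boundary_edges *= 2    # each boundary edge is halved
        counts.append(n_free)
    return counts

dof_counts = free_nodes_under_red_refinement()
print(dof_counts)

# In 2-D, N grows like h**(-2); a rate h**(2/3) therefore appears as N**(-1/3).
rate_in_h = 2.0 / 3.0
slope_in_N = -rate_in_h / 2.0
print(round(slope_in_N, 2))
```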


5.6.2 Explicit error estimators 


For the benchmark problem of Section 5.6.1, the error
estimator (75) can be written in the form

$$\eta_{R,R} := \Big( \sum_{T \in \mathcal{T}} h_T^2 \|f\|_{L_2(T)}^2 + \sum_{E \in \mathcal{E}_\Omega} h_E \int_E \Big[ \frac{\partial \tilde u}{\partial \nu_E} \Big]^2 ds \Big)^{1/2} \tag{116}$$

and is reliable with $C_{\mathrm{rel}} = 1$ and h.o.t.$_{\mathrm{rel}}$ = 0 (Carstensen
and Funken, 2001a). Figure 11 displays $\eta_{R,R}$ as a function
of the number of unknowns $N$ and illustrates $|e|_{1,2} \le \eta_{R,R}$.

Guaranteed lower error bounds, i.e. with the efficiency
constant $C_{\mathrm{eff}}$, are less established and the higher-order terms
h.o.t.$_{\mathrm{eff}}$ usually involve $\|f - f_T\|_{L_2(T)}$ for the elementwise
integral mean $f_T$ of the right-hand side. Here, $f = 1$ and so
h.o.t.$_{\mathrm{eff}}$ = 0. Following a derivation in Carstensen, Bartels
and Klose (2001), Figure 11 also displays an efficient
variant $\eta_{R,E}$ as a lower error bound: $\eta_{R,E} \le |e|_{1,2}$.

The guaranteed lower and upper bounds with explicit
error estimators leave a very large region for the true error.
Our interpretation is that various geometries (the shape of
the patches) lead to different constants and $C_{\mathrm{rel}} = 1$ reflects
the worst possible situation for every patch in the current
mesh.

A more efficient reliable explicit error estimator $\eta_{R,C}$
from Carstensen and Funken (2001a) displayed in Figure 11
requires the computation of local (patchwise) analytical
eigenvalues and hence is very expensive. However, the
explicit estimators $\eta_{R,C}$ and $\eta_{R,E}$ still overestimate and
underestimate the true error by a huge factor (up to 10
and even more) in the simple situation of the benchmark.
One conclusion is that the involved constants estimate a
worst-case scenario with respect to every right-hand side
or every exact solution.

This experimental evidence supports the design of more
elaborate estimators: The stopping criterion (72) with the
reliable explicit estimators may appear very cheap and easy.
But the decision (72) may have too costly consequences.


5.6.3 Implicit estimators 


For comparison, the two implicit estimators $\eta_L$ and $\eta_{EQ}$
are displayed in Figure 11 as functions of $N$. It is stressed
that both estimators are efficient and reliable (Carstensen
and Funken, 1999/00)

$$|e|_{1,2} \le \eta_L \le 2.37\, |e|_{1,2}$$

The practical performance of $\eta_L$ and $\eta_{EQ}$ in Figure 11 is
comparable and in fact is much sharper than that of $\eta_{R,R}$
and $\eta_{R,E}$.


5.6.4 Averaging estimator 


The averaging estimators $\eta_A$ and $\eta_M$ are displayed as well
in Figure 11 as a function of $N$. Here, $\eta_M$ is efficient up to
higher-order terms (since the exact solution $u \in H^{5/3-\varepsilon}(\Omega)$
is singular, this is not really guaranteed) while its reliability
is open, i.e. the corresponding constants have not been
computed. Nevertheless, the behavior of $\eta_A$ and $\eta_M$ is
exclusively seen here from an experimental point of view.
The striking numerical result is an amazingly high accuracy of
$\eta_M \approx \eta_A$ as an empirical guess of $|e|_{1,2}$. If we took $C_{\mathrm{rel}}$ into
account, this effect would be destroyed: The high accuracy
is an empirical observation in this example (and possibly
many others) but does not yield an accurate guaranteed
error bound.


5.6.5 Adapted meshes 


The benchmark in Figure 11 is based on a sequence of 
uniform meshes and hence results in an experimental con- 
vergence rate 2/3 according to the corner singularity of 
this example. Adaptive mesh-refining algorithms, described 
below in more detail, are empirically studied also in 
Carstensen, Bartels and Klose (2001). The observations can 
be summarized as follows: The quality of the estimators and 
their relative accuracy is similar to what is displayed in 
Figure 11 even though the convergence rates are optimally 
improved to one. 


5.7 Goal-oriented error estimators 


This section provides a brief introduction to goal-oriented 
error control. 


5.7.1 Goal functionals 


Given the Sobolev space $V = H^1_0(\Omega)$ with a
finite-dimensional subspace $\tilde V \subset V$, a bounded and $V$-elliptic bilinear
form $a : V \times V \to \mathbb{R}$, and a bounded linear form $F : V \to \mathbb{R}$,
there exist an exact solution $u \in V$ and a discrete solution
$\tilde u \in \tilde V$ of

$$a(u, v) = F(v) \text{ for all } v \in V \quad\text{and}\quad a(\tilde u, \tilde v) = F(\tilde v) \text{ for all } \tilde v \in \tilde V \tag{117}$$

The previous sections concern estimations of the error
$e := u - \tilde u$ in the energy norm, equivalent to the Sobolev
norm in $V$. Other norms are certainly of some interest as
well as the error with respect to a certain goal functional.
The latter is some given bounded and linear functional
$J : V \to \mathbb{R}$ with respect to which one aims to monitor the
error, that is, one wants to find computable lower and upper
bounds for the (unknown) quantity

$$|J(u) - J(\tilde u)| = |J(e)|$$

Typical examples of goal functionals are described by $L_2$
functions, for example,

$$J(v) = \int_\Omega \varrho\, v\,dx \quad\text{for all } v \in V$$

for a given $\varrho \in L_2(\Omega)$ or as contour integrals.

In many cases, the main interest is on a point value and
then $J(v)$ is given by a mollification $\varrho$ of a singular measure
in order to guarantee the boundedness of $J : V \to \mathbb{R}$.


5.7.2 Duality technique 


To bound or approximate $J(e)$ one considers the dual
problem

$$a(v, z) = J(v) \quad\text{for all } v \in V \tag{118}$$

with exact solution $z \in V$ (guaranteed by the Lax-Milgram
lemma) and the discrete solution $\tilde z \in \tilde V$ of

$$a(\tilde v, \tilde z) = J(\tilde v) \quad\text{for all } \tilde v \in \tilde V$$

Set $f := z - \tilde z$. On the basis of the Galerkin orthogonality
$a(e, \tilde z) = 0$ one infers

$$J(e) = a(e, z) = a(e, z - \tilde z) = a(e, f) \tag{119}$$

As a result of (119) and the boundedness of $a$ one obtains
the a posteriori estimate

$$|J(e)| \le \|a\|\, \|e\|_V\, \|f\|_V \le \|a\|\, \eta_u\, \eta_z$$

Indeed, utilizing the primal and dual residual $R_u$ and $R_z$ in
$V^*$, defined by

$$R_u := F - a(\tilde u, \cdot) \quad\text{and}\quad R_z := J - a(\cdot, \tilde z)$$

computable upper error bounds $\|e\|_V \le \eta_u$ and $\|f\|_V \le \eta_z$
can be found by the arguments of the energy error
estimators of the previous sections. This yields a computable
upper error bound $\|a\|\, \eta_u\, \eta_z$ for $|J(e)|$ which is global,
that is, the interaction of $e$ and $f$ is not reflected. One
might therefore speculate that the upper bound is often too
coarse and inappropriate for goal-oriented adaptive mesh
refinement.


5.7.3 Upper and lower bounds of J (e) 


Throughout the rest of this section, let the bilinear form $a$ be
symmetric and positive definite; hence a scalar product with
induced norm $\|\cdot\|_a$. Then, the parallelogram rule shows

$$2 J(e) = 2 a(e, f) = \|e + f\|_a^2 - \|e\|_a^2 - \|f\|_a^2$$

This right-hand side can be written in terms of residuals,
in the spirit of (70), namely, $\|e\|_a = \|\mathrm{Res}_u\|_{V^*}$,
$\|f\|_a = \|\mathrm{Res}_z\|_{V^*}$, and

$$\|e + f\|_a = \|\mathrm{Res}_{u+z}\|_{V^*} \quad\text{for}\quad \mathrm{Res}_{u+z} := F + J - a(\tilde u + \tilde z, \cdot) = \mathrm{Res}_u + \mathrm{Res}_z \in V^*$$

Therefore, the estimation of $J(e)$ is reduced to the
computation of lower and upper error bounds for the three residuals
$\mathrm{Res}_u$, $\mathrm{Res}_z$, and $\mathrm{Res}_{u+z}$ with respect to the energy norm.
This illustrates that the energy error estimation techniques
of the previous sections may be employed for goal-oriented
error control.
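
The identities (119) and the parallelogram rule are purely algebraic and can be checked in a finite-dimensional model. The sketch below is only an illustration under assumptions: $V$ is replaced by $\mathbb{R}^n$ with $a(v, w) = v^\top A w$ for a random symmetric positive definite matrix $A$, the discrete space is the span of the columns of a random matrix `P`, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # SPD matrix: a(v, w) = v^T A w
P = rng.standard_normal((n, m))    # columns span the discrete subspace
F = rng.standard_normal(n)         # primal right-hand side F(v) = F^T v
J = rng.standard_normal(n)         # goal functional J(v) = J^T v

def exact_and_galerkin(rhs):
    """Exact solution in R^n and Galerkin solution in span(P)."""
    exact = np.linalg.solve(A, rhs)
    discrete = P @ np.linalg.solve(P.T @ A @ P, P.T @ rhs)
    return exact, discrete

u, u_tilde = exact_and_galerkin(F)
z, z_tilde = exact_and_galerkin(J)   # dual problem; A is symmetric
e, f = u - u_tilde, z - z_tilde

def energy_sq(v):
    return v @ A @ v

goal_error = J @ e                                   # J(e)
dual_weighted = e @ A @ f                            # a(e, z - z~), cf. (119)
residual_at_z = F @ z - u_tilde @ A @ z              # Res_u(z) = F(z) - a(u~, z)
parallelogram = 0.5 * (energy_sq(e + f) - energy_sq(e) - energy_sq(f))

print(np.allclose([dual_weighted, residual_at_z, parallelogram], goal_error))
```

All four quantities agree up to rounding, which mirrors the statement that $J(e)$ is determined by the three energy norms of $e$, $f$, and $e + f$.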

For more details and examples of a refined estimation see 
Ainsworth and Oden (2000) and Babuška and Strouboulis 
(2001). 


5.7.4 Computing an approximation to J (e) 


An immediate consequence of (119) is

$$J(e) = \mathrm{Res}(z)$$

and hence $J(e)$ is easily computed once the dual solution
$z$ of (118) is known or at least approximated to
sufficient accuracy. An upper error bound for $|J(e)| = |\mathrm{Res}(z)|$
is obtained following the methodology of Becker and
Rannacher (1996, 2001) and Bangerth and Rannacher (2003).
To outline this methodology, consider the residual
representation formula (73) following the notation of
Section 5.2.1. Suppose that $z \in H^2(\Omega)$ (e.g. for an $H^2$ regular
dual problem) and let $Iz$ denote the nodal interpolant in the
lowest-order finite element space $\tilde V$. With some
interpolation constant $C_I > 0$, there holds, for any element $T \in \mathcal{T}$:

$$h_T^{-2} \|z - Iz\|_{L_2(T)} + h_T^{-3/2} \|z - Iz\|_{L_2(\partial T)} \le C_I |z|_{H^2(T)}$$

The combination of this with (73) shows

$$J(e) = \mathrm{Res}(z - Iz) = \sum_{T \in \mathcal{T}} \int_T r_T \cdot (z - Iz)\,dx - \sum_{E \in \mathcal{E}} \int_E r_E \cdot (z - Iz)\,ds$$

$$\le \sum_{T \in \mathcal{T}} \big( \|r_T\|_{L_2(T)} \|z - Iz\|_{L_2(T)} + \|r_E\|_{L_2(\partial T)} \|z - Iz\|_{L_2(\partial T)} \big)$$

$$\le \sum_{T \in \mathcal{T}} C_I \big( h_T^2 \|r_T\|_{L_2(T)} + h_T^{3/2} \|r_E\|_{L_2(\partial T)} \big) |z|_{H^2(T)}$$

The influence of the goal functional in this upper bound
is through the unknown $H^2$ seminorm $|z|_{H^2(T)}$, which is
to be replaced by some discrete analog based on a
computed approximation $z_h$. The justification of some substitute
$\|D_h^2 z_h\|_{L_2(T)}$ (postprocessed with some averaging technique)
for $|z|_{H^2(T)}$ is through striking numerical evidence; we refer
to Becker and Rannacher (2001) and Bangerth and
Rannacher (2003) for details and numerical experiments.


6 LOCAL MESH REFINEMENT 


This section is devoted to the mesh-design task in the finite
element method based on a priori and a posteriori
information. Examples of the former type are graded meshes
or geometric meshes with an a priori choice of refinement
toward corner singularities briefly mentioned in Section 6.1.
Examples of the latter type are adaptive algorithms for
automatic mesh refinement (or mesh coarsening) strategies with
a successive call of the steps

SOLVE ⇒ ESTIMATE ⇒ MARK ⇒ REFINE/COARSEN

Given the current triangulation, one has to compute the
finite element solution in step SOLVE; cf. Section 7.8 for
a MATLAB realization of that. The accuracy of this finite
element approximation is checked in the step ESTIMATE.
On the basis of the refinement indicators of Section 6.2
the step MARK identifies the elements, edges or patches in
the current mesh in need of refinement (or coarsening). The
new data structure is generated in step REFINE/COARSEN
where a partition is given and a closure algorithm
computes a triangulation described in Section 6.3. The
convergence and optimality of the adaptive algorithm is discussed
in Section 6.4. Brief remarks on coarsening strategies in
Section 6.5 conclude the section.


6.1 A priori mesh design 


The singularities of the exact solution of the Laplace equa- 
tion on domains with corners (cf. Figure 1) are reasonably 
well understood and motivate the (possibly anisotropic) 
mesh refinement toward vertices or edges. This section
aims at a short introduction for two-dimensional $P_1$ finite
elements; Chapter 3, this Volume, will report on
three-dimensional examples.

Given a polygonal domain with a coarse triangulation 
into triangles (which specify the geometry), macro ele- 
ments can be used to fill the domain with graded meshes. 
Figure 12(a) displays a macro element described in the 
sequel while Figure 12(b) illustrates the resulting fine mesh 
for an L-shaped domain. 

The description is restricted to the geometry on the
reference element $T_{\mathrm{ref}}$ with vertices (0, 0), (1, 0), and (0, 1)
of Figure 12(a). The general situation is then obtained by
an affine transformation illustrated in Figure 12(b). The
macro element $T_{\mathrm{ref}}$ is generated as follows: Given a grading
parameter $\beta > 0$ for a grading function $g(t) = t^\beta$, and
given a natural number $N$, set $\xi_j := g(j/N)$ and draw
line segments aligned to the antidiagonal through $(0, \xi_j)$
and $(\xi_j, 0)$ for $j = 0, 1, \ldots, N$. Each of these segments
is divided into $j$ uniform edges and so define the set
of nodes (0, 0) and $(\xi_j/j)\,(j - k, k)$ for $k = 0, \ldots, j$ and
$j = 1, \ldots, N$. The elements are then given by the vertices
$(\xi_j/j)\,(j - k, k)$ and $(\xi_j/j)\,(j - k - 1, k + 1)$ aligned with the
antidiagonal and the vertex $(\xi_{j-1}/(j - 1))\,(j - k - 1, k)$ on
the finer and $(\xi_{j+1}/(j + 1))\,(j - k, k + 1)$ on the coarser
neighboring segment, respectively. The finest element is
$\operatorname{conv}\{(0, 0), (0, \xi_1), (\xi_1, 0)\}$ of diameter $\sqrt{2}\, g(1/N) \approx N^{-\beta}$.

Figure 12. (a) Reference domain $T_{\mathrm{ref}}$ with graded mesh for $\beta = 3/2$
and $N = 4$. (b) Graded mesh on L-shaped domain with
refinement toward origin and uniform refinement far away from
the origin. Notice that the outer boundaries of the macro elements
show a uniform distribution and so match each other in one global
regular triangulation.
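
The construction of the macro element translates directly into a few lines of code. The following is a minimal sketch of the node and element generation described above; the function name `graded_macro_element` and the returned data layout are hypothetical choices made for this illustration.

```python
import numpy as np

def graded_macro_element(beta=1.5, N=4):
    """Graded mesh on the reference triangle (0,0), (1,0), (0,1).

    Node (j, k) is (xi_j / j) * (j - k, k) with xi_j = (j / N)**beta."""
    xi = (np.arange(N + 1) / N) ** beta
    index = {(0, 0): 0}
    nodes = [(0.0, 0.0)]
    for j in range(1, N + 1):
        for k in range(j + 1):
            index[(j, k)] = len(nodes)
            nodes.append((xi[j] / j * (j - k), xi[j] / j * k))

    triangles = []
    for j in range(1, N + 1):
        for k in range(j):
            # edge on segment j with the vertex on the finer segment j - 1
            triangles.append((index[(j, k)], index[(j, k + 1)], index[(j - 1, k)]))
        for k in range(j - 1):
            # edge on segment j - 1 with the vertex on the coarser segment j
            triangles.append((index[(j - 1, k)], index[(j - 1, k + 1)], index[(j, k + 1)]))
    return np.array(nodes), triangles

nodes, triangles = graded_macro_element(beta=1.5, N=4)
print(len(nodes), len(triangles))
```

The macro element consists of $N^2$ triangles with $1 + N(N+3)/2$ nodes; the outermost segment carries $N$ uniform edges, which is what allows neighboring macro elements to match.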

Figure 12(b) displays a triangulation of the L-shaped 
domain with a refinement toward the origin designed by 
a union of transformed macro elements with B = 3/2 and 
N =7. The other vertices of the L-shaped domain yield 
higher singularities, which are not important for the first- 
order Courant finite element. 

The geometric mesh depicted in Figure 13 yields a finer
refinement toward some corners of the polygonal domain.
Given a parameter $\beta > 0$ in this type of triangulation,
the nodes $\xi_0 := 0$ and $\xi_j := \beta^{N-j}$ for $j = 1, \ldots, N$ define
antidiagonals through $(\xi_j, 0)$ and $(0, \xi_j)$, which are in turn
bisected. For such a triangulation, the polynomial degrees
$p_T$ on each triangle $T$ are distributed as follows: $p_T = 1$
for the two triangles $T$ with vertex (0, 0) and $p_T = j + 2$
for the four elements in the convex quadrilateral
with the vertices $(\xi_j, 0)$, $(0, \xi_j)$, $(\xi_{j+1}, 0)$, and $(0, \xi_{j+1})$
for $j = 0, \ldots, N - 1$. Figure 14 compares experimental
convergence rates of the error in the $H^1$ seminorm $|e|_{H^1}$ for
various graded meshes for the $P_1$ finite element method,
the p- and hp-finite element method, and for the adaptive
algorithm of Section 6.4. The $P_1$ finite element method
on graded meshes with $\beta = 3/2$, $\beta = 2$ and h-adaptivity
recover optimality in the convergence rate as opposed to the
uniform refinement ($\beta = 1$), leading only to a sub-optimal
convergence due to the corner singularity. The hp-finite
element method achieves a better convergence rate than
the p-finite element method.

Figure 13. (a) Reference domain $T_{\mathrm{ref}}$ with geometric mesh for
parameter $\beta = 1/2$ and $N = 4$. This mesh can also be
generated by an adaptive red-green-blue refinement of Section 6.3.
(b) Illustration of the closure algorithm. The refinement
triangulation with 50 element domains is obtained from the mesh (a) with
18 element domains by marking one edge (namely the second
along the antidiagonal) in the mesh (a).

Tensor product meshes are more appropriate for smaller
values of $\beta$; the one-dimensional model analysis of Babuška
and Guo (1986) suggests $\beta = (\sqrt{2} - 1)^2 \approx 0.171573$.


6.2 Adaptive mesh-refining algorithms 


The automatic mesh refinement for regular triangulations 
called MARK and REFINE/COARSEN frequently consists 
of three stages: 


(i) the marking of elements or edges for refinement (or
coarsening);
(ii) the closure algorithm to ensure that the resulting 
triangulation is (or remains) regular; 
(iii) the refinement itself, i.e. the change of the underlying 
data structures. 
Figure 14. Experimental convergence rates for various graded
meshes for the $P_1$ finite element method, the p- and hp-finite
element method, and for the adaptive algorithm of Section 6.4
for the Poisson problem on the L-shaped domain.


This section will focus on the marking strategies (i) while 
the subsequent section addresses the closure algorithm (ii) 
and the refinement procedure (iii). 

In a model situation with a sum over all elements T € 7 
(or over all edges, faces, or nodes), the a posteriori error 
estimators of the previous section give rise to a lower or 
upper error bound in the form i 


n= z a) 


TeT 


Then, the marking strategy is an algorithm, which selects 
a subset M of T called the marked elements; these are 
marked with the intention of being refined during the later 
refinement algorithm. 

A typical algorithm computes a threshold L, a positive 
real number, and then utilizes the refinement rule or mark- 
ing criterion 


makTeT if L<ny 


Therein, ny is referred to as the refinement indicator 
whereas L is the threshold; that is, 


M:={Té TL < nz} 


Typical examples for the computation of a threshold $L$ are 
the maximum criterion 

$$L := \Theta \max\{ \eta_T : T \in \mathcal{T} \}$$

or the bulk criterion where $L$ is the largest value such that 

$$(1 - \Theta) \sum_{T \in \mathcal{T}} \eta_T^2 \le \sum_{T \in \mathcal{M}} \eta_T^2$$

The parameter $\Theta$ is chosen with $0 \le \Theta \le 1$; $\Theta = 0$ corresponds 
to an almost uniform refinement and $\Theta = 1$ to a 
raw refinement of just a few elements (no refinements in 
the bulk criterion). 
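
The two marking criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the refinement indicators are assumed to be given as a plain list `eta`, and the function names `mark_maximum` and `mark_bulk` are hypothetical, not taken from any library.

```python
def mark_maximum(eta, theta):
    """Maximum criterion: mark T if L <= eta_T with L = theta * max(eta)."""
    threshold = theta * max(eta)
    return [i for i, e in enumerate(eta) if threshold <= e]


def mark_bulk(eta, theta):
    """Bulk criterion: largest threshold L such that the marked elements
    carry at least the fraction (1 - theta) of the sum of eta_T^2."""
    total = sum(e * e for e in eta)
    target = (1.0 - theta) * total
    if target <= 0.0:
        # theta = 1 (or all indicators zero): the empty set already
        # satisfies the criterion, so nothing is marked.
        return []
    accumulated = 0.0
    threshold = None
    # Visit indicators from largest to smallest until the bulk is reached.
    for e in sorted(eta, reverse=True):
        accumulated += e * e
        threshold = e
        if accumulated >= target:
            break
    # Elements tied with the threshold are marked as well.
    return [i for i, e in enumerate(eta) if threshold <= e]


if __name__ == "__main__":
    indicators = [4.0, 3.0, 1.0, 0.5]
    print(mark_maximum(indicators, 0.5))
    print(mark_bulk(indicators, 0.2))
```

Sorting the indicators makes the bulk criterion cost O(N log N); this is a design choice of the sketch, not a requirement of the criterion.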

The justification of the refinement criteria is essentially 
based on the heuristic of an equi-distribution of the refine- 
ment indicator; see Babuška and Rheinboldt (1978, 1979) 
and Babuška and Vogelius (1984) for results in 1-D. A rig- 
orous justification for some class of problems started with 
Dörfler (1996); it is summarized in Morin, Nochetto and 
Siebert (2003b) and will be addressed in Section 6.4. 

A different strategy is possible if the error estimator 
gives rise to a quantitative bound of a new mesh size. For 
instance, the explicit error estimator can be rewritten as 
$\eta_T = \|h_T R\|_{L_2(T)}$ with a given function $R \in L_2(\Omega)$ and 
the local mesh size $h_T$ (when edge contributions are recast 
as volume contributions). Then, given a tolerance Tol and 
using the heuristic that $R$ would not change (at least not 
dramatically increase) during a refinement, the new local 
mesh size $h_T^{\mathrm{new}}$ can be calculated from the condition 

$$\|h_T^{\mathrm{new}} R\|_{L_2(T)} = \mathrm{Tol}$$

upon the equi-distribution hypothesis $h_T^{\mathrm{new}} \propto \mathrm{Tol}/R$. Another 
approach that leads to a requested mesh-size distribution 
is based on sharp a priori error bounds, such as 
$\|h_{\mathcal{T}} D^2 u\|_{L_2(\Omega)}$ where $D^2 u$ denotes the matrix of all second 
derivatives of the exact solution $u$. Since $D^2 u$ is unknown, 
it has to be approximated by postprocessing a finite element 
solution with some averaging technique. 
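
As a worked illustration of the tolerance condition, assume (purely for this sketch) that the new mesh size is constant on the element $T$ and that the indicator has the form $\eta_T = h_T \|R\|_{L_2(T)}$ with $h_T$ constant on $T$. The condition can then be solved explicitly for the new mesh size:

```latex
\| h_T^{\mathrm{new}} R \|_{L_2(T)}
  = h_T^{\mathrm{new}} \, \| R \|_{L_2(T)}
  = \mathrm{Tol}
\quad \Longrightarrow \quad
h_T^{\mathrm{new}}
  = \frac{\mathrm{Tol}}{\| R \|_{L_2(T)}}
  = h_T \, \frac{\mathrm{Tol}}{\eta_T} .
```

Under these assumptions, an element whose indicator exceeds the tolerance by a factor of four would request a quarter of its present mesh size.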

The aforementioned refinement rules for the step MARK 
(intended for conforming FEM) ignore further decisions 
such as the particular type of anisotropic refinement or the 
increase of polynomial degrees versus the mesh refinements 
for hp-FEM. One way to handle such subtle decisions 
will be addressed under the heading coarsening strategy 
in Section 6.5. 


6.3 Mesh refining of regular triangulations 


Given a marked set of objects such as nodes, edges, faces, 
or elements, the refinement of the element (triangles or 
tetrahedra) plus further refinements (closure algorithm) for 
the design of regular triangulations are considered in this 
section. 


Figure 15. Red-, green-, blue-(left)- and blue-(right) refinement with reference edge on bottom of a triangle (from left to right) into 
four, two, and three subtriangles. The bold lines opposite the newest vertex indicate the next reference edge for further refinements. 


6.3.1 Refinement of a triangle 


Triangular elements in two dimensions are refined into two, 
three, or four subtriangles as indicated in Figure 15. All 
these divisions are based on hidden information on some 
reference edge. Rivara (1984) assumed the longest edge in 
the triangle as base of the refinement strategies while the 
one below is based on the explicit marking of a reference 
edge. In Figure 15, the bottom edge of the original triangle 
acts as the reference edge and, in any refinement, is halved. 
The four divisions displayed correspond to the bisection of 
one (called green-refinement), of two (the two versions of 
blue-refinement), or of all three edges (for red-refinement) 
as long as the bottom edge is refined. In Figure 15, the 
reference edges in the generated subtriangles are drawn with 
a bold line. 


6.3.2 Refinement of a tetrahedron 


The three-dimensional situation is geometrically more com- 
plicated. Therefore, this section is focused on the bisection 
algorithm and the readers are referred to Bey (1995) for 
3-D red-refinements. 

Figure 16 displays five types of tetrahedra. Each drawing shows 
the bottom triangle with vertices of local label 1, 2, and 
3, while the remaining three faces with the vertex label 4 
are drawn in the same plane as the bottom face; hence, the 
same number 4 for the top vertex is visible at the end-points 
of the three outer subtriangles. 

Figure 16. Bisection rules in 3-D after Arnold, Mukherjee and Pouly (2000), which essentially date back to Bänsch (1991). They involve 
five types of tetrahedra (indicated as $P_u$, $A$, $M$, $O$, and $P_f$ on the left) and their bisections with an initialization of $M \to 2 P_u$ and 
$O \to 2 P_u$ followed by a successive loop of the form $A \to 2 P_u \to 4 P_f \to 8 A$ and so on. Notice that the forms of $P_u$ and $P_f$ are 
identical, but play a different role in the refinement loop. 

Each triangular face has a reference edge and (at least) 
one edge is a reference edge of the two neighboring faces 
and that determines the vertices with local numbers 1 and 2. 
In the language of Arnold, Mukherjee and Pouly (2000), the 
five appearing types are called $P_u$, $A$, $M$, $O$, and $P_f$. The 
bisection of the edges between the vertices number 1 and 2 
with the new vertex number 5 are indicated in Figure 16. 

It is an important property of the inherited reference 
edges that the edges of each element K in the original 
regular triangulation T are refined in a cyclic way such that 
four consecutive bisections inside K are necessary before a 
new vertex is generated, which is not the midpoint of some 
edge of K. 


6.3.3 Closure algorithms 


The bisection of some set of marked elements does not 
always lead to a regular triangulation — the new vertices 
may be hanging nodes for the neighboring elements. Further 
refinements are necessary to make those nodes regular. The 
default procedure within the class of bisection algorithms 
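is to work on a set of marked elements $\mathcal{M}^{(k)}$ as follows. 


Closure Algorithm for Bisection. Input a regular triangulation 
$\mathcal{T}^{(0)} := \mathcal{T}$ and an initial subset $\mathcal{M}^{(0)} := \mathcal{M} \subset \mathcal{T}^{(0)}$ 
of marked elements, set $Z := \emptyset$ and $k := 0$. 

While $\mathcal{M}^{(k)} \ne \emptyset$ repeat (i)-(iv): 


(i) choose some element $K$ in $\mathcal{M}^{(k)}$ with reference edge 
$E$ and initiate its midpoint $z_E$ as new vertex, set 
$Z := Z \cup \{z_E\}$; 

(ii) bisect $E$ and divide $K$ into $K_+$ and $K_-$ and set 
$\mathcal{T}^{(k+1)} := \{K_+, K_-\} \cup (\mathcal{T}^{(k)} \setminus \{K\})$; 

(iii) find all elements $T_1, \dots, T_{m_K} \in \mathcal{T}^{(k+1)}$ with hanging 
node $z \in Z$ (if any) and set $\mathcal{M}^{(k+1)} := 
\{T_1, \dots, T_{m_K}\} \cup (\mathcal{M}^{(k)} \setminus \{K\})$; 

(iv) update $k := k + 1$ and go to (i). 


Output a refined triangulation $\mathcal{T}^{(k)}$. 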
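
The closure loop can be sketched in Python for the special case of 2-D newest-vertex bisection. The data layout is an assumption of this sketch and not part of the text: a triangle is a tuple `(p, a, b)` of vertex indices whose reference edge is `(a, b)`, the edge opposite the newest vertex `p`; all function names are hypothetical. The hanging-node search simply rescans the whole mesh, which is easy to read but not efficient.

```python
def edges_of(tri):
    """The three edges of a triangle as unordered vertex pairs."""
    p, a, b = tri
    return [frozenset((a, b)), frozenset((b, p)), frozenset((p, a))]


def bisect_with_closure(vertices, triangles, marked):
    """Bisect the marked triangles and remove all hanging nodes."""
    vertices = list(vertices)
    mesh = set(triangles)      # current triangulation T^(k)
    todo = set(marked)         # current marked set M^(k)
    midpoint = {}              # bisected edge -> index of its midpoint (set Z)
    while todo:
        tri = todo.pop()       # step (i): choose some K in M^(k)
        p, a, b = tri
        edge = frozenset((a, b))
        if edge not in midpoint:
            xa, xb = vertices[a], vertices[b]
            midpoint[edge] = len(vertices)
            vertices.append(((xa[0] + xb[0]) / 2, (xa[1] + xb[1]) / 2))
        m = midpoint[edge]
        mesh.remove(tri)       # step (ii): replace K by its two children
        mesh.update({(m, p, a), (m, b, p)})
        # step (iii): an element that still owns a bisected edge has a
        # hanging node and is marked for refinement
        todo |= {t for t in mesh if any(e in midpoint for e in edges_of(t))}
    return vertices, sorted(mesh)


def area(vertices, tri):
    (x0, y0), (x1, y1), (x2, y2) = (vertices[i] for i in tri)
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2


def is_regular(vertices, triangles):
    """True if no edge midpoint of any triangle is a mesh vertex."""
    points = set(vertices)
    for tri in triangles:
        for e in edges_of(tri):
            i, j = tuple(e)
            mid = ((vertices[i][0] + vertices[j][0]) / 2,
                   (vertices[i][1] + vertices[j][1]) / 2)
            if mid in points:
                return False
    return True


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    # Both triangles use the diagonal (0, 2) as reference edge.
    verts, tris = bisect_with_closure(square, [(1, 2, 0), (3, 0, 2)], [(1, 2, 0)])
    print(len(verts), len(tris), is_regular(verts, tris))
```

In the usage example only one triangle is marked, and the second one is refined by the closure loop because the shared diagonal has been bisected.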


According to step (iii) of the closure algorithm, any termination 
leads to a regular triangulation $\mathcal{T}^{(k)}$. The remaining 
essential detail is to guarantee that there will always occur a 
termination via $\mathcal{M}^{(k)} = \emptyset$ for some $k$. In the newest-vertex 
bisection, for instance, the reference edges are inherited in 
such a way that any element $K \in \mathcal{T}$, the initial regular triangulation, 
is refined only by subdivisions which, at most, 
halve each edge of $K$. Since the closure algorithm only 
halves some edge in $\mathcal{T}$ and prohibits any further refinements, 
any intermediate (irregular) $\mathcal{T}^{(k)}$ remains coarser than or 
equal to some regular uniform refinement $\hat{\mathcal{T}}$ of $\mathcal{T}$. This is the 
main argument to prove that the closure algorithm cannot 
refine forever and stops after a finite number of steps. 

Figure 13(b) shows an example where, given the initial 
mesh of Figure 13(a), only one edge, namely the second on 
the diagonal, is marked for refinement and the remaining 
refinement is induced by the closure algorithm. Neverthe- 
less, the number of new elements can be bounded in terms 
of the initial triangulation and the number of marked ele- 
ments (Binev, Dahmen and DeVore, 2004). 

The closure algorithm for the red—green—blue refinement 
in 2-D is simpler when the focus is on marking of edges. 
One main ingredient is that each triangle K is assigned 
a reference edge E(K). If we are given a set of marked 
elements, let M denote the set of corresponding assigned 
reference edges. 


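Closure Algorithm for Red-Green-Blue Refinement. 
Input a regular triangulation $\mathcal{T}$ with a set of edges $\mathcal{E}$ and 
an initial subset $\mathcal{M} \subset \mathcal{E}$ of marked edges, set $k := 0$ and 
$\mathcal{M}^{(0)} := \mathcal{N}^{(0)} := \mathcal{M}$. 

While $\mathcal{N}^{(k)} \ne \emptyset$ repeat (i)-(iv): 


(i) choose some edge $E$ in $\mathcal{N}^{(k)}$ and let $T_\pm \in \mathcal{T}$ denote 
the (at most) two triangles that share the edge $E \subset 
\partial T_\pm$; 

(ii) set $\mathcal{M}^{(k+1)} := \mathcal{M}^{(k)} \cup \{E_+, E_-\}$ with the reference 
edge $E_\pm := E(T_\pm)$ of $T_\pm$; 

(iii) if $\mathcal{M}^{(k+1)} = \mathcal{M}^{(k)}$ set $\mathcal{N}^{(k+1)} := \mathcal{N}^{(k)} \setminus \{E\}$ else set 
$\mathcal{N}^{(k+1)} := (\mathcal{N}^{(k)} \cup \{E_+, E_-\}) \setminus \{E\}$; 

(iv) update $k := k + 1$ and go to (i). 


Bisect the marked edges $\{E \in \mathcal{M}^{(k)} : E \subset \partial T\}$ of each 
element $T \in \mathcal{T}$ and refine $T$ by one of the red-green-blue 
refinement rules to generate elementwise a partition $\hat{\mathcal{T}}$. 
Output the regular triangulation $\hat{\mathcal{T}}$. 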
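
The edge-based closure can be sketched as a short work-list loop in Python. The sketch is a slightly streamlined variant under stated assumptions: triangles are vertex triples, `reference_edge[i]` stores $E(T_i)$ as a `frozenset`, and an edge is put on the work list only when it is newly marked. The function names are hypothetical.

```python
from collections import defaultdict


def triangle_edges(tri):
    a, b, c = tri
    return [frozenset((a, b)), frozenset((b, c)), frozenset((c, a))]


def rgb_closure(triangles, reference_edge, marked_edges):
    """Return the closed set of marked edges."""
    # edge -> indices of the (at most two) triangles sharing it
    neighbours = defaultdict(list)
    for i, tri in enumerate(triangles):
        for e in triangle_edges(tri):
            neighbours[e].append(i)
    closed = set(marked_edges)     # plays the role of M^(k), only grows
    pending = set(marked_edges)    # plays the role of N^(k)
    while pending:
        edge = pending.pop()       # step (i)
        for i in neighbours[edge]:
            ref = reference_edge[i]    # step (ii): E(T_+) and E(T_-)
            if ref not in closed:      # step (iii): process new edges later
                closed.add(ref)
                pending.add(ref)
    return closed


def refinement_type(tri, closed):
    """Refinement rule selected by the number of marked edges."""
    count = sum(e in closed for e in triangle_edges(tri))
    return {0: "none", 1: "green", 2: "blue", 3: "red"}[count]


if __name__ == "__main__":
    tris = [(0, 1, 2), (0, 2, 3)]
    refs = [frozenset((0, 2)), frozenset((2, 3))]
    closed = rgb_closure(tris, refs, {frozenset((0, 1))})
    print([refinement_type(t, closed) for t in tris])
```

Each edge enters the work list at most once, so the loop performs at most as many iterations as there are edges in the mesh.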


The closure algorithm for red-green-blue refinement 
terminates as $\mathcal{N}^{(k)}$ is decreasing and $\mathcal{M}^{(k)}$ is increasing 
and outputs a set $\mathcal{M} := \mathcal{M}^{(k)}$ of marked edges with the 
following closure property: Any element $T \in \mathcal{T}$ with an 
edge in $\mathcal{M}$ satisfies $E(T) \in \mathcal{M}$, i.e. if $T$ is marked, then 
at least its reference edge will be halved. This property 
allows the application of one properly chosen refinement 
of Figure 15 and leads to a regular triangulation. 

The reference edge E(K) in the closure algorithm is 
assigned to each element K of the initial triangulation 
and then is inherited according to the rules of Figure 15. 
For newest-vertex bisection, each triangle with vertices of 
global numbers $j$, $k$, and $\ell$ has the reference edge opposite 
to the vertex number $\max\{j, k, \ell\}$. 


On the basis of refinement rules that pass a reference 
edge on to the generated elements, one can prove that only a finite 
number of affine-equivalent element domains occur. 


6.4 Convergence of adaptive algorithms 


This section discusses the convergence of a class of adaptive 
$P_1$ finite element methods based on newest-vertex 
bisection or red-green-blue refinement of Section 6.3.1. 
For the ease of this presentation, let us adopt the notation of 
the residual representation formula (73) of Section 5.2.1 in 
2-D and perform a refinement with four bisections of each 
triangle with a marked edge as illustrated in Figure 17. 


Adaptive Algorithm. Input a regular triangulation 
$\mathcal{T}^{(0)} := \mathcal{T}$ with a set of reference edges $\{E(K) : K \in \mathcal{T}^{(0)}\}$ 
and a parameter $0 < \Theta < 1$, set $k := 0$. 

Repeat (i)-(v) until termination: 


(i) compute discrete solution $u_k \in V_k$ based on the 
triangulation $\mathcal{T}^{(k)}$ with set of interior edges $\mathcal{E}^{(k)}$; 

(ii) compute $\eta_E^2 := h_E \|r_E\|_{L_2(E)}^2 + \sum_{T \in \mathcal{T}(E)} h_T^2 \|r_T\|_{L_2(T)}^2$ 
for any $E \in \mathcal{E}^{(k)}$ with the set $\mathcal{T}(E)$ of its at most two 
neighboring elements; 

(iii) mark edges in $\mathcal{M}^{(k)} \subset \mathcal{E}^{(k)}$ by the bulk criterion such 
that 

$$(1 - \Theta) \sum_{E \in \mathcal{E}^{(k)}} \eta_E^2 \le \sum_{E \in \mathcal{M}^{(k)}} \eta_E^2$$

(iv) for any $E \in \mathcal{M}^{(k)}$, bisect each element $T \in \mathcal{T}(E)$ (at 
least) four times (as in Figure 17(a)) such that (at 
least) each midpoint $z_E$ of any edge $E \in \mathcal{E}(T)$ and the 
midpoint of $T$ become new nodes (cf. Figure 17(b)); 

(v) call closure algorithm to generate a refined regular 
triangulation $\mathcal{T}^{(k+1)}$, update $k := k + 1$ and go to (i). 


Output a sequence of finite element meshes $\mathcal{T}^{(k)}$ with associated 
finite element spaces $V_k$ and discrete solutions $u_k$. 

Recall that $u$ is the exact solution. Then there holds the 
error-reduction property 

$$\|u - u_{k+1}\|_a^2 \le \rho \|u - u_k\|_a^2 + \operatorname{osc}(f)^2$$

for any $k$ and with a constant $0 < \rho < 1$, which depends on 
$\mathcal{T}$ and $\Theta > 0$. The oscillations in $\operatorname{osc}(f)$ may be seen as 
higher-order terms when compared with the volume term 
$\|h_{\mathcal{T}} f\|_{L_2(\Omega)}$ of the error estimator. 

Figure 17. (a) Refinement of triangle with four bisections from 
the adaptive algorithm of Section 6.4. (b) Refinement of the 
edge neighborhood $\omega_E$ with four bisections of each element $T_\pm$. 
(c) Refinement of $\omega_E$ with four bisections of one element $T_+$ and 
green-refinement of $T_-$. 

An outline of the proof of the error-reduction property 
concludes this section to illustrate that $\rho < 1$ is independent 
of any mesh size, the level $k$, and number of elements 
$N_k := \operatorname{card}(\mathcal{T}^{(k)})$. The proof follows the arguments 
of Section 5.4.3 for $V_H := V_k$ and $V_h := V_H \oplus W_k \subseteq V_{k+1}$. 
The space $W_k$ is spanned by all $b_E$ and by all $b_T$ 
for all $T \in \mathcal{T}(E)$ for $E \in \mathcal{M}^{(k)} \subset \mathcal{E}^{(k)}$, which, here, are 
given by a nodal basis function in the new triangulation 
of the new interior node of $T \in \mathcal{T}(E)$ and of the midpoint 
of $E \in \mathcal{M}^{(k)}$. These basis functions substitute the bubble 
functions (80) in Section 5.2.4 and have in fact the properties 
(81)-(83). One then argues as in Section 5.4.3 with 
some 

$$w_k := \sum_{E \in \mathcal{M}^{(k)}} \Big( w_E + \sum_{T \in \mathcal{T}(E)} w_T \Big) \in W_k$$

to show $\|w_k\|_a^2 \le C (R(w_k) + \operatorname{osc}(f))$ and eventually 

$$\sum_{E \in \mathcal{M}^{(k)}} \eta_E^2 \le C \big( \|u_h - u_H\|_a^2 + \operatorname{osc}(f)^2 \big)$$

for $u_h = u_{k+1}$ and $u_H := u_k$. This and the bulk criterion 
lead to 

$$\|u - u_k\|_a^2 \le C_{\mathrm{rel}} \sum_{E \in \mathcal{E}^{(k)}} \eta_E^2 
\le (1 - \Theta)^{-1} C_{\mathrm{rel}} \sum_{E \in \mathcal{M}^{(k)}} \eta_E^2 
\le C (1 - \Theta)^{-1} C_{\mathrm{rel}} \big( \|u_{k+1} - u_k\|_a^2 + \operatorname{osc}(f)^2 \big)$$

Utilizing the Galerkin orthogonality 

$$\|u_{k+1} - u_k\|_a^2 = \|u - u_k\|_a^2 - \|u - u_{k+1}\|_a^2$$

one concludes the proof. 
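
The final step can be written out as follows. This is a sketch in which $C'$ is a name introduced here for the combined constant in the last estimate before the Galerkin orthogonality, and $e_k := \|u - u_k\|_a^2$ is shorthand; without loss of generality $C' > 1$.

```latex
e_k \le C' \big( \|u_{k+1} - u_k\|_a^2 + \operatorname{osc}(f)^2 \big)
    = C' \big( e_k - e_{k+1} + \operatorname{osc}(f)^2 \big)
\quad \Longrightarrow \quad
e_{k+1} \le \Big( 1 - \frac{1}{C'} \Big) e_k + \operatorname{osc}(f)^2 .
```

Hence the error-reduction property holds with $\rho := 1 - 1/C'$, which lies in $(0, 1)$ and inherits its independence of the mesh size and of the level $k$ from $C'$.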

It is interesting to notice that the conditions on the 
refinement can be weakened. For instance, it suffices to 
refine solely one element in $\mathcal{T}(E)$ such that there is an 
interior node as in Figure 17(c). (But then the oscillations 
are coarser and involve terms like $h_E \|f - f_E\|_{L_2(\omega_E)}$ 
with an integral mean $f_E$ over an edge-patch $\omega_E$.) However, 
the counterexample of Figure 10 clearly indicates that 
some specific conditions are necessary to ensure the 
error-reduction property. 


6.5 Coarsening 


The constructive task in approximation theory is the design 
of effective approximations of a given function $u$. Replacing 
the unknown $u$ by a very accurate known approximation 
(e.g. obtained by some overkill computation) of it, one may 
employ all the approximation techniques available and find 
an efficient representation in terms of an adapted finite ele- 
ment space, i.e. in terms of an underlying adapted mesh 
and/or an adapted distribution of polynomial degrees. 

The input of this coarsening step consists, for instance, of 
a given triangulation $\mathcal{T}_k$ and a given finite element solution 
$u_k$ and some much more accurate approximation $\hat{u}_k$. Then, 
the difference 

$$\eta_k := \|u_k - \hat{u}_k\|_a$$

is an error estimator as accurate as $\|u - \hat{u}_k\|_a$ (reliability 
and efficiency are proved by a triangle inequality). Moreover, 
one might use $\eta_T = |\hat{u}_k - u_k|_{H^1(T)}$ as a local refinement 
indicator in the setting of the benchmark example. 

Moreover, many decisions of the mesh design (e.g. 
choice of anisotropic elements, choice of polynomial degree, 
etc.) can be based on the explicit availability of $\hat{u}_k$. 

One single coarsening step is certainly pointless, for 
the reference solution $\hat{u}_k$ is known and regarded as very 
accurate (and so there is no demand for further work). 
On the other hand, a cascade of coarsening steps requires, 
in each step, a high accuracy relative to the precision of 
that level and hence a restricted accuracy. A schematic 
description of this procedure reads as follows: 


Coarsening Algorithm for Adaptive Mesh Design. Input 
a finite element space $V_0$ with a finite element solution $u_0$, 
set $k := 0$. 

Repeat (i)-(iv) until termination: 


(i) design a super space $\hat{V}_k$ of $V_k$ by uniform 
red-refinements plus uniform increase of all the polynomial 
degrees; 

(ii) compute accurate finite element solution $\hat{u}_k$ with 
respect to $\hat{V}_k$; 

(iii) coarsen $\hat{V}_k$ based on given $\hat{u}_k$ and design a new finite 
element space $V_{k+1}$; 
(iv) update $k := k + 1$ and go to (i). 


The paper of Binev, Dahmen and DeVore (2004) presents 
a realization of this algorithm with an adaptive algorithm 
on stage (i) and then proves that the convergence rate of 
the overall procedure is optimal with respect to the energy 
error as a function of the number of degrees of freedom. 

An automatic version of the coarsening strategy is called 
the hp-adaptive finite element method in Demkowicz (2003) 


for higher-order polynomials adaptivity. This involves algo- 
rithms for the determination of edges that will be refined 
and of their polynomial degrees. 


7 OTHER ASPECTS 


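In this section, we discuss briefly several topics in finite 
element methodology. Some of the discussions involve the 
Sobolev space $W^k_p(\Omega)$ ($1 \le p \le \infty$), which is the space of 
functions in $L_p(\Omega)$ whose weak derivatives up to order $k$ 
also belong to $L_p(\Omega)$, with the norm 

$$\|v\|_{W^k_p(\Omega)} = \Big( \sum_{|\alpha| \le k} \Big\| \frac{\partial^\alpha v}{\partial x^\alpha} \Big\|_{L_p(\Omega)}^p \Big)^{1/p}$$

for $1 \le p < \infty$ and 

$$\|v\|_{W^k_\infty(\Omega)} = \max_{|\alpha| \le k} \Big\| \frac{\partial^\alpha v}{\partial x^\alpha} \Big\|_{L_\infty(\Omega)}$$

For $1 \le p < \infty$, the seminorm $\big( \sum_{|\alpha| = k} \|\partial^\alpha v / \partial x^\alpha\|_{L_p(\Omega)}^p \big)^{1/p}$ 
will be denoted by $|v|_{W^k_p(\Omega)}$, and the seminorm 
$\max_{|\alpha| = k} \|\partial^\alpha v / \partial x^\alpha\|_{L_\infty(\Omega)}$ will be denoted by $|v|_{W^k_\infty(\Omega)}$. 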


7.1 Nonsymmetric/indefinite problems 


The results in Section 4 can be extended to the case 
where the bilinear form $a(\cdot, \cdot)$ in the weak problem (1) 
is nonsymmetric and/or indefinite due to lower order terms 
in the partial differential equation. We assume that $a(\cdot, \cdot)$ is 
bounded (cf. (2)) on the closed subspace $V$ of the Sobolev 
space $H^m(\Omega)$ and replace (3) by the condition that 

$$a(v, v) + L \|v\|_{L_2(\Omega)}^2 \ge C \|v\|_{H^m(\Omega)}^2 \quad \forall v \in V \tag{120}$$

where $L$ is a positive constant. 


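Example 15. Let $a(\cdot, \cdot)$ be defined by 

$$a(v_1, v_2) = \int_\Omega \nabla v_1 \cdot \nabla v_2 \, dx + \sum_{j=1}^d \int_\Omega b_j(x) \frac{\partial v_1}{\partial x_j} v_2 \, dx + \int_\Omega c(x) v_1 v_2 \, dx \tag{121}$$

for all $v_1, v_2 \in H^1(\Omega)$, where $b_j(x)$ ($1 \le j \le d$), $c(x) \in 
L_\infty(\Omega)$. If we take $V = \{v \in H^1(\Omega) : v|_\Gamma = 0\}$ and $F$ is 
defined by (6), then (1) is the weak form of the nonsymmetric 
boundary value problem 

$$-\Delta u + \sum_{j=1}^d b_j \frac{\partial u}{\partial x_j} + c u = f, \qquad u = 0 \ \text{on } \Gamma, \qquad \frac{\partial u}{\partial n} = 0 \ \text{on } \partial\Omega \setminus \Gamma \tag{122}$$

and the coercivity condition (120) follows from the well-known 
Gårding's inequality (Agmon, 1965). 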


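Unlike the symmetric positive definite case, we need to 
assume that the weak problem (1) has a unique solution, 
and that the adjoint problem is also uniquely solvable, that 
is, given any $G \in V^*$ there is a unique $w \in V$ such that 

$$a(v, w) = G(v) \quad \forall v \in V \tag{123}$$

Furthermore, we assume that the solution $w$ of (123) enjoys 
some elliptic regularity when $G(v) = (g, v)_{L_2(\Omega)}$ for $g \in 
L_2(\Omega)$, i.e., $w \in H^{m+\alpha}(\Omega)$ for some $\alpha > 0$ and 

$$\|w\|_{H^{m+\alpha}(\Omega)} \le C \|g\|_{L_2(\Omega)} \tag{124}$$

Let $\mathcal{T}$ be a triangulation of $\Omega$ with mesh size $h_{\mathcal{T}} = 
\max_{T \in \mathcal{T}} \operatorname{diam} T$ and $V_{\mathcal{T}} \subset V$ be a finite element space 
associated with $\mathcal{T}$ such that the following approximation 
property is satisfied: 

$$\inf_{v \in V_{\mathcal{T}}} \|w - v\|_{H^m(\Omega)} \le \epsilon_{\mathcal{T}} \|w\|_{H^{m+\alpha}(\Omega)} \quad \forall w \in H^{m+\alpha}(\Omega) \tag{125}$$

where 

$$\epsilon_{\mathcal{T}} \downarrow 0 \quad \text{as} \quad h_{\mathcal{T}} \downarrow 0 \tag{126}$$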


The discrete problem is then given by (54). 

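Following Schatz (1974) the well-posedness of the discrete 
problem and the error estimate for the finite element 
approximate solution can be addressed simultaneously. 
Assume for the moment that $u_{\mathcal{T}} \in V_{\mathcal{T}}$ is a solution of (54). 
Then we have 

$$a(u - u_{\mathcal{T}}, v) = 0 \quad \forall v \in V_{\mathcal{T}} \tag{127}$$

We use (127) and a duality argument to estimate $\|u - u_{\mathcal{T}}\|_{L_2(\Omega)}$ 
in terms of $\|u - u_{\mathcal{T}}\|_{H^m(\Omega)}$. Let $w \in V$ satisfy 

$$a(v, w) = (u - u_{\mathcal{T}}, v)_{L_2(\Omega)} \quad \forall v \in V \tag{128}$$

We obtain, from (2), (127), and (128), the following analog 
of (24): 

$$\|u - u_{\mathcal{T}}\|_{L_2(\Omega)}^2 = a(u - u_{\mathcal{T}}, w) \le C \Big( \inf_{v \in V_{\mathcal{T}}} \|w - v\|_{H^m(\Omega)} \Big) \|u - u_{\mathcal{T}}\|_{H^m(\Omega)} \tag{129}$$

and hence, by (124) and (125), 

$$\|u - u_{\mathcal{T}}\|_{L_2(\Omega)} \le C \epsilon_{\mathcal{T}} \|u - u_{\mathcal{T}}\|_{H^m(\Omega)} \tag{130}$$

It follows from (120) and (130) that 

$$\|u - u_{\mathcal{T}}\|_{H^m(\Omega)}^2 \le C \big[ a(u - u_{\mathcal{T}}, u - u_{\mathcal{T}}) + \epsilon_{\mathcal{T}}^2 \|u - u_{\mathcal{T}}\|_{H^m(\Omega)}^2 \big]$$

which together with (126) implies, for $h_{\mathcal{T}}$ sufficiently small, 

$$\|u - u_{\mathcal{T}}\|_{H^m(\Omega)}^2 \le C \, a(u - u_{\mathcal{T}}, u - u_{\mathcal{T}}) \tag{131}$$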
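
The passage from (120) and (130) to (131) is an absorption argument, which can be sketched as follows. The names $C_1$ (the reciprocal of the constant on the right-hand side of (120)), $C_2$ (the constant in (130)) and $e := u - u_{\mathcal{T}}$ are introduced here only for illustration.

```latex
\|e\|_{H^m(\Omega)}^2
  \le C_1 \big[ a(e, e) + L \|e\|_{L_2(\Omega)}^2 \big]
  \le C_1 \, a(e, e) + C_1 L C_2^2 \, \epsilon_{\mathcal{T}}^2 \, \|e\|_{H^m(\Omega)}^2 .
% By (126) the mesh can be chosen so fine that
% C_1 L C_2^2 \epsilon_{\mathcal{T}}^2 \le 1/2; the last term is then
% absorbed into the left-hand side:
\tfrac{1}{2} \|e\|_{H^m(\Omega)}^2 \le C_1 \, a(e, e)
\quad \Longrightarrow \quad
\|e\|_{H^m(\Omega)}^2 \le 2 C_1 \, a(e, e) .
```

The threshold $1/2$ is an arbitrary choice; any fixed bound strictly below 1 yields an estimate of the same form with a different constant.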


For the special case where $F = 0$ and $u = 0$, any solution 
$u_{\mathcal{T}}$ of the homogeneous discrete problem 

$$a(u_{\mathcal{T}}, v) = 0 \quad \forall v \in V_{\mathcal{T}}$$

must satisfy, by (131), 

$$\|u_{\mathcal{T}}\|_{H^m(\Omega)}^2 \le 0$$

We conclude that any solution of the homogeneous discrete 
problem must be trivial and hence the discrete problem (19) 
is uniquely solvable provided $h_{\mathcal{T}}$ is sufficiently small. Under 
this condition, we also obtain immediately from (2), (127), 
and (131), the following analog of (22): 

$$\|u - u_{\mathcal{T}}\|_{H^m(\Omega)} \le C \inf_{v \in V_{\mathcal{T}}} \|u - v\|_{H^m(\Omega)} \tag{132}$$

Concrete error estimates now follow from (132), (129), 
and the results in Section 3.3. 


7.2 Nonconforming finite elements 


When the finite element space $FE_{\mathcal{T}}$ defined by (32) does 
not belong to the Sobolev space $H^m(\Omega)$ where the weak 
problem (1) of the boundary value problem is posed, it 
is referred to as a nonconforming finite element space. 
Nonconforming finite element spaces are more flexible and 
are useful for problems with constraints where conforming 
finite element spaces are more difficult to construct. 


Example 16 (Triangular Nonconforming Elements) Let 
$K$ be a triangle. If the set $\mathcal{N}_K$ consists of evaluations of 
the shape functions at the midpoints of the edges of $K$ 
(Figure 18a), then $(K, P_1, \mathcal{N}_K)$ is the nonconforming $P_1$ 
element of Crouzeix and Raviart. It is the simplest triangular 
element that can be used to solve the incompressible 
Stokes equation. 

If the set $\mathcal{N}_K$ consists of evaluations of the shape functions 
at the vertices of $K$ and the evaluations of the normal 
derivatives of the shape functions at the midpoints of the 
edges of $K$ (Figure 18b), then $(K, P_2, \mathcal{N}_K)$ is the Morley 
element. It is the simplest triangular element that can be 
used to solve the plate bending problem. 


Example 17 (Rectangular Nonconforming Elements) 
Let $K$ be a rectangle. If $P_K$ is the space spanned by the 
functions $1$, $x_1$, $x_2$, and $x_1^2 - x_2^2$ and the set $\mathcal{N}_K$ consists of 
the mean values of the shape functions on the edges of $K$, 
then $(K, P_K, \mathcal{N}_K)$ is the rotated $Q_1$ element of Rannacher 
and Turek (Figure 19(a), where the thick lines represent 
mean values over the edges). It is the simplest rectangular 
element that can be used to solve the incompressible Stokes 
equation. 

If $P_K$ is the space spanned by the functions $1$, $x_1$, $x_2$, $x_1^2$, 
$x_1 x_2$, $x_2^2$, $x_1^2 x_2$ and $x_1 x_2^2$ and the set $\mathcal{N}_K$ consists of 
evaluations of the shape functions at the vertices of $K$ and 
evaluations of the normal derivatives at the midpoints of 
the edges (Figure 19b), then $(K, P_K, \mathcal{N}_K)$ is the incomplete 
$Q_2$ element. It is the simplest rectangular element that can 
be used to solve the plate bending problem. 


Consider the weak problem (1) for a symmetric positive 
definite boundary value problem, where $F$ is defined by 
(6) for a function $f \in L_2(\Omega)$. Let $V_{\mathcal{T}}$ be a nonconforming 
finite element space associated with the triangulation $\mathcal{T}$. We 
assume that there is a (mesh-dependent) symmetric bilinear 
form $a_{\mathcal{T}}(\cdot, \cdot)$ defined on $V + V_{\mathcal{T}}$ such that (i) $a_{\mathcal{T}}(v, v) = 
a(v, v)$ for $v \in V$, (ii) $a_{\mathcal{T}}(\cdot, \cdot)$ is positive definite on $V_{\mathcal{T}}$. 
The discrete problem for the nonconforming finite element 
method reads: Find $u_{\mathcal{T}} \in V_{\mathcal{T}}$ such that 

$$a_{\mathcal{T}}(u_{\mathcal{T}}, v) = \int_\Omega f v \, dx \quad \forall v \in V_{\mathcal{T}} \tag{133}$$


Figure 18. Triangular nonconforming finite elements. 

Figure 19. Rectangular nonconforming finite elements. 


Example 18. The Poisson problem in Example 1 can be 
solved by the nonconforming $P_1$ finite element method 
in which the finite element space is $V_{\mathcal{T}} = \{v \in L_2(\Omega) : 
v|_T \in P_1(T)$ for every triangle $T \in \mathcal{T}$, $v$ is continuous at the 
midpoints of the edges of $\mathcal{T}$ and $v$ vanishes at the midpoints 
of the edges of $\mathcal{T}$ along $\Gamma\}$ and the bilinear form $a_{\mathcal{T}}$ is 
defined by 

$$a_{\mathcal{T}}(v_1, v_2) = \sum_{T \in \mathcal{T}} \int_T \nabla v_1 \cdot \nabla v_2 \, dx \tag{134}$$
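
The contribution of a single triangle to the bilinear form (134) can be sketched in Python. The sketch relies on a standard fact that is not spelled out in the text: the basis function of the nonconforming $P_1$ element attached to the edge opposite vertex $i$ is $1 - 2\lambda_i$, where $\lambda_i$ is the barycentric coordinate of vertex $i$. The function name `cr_local_stiffness` is hypothetical.

```python
def cr_local_stiffness(p0, p1, p2):
    """Local stiffness matrix of the nonconforming P1 element on one triangle.

    Entry (i, j) is the integral over the triangle of
    grad(phi_i) . grad(phi_j), where phi_i = 1 - 2 * lambda_i is the basis
    function attached to the midpoint of the edge opposite vertex i.
    """
    pts = [p0, p1, p2]
    # Signed determinant of the affine map; twice the signed area.
    det = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])
    area = abs(det) / 2
    grads = []
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3
        # Constant gradient of the barycentric coordinate lambda_i.
        grad_lambda = ((pts[j][1] - pts[k][1]) / det,
                       (pts[k][0] - pts[j][0]) / det)
        grads.append((-2 * grad_lambda[0], -2 * grad_lambda[1]))
    # Gradients are constant, so each integral is area * dot product.
    return [[area * (gi[0] * gj[0] + gi[1] * gj[1]) for gj in grads]
            for gi in grads]


if __name__ == "__main__":
    for row in cr_local_stiffness((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)):
        print(row)
```

Since the three basis functions sum to one on the triangle, every row of the matrix sums to zero, which gives a quick sanity check for an implementation.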


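The nonconforming Ritz-Galerkin method (133) can be 
analyzed as follows. Let $\tilde{u}_{\mathcal{T}} \in V_{\mathcal{T}}$ be defined by 

$$a_{\mathcal{T}}(\tilde{u}_{\mathcal{T}}, v) = a_{\mathcal{T}}(u, v) \quad \forall v \in V_{\mathcal{T}}$$

Then we have 

$$\|u - \tilde{u}_{\mathcal{T}}\|_{a_{\mathcal{T}}} = \inf_{v \in V_{\mathcal{T}}} \|u - v\|_{a_{\mathcal{T}}}$$

where $\|w\|_{a_{\mathcal{T}}} = (a_{\mathcal{T}}(w, w))^{1/2}$ is the nonconforming energy 
norm defined on $V + V_{\mathcal{T}}$, and we arrive at the following 
generalization (Berger, Scott and Strang, 1972) of (21): 

$$\begin{aligned}
\|u - u_{\mathcal{T}}\|_{a_{\mathcal{T}}} &\le \|u - \tilde{u}_{\mathcal{T}}\|_{a_{\mathcal{T}}} + \|\tilde{u}_{\mathcal{T}} - u_{\mathcal{T}}\|_{a_{\mathcal{T}}} \\
&= \inf_{v \in V_{\mathcal{T}}} \|u - v\|_{a_{\mathcal{T}}} + \sup_{v \in V_{\mathcal{T}} \setminus \{0\}} \frac{a_{\mathcal{T}}(\tilde{u}_{\mathcal{T}} - u_{\mathcal{T}}, v)}{\|v\|_{a_{\mathcal{T}}}} \\
&= \inf_{v \in V_{\mathcal{T}}} \|u - v\|_{a_{\mathcal{T}}} + \sup_{v \in V_{\mathcal{T}} \setminus \{0\}} \frac{a_{\mathcal{T}}(u - u_{\mathcal{T}}, v)}{\|v\|_{a_{\mathcal{T}}}}
\end{aligned} \tag{135}$$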

Remark 17. The second term on the right-hand side of 
(135), which vanishes in the case of conforming Ritz-Galerkin 
methods, measures the consistency errors of nonconforming 
methods. 


As an example, we analyze the nonconforming $P_1$ 
method for the Poisson problem in Example 18. For each 
$T \in \mathcal{T}$, we define an interpolation operator $\Pi_T : H^1(T) \to 
P_1(T)$ by 

$$(\Pi_T \zeta)(m_e) = \frac{1}{|e|} \int_e \zeta \, ds$$

where $m_e$ is the midpoint for the edge $e$ of $T$. The interpolation 
operator $\Pi_T$ satisfies the estimate (43) for $m = 1$, and 
they can be pieced together to form an interpolation operator 
$\Pi : H^1(\Omega) \to V_{\mathcal{T}}$. Since the solution $u$ of (5) belongs 
to $H^{1+\alpha(T)}(T)$, where $0 < \alpha(T) \le 1$, the first term on the 
right-hand side of (135) satisfies the estimate 

$$\inf_{v \in V_{\mathcal{T}}} \|u - v\|_{a_{\mathcal{T}}} \le \|u - \Pi u\|_{a_{\mathcal{T}}} 
\le C \Big( \sum_{T \in \mathcal{T}} (\operatorname{diam} T)^{2\alpha(T)} |u|_{H^{1+\alpha(T)}(T)}^2 \Big)^{1/2} \tag{136}$$

where the constant $C$ depends only on the minimum angle 
in $\mathcal{T}$. 

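To analyze the second term on the right-hand side of 
(135), we write, using (5), (133), and (134), 

$$a_{\mathcal{T}}(u - u_{\mathcal{T}}, v) = - \sum_{e \in \mathcal{E}_{\mathcal{T}}} \int_e \frac{\partial u}{\partial n_e} [v]_e \, ds \tag{137}$$

where $\mathcal{E}_{\mathcal{T}}$ is the set of edges in $\mathcal{T}$ that are not on $\Gamma$, $n_e$ is 
a unit vector normal to $e$, and $[v]_e = v_+ - v_-$ is the jump 
of $v$ across $e$ ($n_e$ is pointing from the minus side to the 
plus side and $v = 0$ outside $\Omega$). Since $[v]_e$ vanishes at the 
midpoint of $e \in \mathcal{E}_{\mathcal{T}}$ we have 

$$\int_e \frac{\partial u}{\partial n_e} [v]_e \, ds = \int_e \frac{\partial (u - p)}{\partial n_e} [v]_e \, ds \quad \forall p \in P_1 \tag{138}$$

Let $\mathcal{T}_e = \{T \in \mathcal{T} : e \subset \partial T\}$. It follows from (138), the trace 
theorem and the Bramble-Hilbert lemma (cf. Remark 10) 
that 

$$\begin{aligned}
\Big| \int_e \frac{\partial u}{\partial n_e} [v]_e \, ds \Big| 
&\le C \inf_{p \in P_1} \Big[ \sum_{T \in \mathcal{T}_e} \Big( |u - p|_{H^1(T)}^2 + (\operatorname{diam} T)^{2\alpha(T)} |u - p|_{H^{1+\alpha(T)}(T)}^2 \Big) \Big]^{1/2} \Big( \sum_{T \in \mathcal{T}_e} |v|_{H^1(T)}^2 \Big)^{1/2} \\
&\le C \Big( \sum_{T \in \mathcal{T}_e} (\operatorname{diam} T)^{2\alpha(T)} |u|_{H^{1+\alpha(T)}(T)}^2 \Big)^{1/2} \Big( \sum_{T \in \mathcal{T}_e} |v|_{H^1(T)}^2 \Big)^{1/2}
\end{aligned} \tag{139}$$

We conclude from (137) and (139) that 

$$\sup_{v \in V_{\mathcal{T}} \setminus \{0\}} \frac{|a_{\mathcal{T}}(u - u_{\mathcal{T}}, v)|}{\|v\|_{a_{\mathcal{T}}}} \le C \Big( \sum_{T \in \mathcal{T}} (\operatorname{diam} T)^{2\alpha(T)} |u|_{H^{1+\alpha(T)}(T)}^2 \Big)^{1/2} \tag{140}$$

where the constant $C$ depends only on the minimum angle 
in $\mathcal{T}$. 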




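Combining (135), (136), and (140) we have the following 
analog of (57): 

$$\sum_{T \in \mathcal{T}} |u - u_{\mathcal{T}}|_{H^1(T)}^2 \le C \sum_{T \in \mathcal{T}} (\operatorname{diam} T)^{2\alpha(T)} |u|_{H^{1+\alpha(T)}(T)}^2 \tag{141}$$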


Remark 18. Estimates of $\|u - u_{\mathcal{T}}\|_{L_2(\Omega)}$ can also be 
obtained for nonconforming finite element methods 
(Crouzeix and Raviart, 1973). There is also a close 
connection between certain nonconforming methods and 
mixed methods (Arnold and Brezzi, 1982). 


7.3 Effects of numerical integration 


The explicit form of the finite element equation (54) 
involves the evaluations of integrals which, in general, 
cannot be computed exactly. Thus, the effects of numer- 
ical integration must be taken into account in the error 
analysis. We will illustrate the ideas in terms of finite ele- 
ment methods for simplicial triangulations. The readers are 
referred to Davis and Rabinowitz (1984) for a comprehen- 
sive treatment of numerical integration. 

Consider the second order elliptic boundary value problem 
(5) in Example 1 on a bounded polyhedral domain $\Omega \subset 
\mathbb{R}^d$ ($d = 2$ or 3). Let $\mathcal{T}$ be a simplicial triangulation of $\Omega$ 
such that $\Gamma$ is a union of the edges (faces) of $\mathcal{T}$ and $V_{\mathcal{T}} \subset V$ 
be the corresponding $P_n$ Lagrange finite element space. 

In Section 4, the finite element approximate solution 
$u_{\mathcal{T}} \in V_{\mathcal{T}}$ is defined by (54). But in practice, the integral 
$F(v) = \int_\Omega f v \, dx$ is evaluated by a quadrature scheme and 
the approximate solution $u_{\mathcal{T}} \in V_{\mathcal{T}}$ is actually defined by 

$$a(u_{\mathcal{T}}, v) = F_{\mathcal{T}}(v) \quad \forall v \in V_{\mathcal{T}} \tag{142}$$

where $F_{\mathcal{T}}(v)$ is the result of applying the quadrature scheme 
to the integral $F(v)$. 

More precisely, let $D \in \mathcal{T}$ be arbitrary and $\Phi_D : S \to D$ 
be an affine homeomorphism from the standard (closed) 
simplex $S$ onto $D$. It follows from a change of variables 
that 

$$\int_D f v \, dx = \int_S (\det J_{\Phi_D}) (f \circ \Phi_D)(v \circ \Phi_D) \, d\hat{x}$$

where without loss of generality $\det J_{\Phi_D}$ (the determinant 
of the Jacobian matrix of $\Phi_D$) is assumed to be a positive 
number. The integral on $S$ is evaluated by a quadrature 
scheme $I_S$ and the right-hand side of (142) is then given by 

$$F_{\mathcal{T}}(v) = \sum_{D \in \mathcal{T}} I_S\big( (\det J_{\Phi_D}) (f \circ \Phi_D)(v \circ \Phi_D) \big) \tag{143}$$




The error $u - u_{\mathcal{T}}$ can be estimated by the following 
analog of (135): 

$$\|u - u_{\mathcal{T}}\|_a \le \inf_{v \in V_{\mathcal{T}}} \|u - v\|_a + \sup_{v \in V_{\mathcal{T}} \setminus \{0\}} \frac{a(u - u_{\mathcal{T}}, v)}{\|v\|_a} \tag{144}$$

The first term on the right-hand side of (144) can be estimated 
by $\|u - \Pi_{\mathcal{T}} u\|_a$ as in Section 4.1. The second term 
on the right-hand side of (144) measures the effect of 
numerical quadrature. Below we give conditions on the 
quadrature scheme $w \mapsto I_S(w)$ and the function $f$ so that 
the magnitude of the quadrature error is identical with that 
of the optimal interpolation error for the finite element 
space. 

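We assume that the quadrature scheme $w \mapsto I_S(w)$ has 
the following properties: 

$$|I_S(w)| \le C_S \max_{\hat{x} \in S} |w(\hat{x})| \quad \forall w \in C^0(S) \tag{145}$$

$$I_S(w) = \int_S w \, d\hat{x} \quad \forall w \in P_{2n-2} \tag{146}$$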
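
As a concrete instance of such a scheme, the edge-midpoint rule on the reference triangle can be checked in Python. This rule is chosen here only for illustration and is not prescribed by the text; it integrates quadratics exactly, so it satisfies (146) for $n = 2$, and with its three equal positive weights $1/6$ it satisfies (145) with $C_S = 1/2$. All names are hypothetical; exact rational arithmetic is used so that exactness can be tested without rounding.

```python
from fractions import Fraction
from math import factorial

# Edge midpoints of the reference triangle S with vertices (0,0), (1,0), (0,1).
MIDPOINTS = [
    (Fraction(1, 2), Fraction(0)),
    (Fraction(1, 2), Fraction(1, 2)),
    (Fraction(0), Fraction(1, 2)),
]


def quad_S(w):
    """Edge-midpoint rule on S: |S| = 1/2 split into three weights 1/6."""
    return sum(Fraction(1, 6) * w(x, y) for x, y in MIDPOINTS)


def exact_monomial(i, j):
    """Exact integral of x^i y^j over S, equal to i! j! / (i + j + 2)!."""
    return Fraction(factorial(i) * factorial(j), factorial(i + j + 2))


def integrate_on_triangle(f, p0, p1, p2):
    """Apply the rule on a triangle D via pull-back to S, as in (143)."""
    det = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])

    def pulled_back(x, y):
        # Affine map from S onto D, evaluated at the reference point (x, y).
        px = p0[0] + (p1[0] - p0[0]) * x + (p2[0] - p0[0]) * y
        py = p0[1] + (p1[1] - p0[1]) * x + (p2[1] - p0[1]) * y
        return f(px, py)

    return abs(det) * quad_S(pulled_back)


if __name__ == "__main__":
    print(quad_S(lambda x, y: x * y), exact_monomial(1, 1))
    print(integrate_on_triangle(lambda x, y: x * y, (0, 0), (2, 0), (0, 2)))
```

The rule is not exact for cubic polynomials, so it would not meet (146) for $n \ge 3$.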


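We also assume that $f \in W^n_q(\Omega)$ such that $q \ge 2$ and 
$n > d/q$, which implies, in particular, by the Sobolev 
embedding theorem that $f \in C^0(\bar{\Omega})$ so that (143) makes 
sense. 

Under these conditions it can be shown (Ciarlet, 1978) 
by using the Bramble-Hilbert lemma on $S$ that 

$$\Big| \int_D f v \, dx - I_S\big( (\det J_{\Phi_D}) (f \circ \Phi_D)(v \circ \Phi_D) \big) \Big| 
\le C (\operatorname{diam} D)^n |D|^{(1/2) - (1/q)} \|f\|_{W^n_q(D)} \|v\|_{H^1(D)} \quad \forall v \in V_{\mathcal{T}} \tag{147}$$

where the positive constant $C$ depends only on the shape 
regularity of $D$. It then follows from (1), (143), (147) and 
Hölder's inequality that 

$$|a(u - u_{\mathcal{T}}, v)| = \Big| \int_\Omega f v \, dx - F_{\mathcal{T}}(v) \Big| \le C h_{\mathcal{T}}^n \|f\|_{W^n_q(\Omega)} \|v\|_a \quad \forall v \in V_{\mathcal{T}} \tag{148}$$

We conclude from (144) and (148) that 

$$\|u - u_{\mathcal{T}}\|_a \le \|u - \Pi_{\mathcal{T}} u\|_a + C h_{\mathcal{T}}^n \|f\|_{W^n_q(\Omega)} \tag{149}$$

We see by comparing (43) and (149) that the overall 
accuracy of the finite element method is not affected by 
the numerical integration. 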

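We now consider a general symmetric positive definite 
elliptic boundary problem whose variational form is defined 
by 

$$a(w, v) = \sum_{i,j=1}^d \int_\Omega a_{ij}(x) \frac{\partial w}{\partial x_i} \frac{\partial v}{\partial x_j} \, dx + \int_\Omega b(x) w v \, dx \tag{150}$$

where $a_{ij}, b \in W^1_\infty(\Omega)$, $b \ge 0$ on $\Omega$ and there exists a 
positive constant $c$ such that 

$$\sum_{i,j=1}^d a_{ij}(x) \xi_i \xi_j \ge c |\xi|^2 \quad \forall x \in \Omega \ \text{and} \ \xi \in \mathbb{R}^d \tag{151}$$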

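In this case, the bilinear form (150) must also be evaluated 
by numerical integration and the approximate solution 
$u_{\mathcal{T}} \in V_{\mathcal{T}}$ to (1) is defined by 

$$a_{\mathcal{T}}(u_{\mathcal{T}}, v) = F_{\mathcal{T}}(v) \quad \forall v \in V_{\mathcal{T}} \tag{152}$$

where $a_{\mathcal{T}}(w, v)$ is the result of applying the quadrature 
scheme $I_S$ to the pull-back of 

$$\sum_{i,j=1}^d \int_D a_{ij}(x) \frac{\partial w}{\partial x_i} \frac{\partial v}{\partial x_j} \, dx + \int_D b(x) w v \, dx$$

on the standard simplex $S$ under the affine homeomorphism 
$\Phi_D$, over all $D \in \mathcal{T}$. 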


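The error $u - u_{\mathcal{T}}$ can be estimated under (145), (146) and 
the additional condition that 

$$g \ge 0 \ \text{on } S \implies I_S(g) \ge 0 \tag{153}$$

Indeed (146), (151), (153) and the sign of $b$ imply that 

$$a_{\mathcal{T}}(v, v) \ge c |v|_{H^1(\Omega)}^2 \quad \forall v \in V_{\mathcal{T}} \tag{154}$$

and we have the following analog of (144): 

$$|u - u_{\mathcal{T}}|_{H^1(\Omega)} \le C \inf_{v \in V_{\mathcal{T}}} \Big( |u - v|_{H^1(\Omega)} + \frac{1}{c} \sup_{w \in V_{\mathcal{T}} \setminus \{0\}} \frac{|a_{\mathcal{T}}(v, w) - a(v, w)|}{|w|_{H^1(\Omega)}} \Big) 
+ \frac{1}{c} \sup_{v \in V_{\mathcal{T}} \setminus \{0\}} \frac{|a(u, v) - a_{\mathcal{T}}(u_{\mathcal{T}}, v)|}{|v|_{H^1(\Omega)}} \tag{155}$$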

The first term on the right-hand side of (155) is dominated 
by $|u - \Pi_{\mathcal{T}} u|_{H^1(\Omega)}$. Since $a(u, v) - a_{\mathcal{T}}(u_{\mathcal{T}}, v) = 
\int_\Omega f v \, dx - F_{\mathcal{T}}(v)$, the third term is controlled by the estimate 
(148). The second term, which measures the effect of 
numerical quadrature on the bilinear form $a(\cdot, \cdot)$, is controlled 
by the estimate 

$$|a_{\mathcal{T}}(\Pi_{\mathcal{T}} u, v) - a(\Pi_{\mathcal{T}} u, v)| \le C \sum_{D \in \mathcal{T}} (\operatorname{diam} D)^{\alpha(D)} |u|_{H^{1+\alpha(D)}(D)} |v|_{H^1(D)} \tag{156}$$

provided the solution $u$ belongs to $H^{1+\alpha(D)}(D)$ for each 
$D \in \mathcal{T}$ and $1/2 < \alpha(D) \le 1$. The estimate (156) follows 
from (145), (146) and the Bramble-Hilbert lemma, and 
the positive constant $C$ in (156) depends only on the $W^1_\infty$ 
norms of $a_{ij}$ and $b$ and the shape regularity of $\mathcal{T}$. Again 
we see by comparing (56), (149), and (156) that the overall 
accuracy of the finite element method is not affected by the 
numerical integration. 


Remark 19. For problems that exhibit the phenomenon 
of locking, the choice of a lower order quadrature scheme 
in the evaluation of the stiffness matrix may help alleviate 
the effect of locking (Malkus and Hughes, 1978). 


7.4 Curved domains 


So far, we have restricted the discussion to polygonal 
(polyhedral) domains. In this section, we consider the 
second-order elliptic boundary value problem (5) on a 
domain $\Omega \subset \mathbb{R}^2$ with a curved boundary. For simplicity, 
we assume that $\Gamma = \partial\Omega$. 

First, we consider the $P_1$ Lagrange finite element method 
for a domain $\Omega$ with a $C^2$ boundary. We approximate $\Omega$ by 
a polygonal domain $\Omega_h$ on which a simplicial triangulation 
$\mathcal{T}_h$ of mesh size $h$ is imposed. We assume that the vertices 
of $\mathcal{T}_h$ belong to the closure of $\Omega$. A typical triangle in $\mathcal{T}_h$ 
near $\partial\Omega$ is depicted in Figure 20(a). 

Let $V_h \subset H_0^1(\Omega_h)$ be the $P_1$ finite element space associated 
with $\mathcal{T}_h$. The approximate solution $u_h \in V_h$ for (5) is 
then defined by 

$$a_h(u_h, v) = F_h(v) \quad \forall v \in V_h \tag{157}$$

where 

$$a_h(w, v) = \int_{\Omega_h} \nabla w \cdot \nabla v \, dx \quad \forall w, v \in H^1(\Omega_h) \tag{158}$$

and $F_h(v)$ represents the result of applying a numerical 
quadrature scheme to the integral $\int_{\Omega_h} \tilde{f} v \, dx$ (cf. (143) with 
$f$ and $\mathcal{T}$ replaced by $\tilde{f}$ and $\mathcal{T}_h$). Here $\tilde{f}$ is an extension of 
$f$ to $\mathbb{R}^2$. We assume that the numerical scheme uses only 
the values of $f$ at the nodes of $\mathcal{T}_h$ and hence the discrete 
problem (157) is independent of the extension $\tilde{f}$. 

Figure 20. Triangulations for curved domains. 

We assume that $f \in W^1_q(\Omega)$ for $2 < q < \infty$ and hence 
$u \in W^3_q(\Omega)$ by elliptic regularity. Let $\tilde{u} \in W^3_q(\mathbb{R}^2)$ be an 
extension of $u$ such that 

$$\|\tilde{u}\|_{W^3_q(\mathbb{R}^2)} \le C_\Omega \|u\|_{W^3_q(\Omega)} \le C_\Omega \|f\|_{W^1_q(\Omega)} \tag{159}$$

We can then take $\tilde{f} = -\Delta \tilde{u} \in W^1_q(\mathbb{R}^2)$ to be the extension 
appearing in the definition of $F_h$. 

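The error $\tilde u - u_h$ over $\Omega_h$ can be estimated by the
following analog of (135):


$\|\tilde u - u_h\|_{a_h} \le \inf_{v \in V_h} \|\tilde u - v\|_{a_h} + \sup_{v \in V_h \setminus \{0\}} \frac{|a_h(\tilde u - u_h, v)|}{\|v\|_{a_h}}$ (160)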


The first term on the right-hand side is dominated by
$\|\tilde u - \Pi_h \tilde u\|_{a_h}$, where $\Pi_h$ is the nodal interpolation operator.
The second term is controlled by


$|a_h(\tilde u - u_h, v)| = \left| \int_{\Omega_h} \tilde f v \, dx - F_h(v) \right| \le C h \|\tilde f\|_{W^1_q(\Omega_h)} \|v\|_{H^1(\Omega_h)}$ (161)


which is a special case of (148), provided that the conditions
(145) and (146) (with $n = 1$) on the numerical quadrature
scheme are satisfied.

Combining (43) and (159)-(161), we see that


$|\tilde u - u_h|_{H^1(\Omega_h)} = \|\tilde u - u_h\|_{a_h} \le C h \|f\|_{W^1_q(\Omega)}$ (162)


that is, the $P_1$ finite element method for the curved domain
retains the optimal $O(h)$ accuracy.
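
To make the geometric part of this approximation concrete, the following sketch measures how well a polygon approximates a curved domain. It is an illustration under simple assumptions that are not taken from the text: the domain is the unit disc and the polygon is a regular inscribed polygon. The function name `inscribed_polygon_gap` is hypothetical. In this special case the area of the region between the disc and the polygon shrinks like $h^2$, where $h$ is the edge length.

```python
import math


def inscribed_polygon_gap(num_sides):
    """Return (h, gap) for a regular polygon inscribed in the unit disc.

    h   : edge length of the polygon (plays the role of the mesh size)
    gap : area of the disc minus area of the polygon
    """
    h = 2.0 * math.sin(math.pi / num_sides)
    polygon_area = 0.5 * num_sides * math.sin(2.0 * math.pi / num_sides)
    return h, math.pi - polygon_area


# Halving h should reduce the area gap by a factor close to 4.
for sides in (16, 32, 64, 128):
    h, gap = inscribed_polygon_gap(sides)
    print(sides, h, gap, gap / h**2)
```

The last printed column approaches $\pi/6$, which is the leading coefficient of the gap for this particular geometry.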

The approximation of $\Omega$ by $\Omega_h$ can be improved if we
replace straight-edge triangles by triangles with a curved
edge (Figure 20(b)). This can be achieved by the isoparametric
finite element methods. We will illustrate the idea
using the $P_2$ Lagrange element.

Let $\Omega_h$, an approximation of $\Omega$, be the union of straight-edge
triangles (in the interior of $\Omega_h$) and triangles with
one curved edge (at the boundary of $\Omega_h$), which form a
triangulation $\mathcal{T}_h$ of $\Omega_h$. The finite element associated with
the interior triangles is the standard $P_2$ Lagrange element.
For a triangle $D$ at the boundary, we assume that there is
a homeomorphism $\Phi_D$ from the standard simplex $\hat S$ onto
$D$ such that $\Phi_D(\hat x) = (\Phi_{D,1}(\hat x), \Phi_{D,2}(\hat x))$, where $\Phi_{D,1}(\hat x)$
and $\Phi_{D,2}(\hat x)$ are quadratic polynomials in $\hat x = (\hat x_1, \hat x_2)$. The
space of shape functions $P_D$ is then defined by


$P_D = \{ v \in C^\infty(\bar D) : v \circ \Phi_D \in P_2(\hat S) \}$ (163)


that is, the functions in $P_D$ are quadratic polynomials in the
curvilinear coordinates on $D$ induced by $\Phi_D^{-1}$. The set $N_D$
of nodal variables consists of pointwise evaluations of the
shape functions at the nodes corresponding to the nodes of
the $P_2$ element on $\hat S$ (cf. Figures 2 and 20) under the map
$\Phi_D$. We assume that all such nodes belong to $\bar\Omega$ and the
nodes on the curved edge of $D$ belong to $\partial\Omega$ (cf. Figure 20).

In other words, the finite element $(D, P_D, N_D)$ is pulled
back to the $P_2$ Lagrange finite element on $\hat S$ under $\Phi_D$.
It is called an isoparametric element because the components
of the parameterization map $\Phi_D$ belong to the shape
functions of the $P_2$ element on $\hat S$. The corresponding finite
element space defined by (32) (with $\mathcal{T}$ replaced by $\mathcal{T}_h$) is a
subspace of $H^1(\Omega_h)$ that contains all the continuous piecewise
linear polynomials with respect to $\mathcal{T}_h$. By setting the
nodal values on $\partial\Omega_h$ to be zero, we have a finite element
space $V_h \subset H^1_0(\Omega_h)$. The discrete problem for $u_h \in V_h$ is
then defined by


$\tilde a_h(u_h, v) = F_h(v) \quad \forall\, v \in V_h$ (164)


where the numerical quadrature scheme in the definition of
$F_h$ involves only the nodes of the finite element space, so
that the discrete problem is independent of the choice of the
extension of $f$, and the variational form $\tilde a_h(\cdot,\cdot)$ is obtained
from $a_h(\cdot,\cdot)$ by the numerical quadrature scheme.

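We assume that $f \in W^2_q(\Omega)$ for $1 < q < \infty$ and hence
$u \in W^4_q(\Omega)$ by elliptic regularity (assuming that $\Omega$ has a
sufficiently smooth boundary). Let $\tilde u \in W^4_q(\mathbb{R}^2)$ be an extension of $u$ such that


$\|\tilde u\|_{W^4_q(\mathbb{R}^2)} \le C_\Omega \|u\|_{W^4_q(\Omega)} \le C_\Omega \|f\|_{W^2_q(\Omega)}$ (165)


Under the condition (153), we have


$\tilde a_h(v, v) \ge c\, |v|^2_{H^1(\Omega_h)} \quad \forall\, v \in V_h$ (166)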

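and the error $\tilde u - u_h$ over $\Omega_h$ can be estimated by the
following analog of (155):


$\|\tilde u - u_h\|_{H^1(\Omega_h)} \le C \Big( \inf_{v \in V_h} \|\tilde u - v\|_{H^1(\Omega_h)} + \sup_{v \in V_h \setminus \{0\}} \frac{|\tilde a_h(\tilde u, v) - a_h(\tilde u, v)|}{\|v\|_{H^1(\Omega_h)}} + \sup_{v \in V_h \setminus \{0\}} \frac{|a_h(\tilde u, v) - \tilde a_h(u_h, v)|}{\|v\|_{H^1(\Omega_h)}} \Big)$ (167)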


The analysis of the terms on the right-hand side of (167)
involves the shape regularity of a curved triangle, which
can be defined as follows. Let $A_D$ be the affine map that
agrees with $\Phi_D$ at the vertices of the standard simplex $\hat S$,
and let $\tilde D$ be the image of $\hat S$ under $A_D$. ($\tilde D$ is the triangle in
Figure 20(a), while $D$ is the curved triangle in (b).) The
shape regularity of the curved triangle $D$ is measured by
the aspect ratio $\gamma(\tilde D)$ (cf. (29)) of the straight-edged triangle
$\tilde D$ and the parameter $\kappa(D)$ defined by
K(D) = max {A7"®p 0 AD Iwi) A lEn o Ap iwo] 

(168) 
and we can take the aspect ratio $\gamma(D)$ to be the maximum
of $\gamma(\tilde D)$ and $\kappa(D)$. Note that in the case where $D = \tilde D$, the
parameter $\kappa(D) = 0$ and $\gamma(D) = \gamma(\tilde D)$.

The first term on the right-hand side of (167) is dominated
by $\|\tilde u - \Pi_h \tilde u\|_{H^1(\Omega_h)}$, where $\Pi_h$ is the nodal interpolation
operator. Note that, by using the Bramble-Hilbert lemma
on $\hat S$ and scaling, we have the following generalization
of (43):


$\|\tilde u - \Pi_D \tilde u\|_{H^1(D)} \le C (\mathrm{diam}\, D)^2 \|\tilde u\|_{H^3(D)}$ (169)


where $\Pi_D$ is the element nodal interpolation operator and
the constant $C$ depends only on an upper bound of $\gamma(D)$,
and hence


$\|\tilde u - \Pi_h \tilde u\|_{H^1(\Omega_h)} \le C h^2 \|\tilde u\|_{H^3(\Omega_h)}$ (170)
In order to analyze the third term on the right-hand side of
(167), we take $\tilde f = -\Delta \tilde u$ and impose the conditions (145)
and (146) (with $n = 2$) on the numerical quadrature scheme.
We then have the following special case of (148):


$|a_h(\tilde u, v) - \tilde a_h(u_h, v)| = \left| \int_{\Omega_h} \tilde f v \, dx - F_h(v) \right| \le C h^2 \|\tilde f\|_{W^2_q(\Omega_h)} \|v\|_{H^1(\Omega_h)}$ (171)


Similarly, the second term on the right-hand side of
(167), which measures the effect of numerical integration
on the variational form $a_h(\cdot,\cdot)$, is controlled by the estimate


$|a_h(\tilde u, v) - \tilde a_h(\tilde u, v)| \le C h^2 \|\tilde u\|_{H^3(\Omega_h)} \|v\|_{H^1(\Omega_h)} \quad \forall\, v \in V_h$ (172)


Combining (167) and (170)-(172), we have


$\|\tilde u - u_h\|_{H^1(\Omega_h)} \le C h^2 \|f\|_{W^2_q(\Omega)}$ (173)


where $C$ depends only on an upper bound of $\{\gamma(D) : D \in \mathcal{T}_h\}$
and the constants in (165). Therefore, the $P_2$
isoparametric finite element method retains the optimal
$O(h^2)$ accuracy. On the other hand, if only straight-edged
triangles are used in the construction of $\Omega_h$, then the
accuracy of the $P_2$ Lagrange finite element method is only
of order $O(h^{3/2})$ (Strang and Berger, 1971).

The discussion above can be generalized to higher-order 
isoparametric finite element methods, higher dimensions, 
and elliptic problems with variable coefficients (Ciarlet, 
1978). 


Remark 20. Estimates such as (160) and (162) are useful
only when a sequence of domains $\Omega_{h_i}$ with corresponding
triangulations $\mathcal{T}_{h_i}$ can be constructed so that $h_i \downarrow 0$ and the
aspect ratios of all the triangles (straight or curved) in the
triangulations remain bounded. We refer the readers to Scott
(1973) for 2-D constructions and to Lenoir (1986) for the
3-D case.


Remark 21. Other finite element methods for curved 
domains can be found in Zlámal (1973, 1974), Scott (1975), 
and Bernardi (1989). 


Remark 22. Let $\Omega_i$ be a sequence of convex polygons
approaching the unit disc. The displacement of the simply
supported plate on $\Omega_i$ with unit loading does not converge
to the displacement of the simply supported plate on the unit
disc (also with unit loading) as $i \to \infty$. This is known as
Babuška's plate paradox (Babuška and Pitkäranta, 1990). It
shows that numerical solutions obtained by approximating
a curved domain with polygonal domains, in general, do not
converge to the solution of a fourth-order problem defined
on the curved domain. We refer the readers to Mansfield
(1978) for the construction of finite element spaces that are
subspaces of $H^2(\Omega)$.


7.5  Pointwise estimates 


Besides the estimates in $L_2$-based Sobolev spaces discussed
in Section 4, there also exist a priori error estimates
for finite element methods in $L_p$-based Sobolev
spaces with $p \ne 2$. In particular, error estimates in the
$L_\infty$-based Sobolev spaces can provide pointwise error estimates.
Below we describe some results for second-order elliptic
boundary value problems with homogeneous Dirichlet
boundary conditions.

In the one-dimensional case (Wheeler, 1973) where $\Omega$
is an interval, the finite element solution $u_{\mathcal{T}}$ for a given
triangulation $\mathcal{T}$ with mesh size $h$ satisfies


$\|u - u_{\mathcal{T}}\|_{L_\infty(\Omega)} \le C h^n |u|_{W^n_\infty(\Omega)}$ (174)


provided the solution $u$ of (5) belongs to $W^n_\infty(\Omega)$ and the
finite element space contains all the piecewise polynomial
functions of degree $\le n - 1$. The estimate (174) also holds
in higher dimensions (Douglas, Dupont and Wheeler, 1974)
in the case where $\Omega$ is a product of intervals and $u_{\mathcal{T}}$ is the
solution in the $Q_{n-1}$ finite element space of Example 7.

For a two-dimensional convex polygonal domain (Natterer,
1975; Scott, 1976; Nitsche, 1977), the estimate (174)
holds in the case where $n \ge 3$ and $u_{\mathcal{T}}$ is the $P_{n-1}$ triangular
finite element solution for a general triangulation $\mathcal{T}$. In the
case where $u_{\mathcal{T}}$ is the $P_1$ finite element solution, (174) is
replaced by


$\|u - u_{\mathcal{T}}\|_{L_\infty(\Omega)} \le C h^2 |\ln h| \, |u|_{W^2_\infty(\Omega)}$ (175)


$L_\infty$ estimates for general triangulations on polygonal
domains with reentrant corners and higher-dimensional
domains can be found in Schatz and Wahlbin (1978,
1979, 1982) and Schatz (1998). The estimate (175)
was also established in Gastaldi and Nochetto (1987)
for the Crouzeix-Raviart nonconforming $P_1$ element of
Example 16.

It is also known (Rannacher and Scott, 1982; Brenner
and Scott, 2002) that


$\|u - u_{\mathcal{T}}\|_{W^1_\infty(\Omega)} \le C \inf_{v \in V_{\mathcal{T}}} \|u - v\|_{W^1_\infty(\Omega)}$ (176)


where $\Omega$ is a convex polygonal domain in $\mathbb{R}^2$ and $u_{\mathcal{T}}$ is
the $P_n$ ($n \ge 1$) triangular finite element solution obtained
from a general triangulation $\mathcal{T}$ of $\Omega$. Optimal order estimates
for $\|u - u_{\mathcal{T}}\|_{W^1_\infty(\Omega)}$ can be derived immediately from (176).
Extension of (176) to higher dimensions can be found in
Schatz and Wahlbin (1995).


7.6 Interior estimates and pollution effects 


Let $\Omega$ be the L-shaped polygon in Figure 1. The solution
$u$ of the Poisson problem (5) on $\Omega$ with homogeneous
Dirichlet boundary condition is singular near the reentrant
corner and $u \notin H^2(\Omega)$. Consequently, the error estimate
$|u - u_{\mathcal{T}}|_{H^1(\Omega)} \le C h_{\mathcal{T}} \|f\|_{L_2(\Omega)}$ does not hold for the $P_1$
triangular finite element solution $u_{\mathcal{T}}$ associated with a quasi-uniform
triangulation $\mathcal{T}$ of mesh size $h_{\mathcal{T}}$.

However, $u$ does belong to $H^2(\Omega_\delta)$, where $\Omega_\delta$ is the
subset of the points of $\Omega$ whose distances to the reentrant
corner are strictly greater than the positive number $\delta$.
Therefore, it is possible that


$|u - u_{\mathcal{T}}|_{H^1(\Omega_\delta)} \le C h_{\mathcal{T}} \|f\|_{L_2(\Omega)}$ (177)


That the estimate (177) indeed holds is a consequence of
the following interior estimate (Nitsche and Schatz, 1974):


$|u - u_{\mathcal{T}}|_{H^1(\Omega_\delta)} \le C \Big( \inf_{v \in V_{\mathcal{T}}} |u - v|_{H^1(\Omega_{\delta/2})} + \|u - u_{\mathcal{T}}\|_{L_2(\Omega_{\delta/2})} \Big)$ (178)


where $V_{\mathcal{T}} \subset H^1_0(\Omega)$ is the $P_1$ triangular finite element
space. Interior estimates in various Sobolev norms can be
established for subdomains of general $\Omega$ in $\mathbb{R}^d$ and general
finite elements. We refer the readers to Wahlbin (1991) for a
survey of such results and to Schatz (2000) for some recent
developments.

On the other hand, since $u_{\mathcal{T}}$ is obtained by solving
a global system that involves the nodal values near the
reentrant corner of the L-shaped domain, the effect of the
singularity at the reentrant corner can propagate into other
parts of $\Omega$. This is known as the pollution effect and is
reflected, for example, by the following estimate (Wahlbin,
1984):


$\|u - u_{\mathcal{T}}\|_{L_2(\Omega_\delta)} \ge C h^{2\beta}$ (179)


where $\beta = \pi/(3\pi/2) = 2/3$. Similar estimates can also be
established for other Sobolev norms.


7.7 Superconvergence 


Let $u_{\mathcal{T}}$ be the finite element solution of a second-order
elliptic boundary value problem. Suppose that the space
of shape functions on each element contains all the polynomials
of degree $\le n$ but not all the polynomials of degree
$n + 1$. Then the $L_\infty$ norm of the error $u - u_{\mathcal{T}}$ is at most of
order $h^{n+1}$, even if the solution $u$ is smooth. However, the
absolute value of $u - u_{\mathcal{T}}$ at certain points can be of order
$h^{n+1+\sigma}$ for some $\sigma > 0$. This is known as the phenomenon
of superconvergence, and such points are the superconvergence
points for $u_{\mathcal{T}}$. Similarly, a point where the absolute
value of a derivative of $u - u_{\mathcal{T}}$ is of order $h^{n+\sigma}$ is a
superconvergence point for the derivative of $u_{\mathcal{T}}$.

Figure 21. Picture of a triangulation $\mathcal{T}$ = {conv{(-1/2, -1), (-1, 1/2), (-1, -1)}, ..., conv{(1/2, 1), (1, 1/2), (1, 1)}} with m = 24 triangles and n = 21 nodes (a). The enumeration of nodes (numbers in circles) and elements (numbers in boxes) is given in the matrices coordinates (b) and elements (c). The Dirichlet boundary conditions on the exterior nodes are included in the vector dirichlet = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16) of the labels in a counterclockwise enumeration. The data coordinates, elements, and dirichlet are the input of the finite element program to compute a displacement vector x as its output.

The division points of a partition $\mathcal{T}$ for a two-point
boundary value problem with smooth coefficients provide
the simplest example of superconvergence points. Let $u_{\mathcal{T}}$
be the finite element solution from the $P_n$ Lagrange finite
element space. Since the Green's function $G_p$ associated
with a division point $p$ is continuous in the interval $\Omega$
and smooth on the two subintervals divided by $p$, we have
(Douglas and Dupont, 1974)


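$|(u - u_{\mathcal{T}})(p)| = |a(u - u_{\mathcal{T}}, G_p)|$
$\quad = |a(u - u_{\mathcal{T}}, G_p - \Pi^n_{\mathcal{T}} G_p)|$
$\quad \le C \|u - u_{\mathcal{T}}\|_{H^1(\Omega)} \|G_p - \Pi^n_{\mathcal{T}} G_p\|_{H^1(\Omega)} \le C h^{2n}$


provided that $u$ is sufficiently smooth. Therefore, $p$ is a
superconvergence point for $u_{\mathcal{T}}$ if $n \ge 2$.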
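
The Green's function argument can be checked numerically in its simplest setting. The sketch below is an illustration under assumptions not taken from the text: the model problem is $-u'' = 1$ on $(0, 1)$ with $u(0) = u(1) = 0$, the mesh is uniform, and the elements are $P_1$. For this constant-coefficient problem the Green's function at a node is itself piecewise linear, so the interpolation error in the argument vanishes and the nodal values are reproduced up to rounding. The function name `solve_p1_poisson_1d` is hypothetical.

```python
def solve_p1_poisson_1d(num_elements):
    """P1 finite element solution of -u'' = 1 on (0, 1), u(0) = u(1) = 0.

    Returns the nodal values on a uniform mesh (requires num_elements >= 2).
    """
    h = 1.0 / num_elements
    m = num_elements - 1                 # number of interior nodes
    diag = [2.0 / h] * m                 # tridiagonal stiffness matrix
    off = -1.0 / h
    rhs = [h] * m                        # exact load vector for f = 1
    # Forward elimination (Thomas algorithm).
    for i in range(1, m):
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]


nodal_values = solve_p1_poisson_1d(8)
exact_values = [0.5 * (i / 8) * (1.0 - i / 8) for i in range(9)]
print(max(abs(a - b) for a, b in zip(nodal_values, exact_values)))
```

The printed maximum nodal error is at the level of rounding error, although the error between the nodes is of order $h^2$.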

For general superconvergence results in various 
dimensions, we refer the readers to Křížek and 
Neittaanmäki (1987), Chen and Huang (1995), Wahlbin 
(1995), Lin and Yan (1996), Schatz, Sloan and Wahlbin 
(1996), Křížek, Neittaanmäki and Stenberg (1998), Chen 
(1999), and Babuška and Strouboulis (2001). 


7.8 Finite element program in 15 lines of MATLAB


It is the purpose of this section to introduce a short
(two-dimensional) $P_1$ finite element program.

The data for a given triangulation $\mathcal{T} = \{T_1, \ldots, T_m\}$
into triangles with a set of nodes $\mathcal{N} = \{z_1, \ldots, z_n\}$ are
described in user-specified matrices called coordinates
and elements. Figure 21 displays a triangulation with
m triangles and n nodes as well as a fixed enumeration
and the corresponding data. The coordinates of the nodes
$z_k = (x_k, y_k)$ ($d$ real components in general) are stored in
the $k$th row of the two-dimensional matrix coordinates.
Each element $T_j = \mathrm{conv}\{z_k, z_\ell, z_m\}$ is represented by
the labels of its vertices $(k, \ell, m)$ stored in the $j$th
row of the two-dimensional matrix elements. The
chosen permutation of $(k, \ell, m)$ describes the element
in a counterclockwise orientation. Homogeneous Dirichlet
conditions are prescribed on the boundary specified by an
input vector dirichlet of all fixed nodes at the outer
boundary; cf. Figure 21.

Given the aforementioned data in the model Dirichlet
problem with right-hand side $f = 1$, the $P_1$ finite element
space $V_{\mathcal{T}} := \mathrm{span}\{\varphi_j : z_j \in \mathcal{K}\}$ is formed by the nodal
basis functions $\varphi_j$ of each free node $z_j$; the set $\mathcal{K}$ of
free nodes, the interior nodes, is represented in the
vector freeNodes of length $N$, the vector of labels in 1:n without
dirichlet.

The resulting discrete equation is the $N \times N$ linear
system of equations $Ax = b$ with the positive definite
symmetric stiffness matrix $A$ and right-hand side $b$. Their
components are defined (as a subset of)


$A_{jk} = \int_\Omega \nabla\varphi_j \cdot \nabla\varphi_k \, dx$ and $b_j = \int_\Omega f \varphi_j \, dx$ for $j, k = 1, \ldots, n$

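The computation of the entries $A_{jk}$ and $b_j$ is performed
elementwise, using the additivity of the integral and the fact that $\mathcal{T}$ is
a partition of the domain $\Omega$. Given the triangle $T_j$ with number
$j$, the MATLAB command coordinates(elements(j,:),:)
returns the $3 \times 2$ matrix $(P_1, P_2, P_3)^T$ of
its vertices. Then, the local stiffness matrix reads


$\mathrm{STIMA}(T_j)_{\alpha\beta} = \int_{T_j} \nabla\varphi_k \cdot \nabla\varphi_\ell \, dx$ for $\alpha, \beta = 1, 2, 3$


for those numbers $k$ and $\ell$ of two vertices $z_k = P_\alpha$ and $z_\ell = P_\beta$
of $T_j$. The correspondence of global and local indices,
i.e. the numbers of vertices in $(z_k, z_\ell, z_m) = (P_1, P_2, P_3)$
of $T_j$, can be formalized by


$I(T_j) = \{(\alpha, k) \in \{1, 2, 3\} \times \{1, \ldots, n\} : P_\alpha = z_k \in \mathcal{N}\}$


The local stiffness matrix is in fact


$\mathrm{STIMA}(T) = \frac{\det(P)}{2}\, Q Q^T$ with $P = \begin{pmatrix} 1 & 1 & 1 \\ P_1 & P_2 & P_3 \end{pmatrix}$ and $Q = P^{-1} \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$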


This formula allows a compact programming in MATLAB 
as shown (for any dimension d) 


function stima=stima(vertices)
% local stiffness matrix of one simplex (rows of vertices = corner points)
P=[ones(1,size(vertices,2)+1);vertices'];
Q=P\[zeros(1,size(vertices,2));...
     eye(size(vertices,2))];
stima=det(P)*Q*Q'/prod(1:size(vertices,2));


Utilizing the index sets $I$, the assembling of all local
stiffness matrices reads


$\mathrm{STIMA} = \sum_{T_j \in \mathcal{T}} \; \sum_{(\alpha,k) \in I(T_j)} \; \sum_{(\beta,\ell) \in I(T_j)} \mathrm{STIMA}(T_j)_{\alpha\beta} \; e_k \otimes e_\ell$


($e_k$ is the $k$th canonical unit vector with the $\ell$th component
equal to the Kronecker delta $\delta_{k\ell}$ and $\otimes$ is the dyadic
product.) The implementation of each summation is realized
by adding $\mathrm{STIMA}(T_j)$ to the $3 \times 3$ submatrix of the rows
and columns corresponding to $k, \ell, m$; see the MATLAB
program below.
function x=FEM(coordinates,elements,dirichlet)
A=sparse(size(coordinates,1),size(coordinates,1));
b=sparse(size(coordinates,1),1);x=zeros(size(coordinates,1),1);
% assemble stiffness matrix and load vector element by element (f = 1)
for j=1:size(elements,1)
  A(elements(j,:),elements(j,:))=A(elements(j,:),elements(j,:))...
    +stima(coordinates(elements(j,:),:));
  b(elements(j,:))=b(elements(j,:))+ones(3,1)...
    *det([1,1,1;coordinates(elements(j,:),:)'])/6;
end
% solve the reduced system for the free (interior) nodes
freeNodes=setdiff(1:size(coordinates,1),dirichlet);
x(freeNodes)=A(freeNodes,freeNodes)\b(freeNodes);
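
For readers without MATLAB, the same assembly can be sketched in plain Python. This is a hedged port for two dimensions only, not the authors' program: it uses dense lists instead of sparse matrices, zero-based node labels, and a hand-written Gaussian elimination. The names `local_stiffness` and `fem` are hypothetical, and the five-node test mesh at the end is an assumption chosen so that the result can be verified by hand.

```python
def local_stiffness(vertices):
    """Return (3x3 local P1 stiffness matrix, det P) for one triangle."""
    (x1, y1), (x2, y2), (x3, y3) = vertices
    det_p = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)  # twice the area
    # Constant gradients of the three nodal basis functions.
    grads = [((y2 - y3) / det_p, (x3 - x2) / det_p),
             ((y3 - y1) / det_p, (x1 - x3) / det_p),
             ((y1 - y2) / det_p, (x2 - x1) / det_p)]
    matrix = [[0.5 * det_p * (ga[0] * gb[0] + ga[1] * gb[1]) for gb in grads]
              for ga in grads]
    return matrix, det_p


def fem(coordinates, elements, dirichlet):
    """Solve -Laplace(u) = 1 with u = 0 on the Dirichlet nodes (P1 elements)."""
    n = len(coordinates)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for tri in elements:
        local, det_p = local_stiffness([coordinates[k] for k in tri])
        for a, k in enumerate(tri):
            b[k] += det_p / 6.0          # area / 3 for the load f = 1
            for c, l in enumerate(tri):
                A[k][l] += local[a][c]
    fixed = set(dirichlet)
    free = [k for k in range(n) if k not in fixed]
    m = len(free)
    # Reduced augmented system, solved by Gaussian elimination with pivoting.
    M = [[A[k][l] for l in free] + [b[k]] for k in free]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(m - 1, -1, -1):
        s = M[i][m] - sum(M[i][c] * x[free[c]] for c in range(i + 1, m))
        x[free[i]] = s / M[i][i]
    return x


# Unit square split into four triangles around its midpoint (node 4).
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
elems = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
print(fem(coords, elems, [0, 1, 2, 3]))
```

On this mesh the only free node is the midpoint; its stiffness entry is 4 and its load entry is 1/3, so the computed value there is 1/12.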


Given the output vector $x$, a plot of the discrete
solution


$\tilde u = \sum_{j=1}^{n} x_j \varphi_j$


is generated by the command trisurf(elements,coordinates(:,1),coordinates(:,2),x)
and displayed in Figure 22.

For alternative programs with numerical examples and 
full documentation, the interested readers are referred 
to Alberty, Carstensen and Funken (1999) and Alberty 
et al. (2002). The closest more commercial finite element 
package might be FEMLAB. The internet provides over 
200000 entries under the search for ‘Finite Element 
Method Program’. Amongst public domain software are 
the programs ALBERT (Freiburg), DEAL (Heidelberg), UG 
(Heidelberg), and so on. 


Figure 22. Discrete solution of $-\Delta u = 1$ with homogeneous
Dirichlet boundary data based on the triangulation of Figure 21.
A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm
1 INTRODUCTION


The p-version of the finite element method (FEM) is 
presented as a method for obtaining approximate solutions 
to generalized formulations of the form 


‘Find u € X such that B(u, v) = F(v) for all v € Y’ (1) 


where u and v are scalar or vector functions in one, two, or 
three dimensions. In the displacement formulation of solid 
mechanics problems, for example, u is the displacement 
function, X is the space of admissible displacement func- 
tions, v is the virtual displacement function, Y is the space 
of admissible virtual displacement functions, B(u, v) is the
virtual work of internal stresses, and F(v) is the virtual 
work of external forces. 

More generally, u (resp. X) is called the trial function 
(resp. trial space) and v (resp. Y) is called the test function 
(resp. test space), B(u, v) is a bilinear form defined on X x 
Y and F(v) is a linear functional defined on Y. Associated
with the spaces X and Y are the norms $\|\cdot\|_X$ and $\|\cdot\|_Y$. The
definitive properties of bilinear forms and linear functionals
are listed, for example, in Szabó and Babuška (1991), 
Schwab (1998), and Babuška and Strouboulis (2001). 

The definitions for B(u, v), F(v), X, and Y depend on
the choice of the generalized formulation and the boundary
conditions. The solution domain will be denoted by $\Omega$ and
the set of functions $u$ that satisfy the condition $B(u, u) \le C < \infty$
on $\Omega$ will be called the energy space and denoted
by $E(\Omega)$. The exact solution will be denoted by $u_{EX}$. The
energy norm defined by


$\|u\|_{E(\Omega)} := \sqrt{\tfrac{1}{2} B(u, u)}$ (2)


will be associated with the spaces $X \subset E(\Omega)$ and $Y \subset E(\Omega)$.
It can be shown that this formulation is equivalent
to the minimization of the potential energy functional
defined by


$\Pi(u) := \tfrac{1}{2} B(u, u) - F(u)$ (3)


The exact solution $u_{EX}$ of equation (1) is the minimizer of
$\Pi(u)$ on the space $X \subset E(\Omega)$.

In the finite element method, finite dimensional subspaces
$S \subset X$ and $V \subset Y$ are constructed. These spaces
are characterized by the finite element mesh, the polynomial
degrees assigned to the elements, and the mapping
functions. Details are given in Section 2. An approximation
to $u_{EX}$, denoted by $u_{FE}$, is obtained by solving the
finite dimensional problem:


'Find $u_{FE} \in S$ such that $B(u_{FE}, v) = F(v)$ for all $v \in V$' (4)


The dimension of $V$ is the number of degrees of freedom,
denoted by $N$.

A key theorem states that the finite element solution $u_{FE}$
minimizes the error in energy norm


$\|u_{EX} - u_{FE}\|_{E(\Omega)} = \min_{u \in S} \|u_{EX} - u\|_{E(\Omega)}$ (5)


It is seen that the error in energy norm depends on the 
choice of S. Proper choice of S depends on the regularity 
of ugy, the objectives of computation, and the desired level 
of precision. 

Another important theorem establishes the following
relationship between the error measured in energy norm
and the potential energy:


$\|u_{EX} - u_{FE}\|^2_{E(\Omega)} = \Pi(u_{FE}) - \Pi(u_{EX})$ (6)


Proofs are available in Szabó and Babuška (1991). In the
p-version, this theorem is used in a posteriori estimation of
error in energy norm.
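
The relationship between the energy norm of the error and the potential energy can be verified on a small example. The sketch below is an illustration under assumptions that do not come from the text: the model problem is $-u'' = 1$ on $(0, 1)$ with homogeneous Dirichlet data, so that $B(u, v) = \int u'v'\,dx$ and $F(v) = \int v\,dx$, the mesh is uniform, and the elements are linear. For this problem the Galerkin solution coincides with the exact solution at the nodes, which the code uses directly. The function name `energy_identity_check` is hypothetical.

```python
def energy_identity_check(num_elements):
    """Return (||u_EX - u_FE||_E^2, Pi(u_FE) - Pi(u_EX)) for -u'' = 1 on (0, 1)."""
    n = num_elements
    h = 1.0 / n
    nodes = [i * h for i in range(n + 1)]
    # Exact solution u_EX = x (1 - x) / 2; linear FE nodal values equal u_EX here.
    u_fe = [0.5 * x * (1.0 - x) for x in nodes]

    b_fe = 0.0      # B(u_FE, u_FE)
    f_fe = 0.0      # F(u_FE)
    b_err = 0.0     # B(u_EX - u_FE, u_EX - u_FE)
    for i in range(n):
        left, right = nodes[i], nodes[i + 1]
        slope = (u_fe[i + 1] - u_fe[i]) / h
        b_fe += slope * slope * h
        f_fe += 0.5 * h * (u_fe[i] + u_fe[i + 1])

        def err_sq(x):
            return (0.5 - x - slope) ** 2

        # Simpson's rule is exact for this quadratic integrand.
        mid = 0.5 * (left + right)
        b_err += h / 6.0 * (err_sq(left) + 4.0 * err_sq(mid) + err_sq(right))

    pi_fe = 0.5 * b_fe - f_fe
    pi_ex = 0.5 * (1.0 / 12.0) - 1.0 / 12.0   # B(u_EX, u_EX) = F(u_EX) = 1/12
    return 0.5 * b_err, pi_fe - pi_ex


print(energy_identity_check(4))
```

Both returned numbers agree up to rounding, and both decrease as the mesh is refined.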

The data of interest are functionals of $u_{EX}$: $\Psi_1(u_{EX})$,
$\Psi_2(u_{EX}), \ldots$ approximated by $\Psi_1(u_{FE})$, $\Psi_2(u_{FE}), \ldots$. An
important objective of finite element computations is to
establish that the relative errors in the data of interest are
small. Therefore, it is necessary to show that


$|\Psi_i(u_{EX}) - \Psi_i(u_{FE})| \le \tau_i \, |\Psi_i(u_{EX})|, \quad i = 1, 2, \ldots$ (7)


where $\tau_i$ are prescribed tolerances. Of course, $\Psi_i(u_{EX})$
is generally unknown; however, $\Psi_i(u_{EX})$ is known to be
independent of the choice of the space $S$. Therefore, if
we compute a sequence of finite element solutions corresponding
to a hierarchic sequence of finite element spaces
$S_1 \subset S_2 \subset S_3 \cdots$, then $\Psi_i(u_{FE}) \to \Psi_i(u_{EX})$. The limiting
value of $\Psi_i(u_{FE})$ and hence $\tau_i$ can be estimated. The
p-version of the finite element method is well suited for the
creation of hierarchic finite element spaces and hence the
estimation and control of errors in terms of the data of
interest.


2 IMPLEMENTATION 


From the theoretical point of view, the quality of approximation
is completely determined by the finite element space
characterized by the finite element mesh $\Delta$, the polynomial
degrees of elements $\mathbf{p}$, and the mapping functions $\mathbf{Q}$ (see
Section 2.4). Specifically, the finite element space $S$ is a
set of functions constructed from polynomials defined on
standard elements that are mapped onto the elements of the
finite element mesh, subject to the appropriate continuity
requirements to ensure that it is a subset of the energy space


$S := \{u \mid u \in E(\Omega),\ u(Q^k) \in S^{p_k},\ k = 1, 2, \ldots, M(\Delta)\}$


where $Q^k$ is the mapping function for the $k$th element, $S^{p_k}$
is the polynomial space of degree $p_k$ associated with the
$k$th element, and $M(\Delta)$ is the number of elements. Different
sets of basis functions, called shape functions, can be
chosen to define the same finite element space; however,
there are some important considerations:


1. For a wide range of mapping parameters, the round-off 
error accumulation with respect to increasing polyno- 
mial degree should be as small as possible. (Ideally, 
the element-level stiffness matrices should be perfectly 
diagonal, but it is neither necessary nor practical to 
choose the shape functions in that way in two and three 
dimensions.) 

2. The shape functions should permit computation of the 
stiffness matrices and load vectors as efficiently as 
possible. 

3. The shape functions should permit efficient enforce- 
ment of exact and minimal continuity. 

4. The choice of the shape functions affects the per- 
formance of iterative solution procedures. For large 
problems, this can be the dominant consideration. 


The first three points suggest that shape functions should 
be constructed from polynomial functions that have certain 
orthogonality properties; should be hierarchic, that is, the 
set of shape functions of polynomial degree p should be in 
the set of shape functions of polynomial degree p + 1, and 
the number of shape functions that do not vanish at vertices, 
edges, and faces should be the smallest possible. Some of 
the shape functions used in various implementations of the 
p-version are described in the following. 


2.1 Hierarchic shape functions for 
one-dimensional problems 


The classical finite element nodal basis functions in one
dimension on the standard element $\Omega_{st} = (-1, 1)$ are illustrated
on the left-hand side of Figure 1.

The standard shape functions are defined by the set of
Lagrange polynomials


$N_i^p(\xi) = \prod_{j=1,\ j \ne i}^{p+1} \frac{\xi - \xi_j}{\xi_i - \xi_j}$ (8)


Figure 1. Set of one-dimensional standard and hierarchic shape
functions for p = 1, 2, 3. (Reproduced by permission of John
Wiley & Sons, Ltd from A. Düster, H. Bröker and E. Rank, Int.
J. Numer. Meth. Eng., 52, 673-703 (2001).)

The points $\xi_j$ where


$N_i^p(\xi_j) = \delta_{ij} = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$ (9)


are called nodes. There are certain advantages in selecting
the nodes to be the Gauss-Lobatto points as done in the
spectral element method, which is also addressed in this
encyclopedia (see Chapter 3, Volume 3). This approach
has been modified to suit the p-version of the finite element
method in Melenk, Gerdes and Schwab (2001). Note that
the sum of all Lagrange polynomials for a given polynomial
degree $p$ equals unity:


$\sum_{i=1}^{p+1} N_i^p(\xi) = 1$ (10)

Every function that can be represented as a linear combination
of this standard basis can be represented also by the
set of hierarchic basis functions (see the right-hand side
of Figure 1). A principal difference between the two bases
is that in the hierarchic case all lower-order shape functions
are contained in the higher-order basis. The set of
one-dimensional hierarchic shape functions, introduced by
Szabó and Babuška (1991), is given by


$N_1(\xi) = \tfrac{1}{2}(1 - \xi)$ (11)
$N_2(\xi) = \tfrac{1}{2}(1 + \xi)$ (12)
$N_i(\xi) = \phi_{i-1}(\xi), \quad i = 3, 4, \ldots, p + 1$ (13)

with

$\phi_j(\xi) = \sqrt{\frac{2j - 1}{2}} \int_{-1}^{\xi} L_{j-1}(x) \, dx = \frac{1}{\sqrt{4j - 2}} \left( L_j(\xi) - L_{j-2}(\xi) \right), \quad j = 2, 3, \ldots$ (14)


where $L_j(\xi)$ are the Legendre polynomials. The first two
shape functions $N_1(\xi)$, $N_2(\xi)$ are called nodal shape functions
or nodal modes. Because


$N_i(-1) = N_i(1) = 0, \quad i \ge 3$ (15)


the functions $N_i(\xi)$, $i = 3, 4, \ldots$ are called internal shape
functions, internal modes, or bubble modes. The orthogonality
property of Legendre polynomials implies


$\int_{-1}^{1} \frac{dN_i}{d\xi} \frac{dN_j}{d\xi} \, d\xi = \delta_{ij}, \quad i \ge 3 \text{ and } j \ge 1 \quad \text{or} \quad i \ge 1 \text{ and } j \ge 3$ (16)


Figure 2. Hierarchic structure of stiffness matrix and load vector
with p = 3. (Reproduced by permission of John Wiley & Sons,
Ltd from E. Stein (Editor), Error-controlled Adaptive Finite Elements
in Solid Mechanics, 263-307 (2002).)


If equations are ordered in such a way that all linear
modes are numbered from 1 to $n_1$, all quadratic modes are
numbered from $n_1 + 1$ to $n_2$ and so on, stiffness matrices
corresponding to polynomial order 1 to $p - 1$ are submatrices
of the stiffness matrix corresponding to polynomial
order $p$. Figure 2 depicts the structure of a stiffness matrix
and a load vector corresponding to polynomial degree of
p = 3 schematically.
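
The properties of the one-dimensional hierarchic shape functions can be checked numerically. The sketch below assumes the integrated Legendre form $\phi_j = (L_j - L_{j-2})/\sqrt{4j-2}$ and evaluates the stiffness integrals with a composite Simpson rule; the function names and the choice of quadrature are illustrative assumptions, not part of any particular p-version code.

```python
import math


def legendre(n, x):
    """Legendre polynomial L_n(x) from the three-term recurrence."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, x
    for k in range(1, n):
        prev, cur = cur, ((2 * k + 1) * x * cur - k * prev) / (k + 1)
    return cur


def shape(i, x):
    """Hierarchic shape function N_i (i = 1, 2 nodal; i >= 3 internal)."""
    if i == 1:
        return 0.5 * (1.0 - x)
    if i == 2:
        return 0.5 * (1.0 + x)
    j = i - 1
    return (legendre(j, x) - legendre(j - 2, x)) / math.sqrt(4 * j - 2)


def dshape(i, x):
    """Derivative of N_i; for i >= 3 it is a scaled Legendre polynomial."""
    if i == 1:
        return -0.5
    if i == 2:
        return 0.5
    j = i - 1
    return math.sqrt((2 * j - 1) / 2.0) * legendre(j - 1, x)


def stiffness_entry(i, j, intervals=2000):
    """Composite Simpson approximation of the integral of N_i' N_j' over (-1, 1)."""
    h = 2.0 / intervals
    total = dshape(i, -1.0) * dshape(j, -1.0) + dshape(i, 1.0) * dshape(j, 1.0)
    for k in range(1, intervals):
        x = -1.0 + k * h
        total += (4.0 if k % 2 else 2.0) * dshape(i, x) * dshape(j, x)
    return total * h / 3.0


# Internal modes vanish at both end points and decouple in the stiffness matrix.
print(shape(4, -1.0), shape(4, 1.0))
print([[round(stiffness_entry(i, j), 8) for j in range(1, 6)] for i in range(3, 6)])
```

The printed rows show that the block belonging to the internal modes is the identity and that the coupling with the two nodal modes vanishes.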


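2.2 Hierarchic shape functions for quadrilaterals


The standard quadrilateral finite element is shown in
Figure 4. Two types of standard polynomial spaces, the
trunk space $S_{ts}^{p_\xi, p_\eta}(\Omega_{st}^q)$ and the tensor product space
$S_{ps}^{p_\xi, p_\eta}(\Omega_{st}^q)$, are discussed in the following. The tensor
product space $S_{ps}^{p_\xi, p_\eta}(\Omega_{st}^q)$ consists of all polynomials on
$\Omega_{st}^q = [(-1, 1) \times (-1, 1)]$ spanned by the set of monomials
$\xi^i \eta^j$ where $i = 0, 1, \ldots, p_\xi$, $j = 0, 1, \ldots, p_\eta$, whereas the
trunk space $S_{ts}^{p_\xi, p_\eta}(\Omega_{st}^q)$ on $\Omega_{st}^q = [(-1, 1) \times (-1, 1)]$ is
spanned by the set of all monomials


• $\xi^i \eta^j$ with $i = 0, \ldots, p_\xi$, $j = 0, \ldots, p_\eta$, $i + j = 0, \ldots, \max\{p_\xi, p_\eta\}$

• $\xi \eta$ for $p_\xi = p_\eta = 1$

• $\xi^{p_\xi} \eta$ for $p_\xi \ge 2$

• $\xi \eta^{p_\eta}$ for $p_\eta \ge 2$.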
Figure 3. The trunk space $S_{ts}^{3,3}(\Omega_{st}^q)$ and the tensor product space
$S_{ps}^{3,3}(\Omega_{st}^q)$. (Reproduced by permission of John Wiley & Sons, Ltd
from E. Stein (Editor), Error-controlled Adaptive Finite Elements
in Solid Mechanics, 263-307 (2002).)


The difference between the two standard polynomial
spaces can be readily visualized when considering the
spanning sets in Pascal's triangle. The set of monomials for
$p_\xi = p_\eta = 3$ for both the trunk and the tensor product space
is shown in Figure 3. All monomials inside the dashed
line span the trunk space $S_{ts}^{3,3}(\Omega_{st}^q)$, whereas the monomials
bordered by the solid line are essential for the tensor product
space $S_{ps}^{3,3}(\Omega_{st}^q)$.
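
The two spanning sets can be enumerated directly, which also gives the dimensions of the spaces. The sketch below encodes each monomial $\xi^i \eta^j$ as the exponent pair $(i, j)$ and assumes the usual definition of the trunk space (all monomials of total degree up to $\max\{p_\xi, p_\eta\}$, plus $\xi\eta$ in the bilinear case and the two extra monomials $\xi^{p_\xi}\eta$ and $\xi\eta^{p_\eta}$ for degrees of at least 2). The function names are hypothetical.

```python
def tensor_product_monomials(p_xi, p_eta):
    """Exponent pairs (i, j) of the monomials spanning the tensor product space."""
    return {(i, j) for i in range(p_xi + 1) for j in range(p_eta + 1)}


def trunk_monomials(p_xi, p_eta):
    """Exponent pairs (i, j) of the monomials spanning the trunk space."""
    p_max = max(p_xi, p_eta)
    exponents = {(i, j)
                 for i in range(p_xi + 1)
                 for j in range(p_eta + 1)
                 if i + j <= p_max}
    if p_xi == 1 and p_eta == 1:
        exponents.add((1, 1))        # bilinear term
    if p_xi >= 2:
        exponents.add((p_xi, 1))     # xi^p_xi * eta
    if p_eta >= 2:
        exponents.add((1, p_eta))    # xi * eta^p_eta
    return exponents


for p in (1, 2, 3, 8):
    print(p, len(trunk_monomials(p, p)), len(tensor_product_monomials(p, p)))
```

For $p_\xi = p_\eta = 3$ this yields 12 monomials for the trunk space and 16 for the tensor product space.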

Two-dimensional shape functions can be classified into
three groups: nodal, edge, and internal shape functions.
Using the numbering convention shown in Figure 4, these
shape functions are described in the following.


Figure 4. Standard quadrilateral element $\Omega_{st}^q$: definition of nodes,
edges, and polynomial degree.

1. Nodal or vertex modes: The nodal modes


$N_{1,1}^{N_i}(\xi, \eta) = \tfrac{1}{4}(1 + \xi_i \xi)(1 + \eta_i \eta), \quad i = 1, \ldots, 4$ (17)


are the standard bilinear shape functions, well known
from the isoparametric four-noded quadrilateral element.
$(\xi_i, \eta_i)$ denote the local coordinates of the $i$th
node.

2. Edge or side modes: These modes are defined
separately for each individual edge; they vanish at all
other edges. The corresponding modes for edge $E_1$
read:


$N_{i,1}^{E_1}(\xi, \eta) = \tfrac{1}{2}(1 - \eta)\,\phi_i(\xi), \quad i \ge 2$ (18)

3. Internal modes: The internal modes


$N_{i,j}^{int}(\xi, \eta) = \phi_i(\xi)\,\phi_j(\eta), \quad i, j \ge 2$ (19)


are purely local and vanish at the edges of the
quadrilateral element.


Figure 5. Hierarchic shape functions for quadrilateral elements. Trunk space, p = 1 to p = 8. (From Finite Element Analysis, B. Szabó and I. Babuška; Copyright (1991) John Wiley & Sons, Inc. This material is used by permission of John Wiley & Sons, Inc.)


As already indicated, the indices $i, j$ of the shape functions
denote the polynomial degrees in the local directions $\xi, \eta$.
In Figure 5, all hierarchic shape functions that span the
trunk space are plotted up to order p = 8.
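
The three groups of modes can be evaluated with a short script that also confirms where they vanish. The sketch assumes the product structure described above, with $\phi_i$ the integrated Legendre functions of the one-dimensional case and edge $E_1$ taken as the edge $\eta = -1$; the function names are hypothetical.

```python
import math


def legendre(n, x):
    """Legendre polynomial L_n(x) from the three-term recurrence."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, x
    for k in range(1, n):
        prev, cur = cur, ((2 * k + 1) * x * cur - k * prev) / (k + 1)
    return cur


def phi(j, x):
    """Integrated Legendre function phi_j for j >= 2."""
    return (legendre(j, x) - legendre(j - 2, x)) / math.sqrt(4 * j - 2)


def nodal_mode(xi_i, eta_i, xi, eta):
    """Bilinear mode of the vertex with local coordinates (xi_i, eta_i)."""
    return 0.25 * (1.0 + xi_i * xi) * (1.0 + eta_i * eta)


def edge_mode_e1(i, xi, eta):
    """Edge mode of degree i on the edge eta = -1 (assumed to be edge E1)."""
    return 0.5 * (1.0 - eta) * phi(i, xi)


def internal_mode(i, j, xi, eta):
    """Internal (bubble) mode of degrees i, j."""
    return phi(i, xi) * phi(j, eta)


print(edge_mode_e1(3, 0.3, -1.0), edge_mode_e1(3, 0.3, 1.0), edge_mode_e1(3, 1.0, 0.2))
print(internal_mode(2, 3, -1.0, 0.4), internal_mode(2, 3, 0.4, 1.0))
```

On its own edge the edge mode reduces to $\phi_i(\xi)$, while it is zero on the three other edges; the internal mode is zero on the whole boundary.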


2.3 Hierarchic shape functions for hexahedrals 


The implementation of high-order finite elements in three
dimensions can be based on a hexahedral element formulation
(see Figure 6), again using the hierarchic shape functions
introduced by Szabó and Babuška (1991). High-order
hexahedral elements are suited for solid, 'thick' structures
and for thin-walled structures alike. In the case of plate- or
shell-like structures, one local variable can be identified to
correspond with the thickness direction and it is possible
to choose the polynomial degree in the thickness direction
differently from those in the in-plane direction; see Düster,
Bröker and Rank (2001). Generalizing the two-dimensional
concept, three-dimensional shape functions can be classified
into four groups:


1. Nodal or vertex modes: The nodal modes


$N_{1,1,1}^{N_i}(\xi, \eta, \zeta) = \tfrac{1}{8}(1 + \xi_i \xi)(1 + \eta_i \eta)(1 + \zeta_i \zeta), \quad i = 1, \ldots, 8$ (20)


are the standard trilinear shape functions, well known
from the isoparametric eight-noded brick element.
$(\xi_i, \eta_i, \zeta_i)$ are the local coordinates of the $i$th node.

2. Edge modes: These modes are defined separately for
each edge. If we consider, for example, edge $E_1$ (see
Figure 6), the corresponding edge modes read:


$N_{i,1,1}^{E_1}(\xi, \eta, \zeta) = \tfrac{1}{4}(1 - \eta)(1 - \zeta)\,\phi_i(\xi), \quad i \ge 2$ (21)

3. Face modes: These modes are defined separately for
each individual face. If we consider, for example, face
$F_1$, the corresponding face modes read:


$N_{i,j,1}^{F_1}(\xi, \eta, \zeta) = \tfrac{1}{2}(1 - \zeta)\,\phi_i(\xi)\,\phi_j(\eta), \quad i, j \ge 2$ (22)

4. Internal modes: The internal modes


$N_{i,j,k}^{int}(\xi, \eta, \zeta) = \phi_i(\xi)\,\phi_j(\eta)\,\phi_k(\zeta), \quad i, j, k \ge 2$ (23)

are purely local and vanish at the faces of the hexahedral
element.
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Figure 6. Standard hexahedral element ©: definition of nodes, 
edges, faces, and polynomial degree. 


The indices i, j,k of the shape functions denote the 
polynomial degrees in the local directions &, n, ¢. 

Three different types of trial spaces can be readily
defined: the trunk space $S_{\mathrm{ts}}^{p_\xi,p_\eta,p_\zeta}(\Omega_{\mathrm{st}}^{h})$, the tensor
product space $S_{\mathrm{ps}}^{p_\xi,p_\eta,p_\zeta}(\Omega_{\mathrm{st}}^{h})$, and an anisotropic tensor
product space $S^{p,p,q}(\Omega_{\mathrm{st}}^{h})$. A detailed description of these
trial spaces can be found in Szabó and Babuška (1991). The
polynomial degree for the trial spaces $S_{\mathrm{ts}}^{p_\xi,p_\eta,p_\zeta}(\Omega_{\mathrm{st}}^{h})$ and
$S_{\mathrm{ps}}^{p_\xi,p_\eta,p_\zeta}(\Omega_{\mathrm{st}}^{h})$ can be varied separately in each local
direction (see Figure 6). Differences between the trunk and
product spaces occur in the face modes and internal modes
only. For explanation, we first consider the face modes,
for example, the modes for face 1. Indices i, j denote the
polynomial degrees of the face modes in the $\xi$ and $\eta$
directions, respectively.

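Face modes (face $F_1$): $N_{i,j,1}^{F_1}(\xi,\eta,\zeta) = \frac{1}{2}(1-\zeta)\,\phi_i(\xi)\,\phi_j(\eta)$


trunk space                                   tensor product space
$i = 2,\ldots,p_\xi - 2$                      $i = 2,\ldots,p_\xi$
$j = 2,\ldots,p_\eta - 2$                     $j = 2,\ldots,p_\eta$
$i + j = 4,\ldots,\max\{p_\xi, p_\eta\}$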

The definition of the set of internal modes is very similar.
Indices i, j, k now denote the polynomial degrees in the
three local directions $\xi$, $\eta$, and $\zeta$.

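Internal modes: $N_{i,j,k}^{\mathrm{int}}(\xi,\eta,\zeta) = \phi_i(\xi)\,\phi_j(\eta)\,\phi_k(\zeta)$


trunk space                                            tensor product space
$i = 2,\ldots,p_\xi - 4$                               $i = 2,\ldots,p_\xi$
$j = 2,\ldots,p_\eta - 4$                              $j = 2,\ldots,p_\eta$
$k = 2,\ldots,p_\zeta - 4$                             $k = 2,\ldots,p_\zeta$
$i + j + k = 6,\ldots,\max\{p_\xi, p_\eta, p_\zeta\}$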
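The difference in size between the two spaces is easy to quantify by enumerating the admissible index sets. The sketch below assumes the index ranges as read from the two lists above (trunk space: face indices up to $p-2$ with $i+j \le \max\{p_\xi,p_\eta\}$, internal indices up to $p-4$ with $i+j+k \le \max\{p_\xi,p_\eta,p_\zeta\}$; tensor product space: all indices from 2 to $p$). The function names are hypothetical.

```python
def face_mode_indices(p_xi, p_eta, space):
    """Index pairs (i, j) of the face modes on one face (e.g. face 1)."""
    if space == "trunk":
        return [(i, j)
                for i in range(2, p_xi - 1)      # i = 2, ..., p_xi - 2
                for j in range(2, p_eta - 1)     # j = 2, ..., p_eta - 2
                if i + j <= max(p_xi, p_eta)]
    return [(i, j)
            for i in range(2, p_xi + 1)          # i = 2, ..., p_xi
            for j in range(2, p_eta + 1)]        # j = 2, ..., p_eta

def internal_mode_indices(p_xi, p_eta, p_zeta, space):
    """Index triples (i, j, k) of the internal modes."""
    if space == "trunk":
        return [(i, j, k)
                for i in range(2, p_xi - 3)      # i = 2, ..., p_xi - 4
                for j in range(2, p_eta - 3)
                for k in range(2, p_zeta - 3)
                if i + j + k <= max(p_xi, p_eta, p_zeta)]
    return [(i, j, k)
            for i in range(2, p_xi + 1)
            for j in range(2, p_eta + 1)
            for k in range(2, p_zeta + 1)]

for p in (4, 6, 8):
    print(p,
          len(face_mode_indices(p, p, "trunk")),
          len(face_mode_indices(p, p, "tensor")),
          len(internal_mode_indices(p, p, p, "trunk")),
          len(internal_mode_indices(p, p, p, "tensor")))
```

For a uniform degree p = 8 this enumeration gives 15 face modes per face and 10 internal modes in the trunk space, against 49 and 343 in the tensor product space.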
The space $S^{p,p,q}(\Omega_{\mathrm{st}}^{h})$ defines an anisotropic set of shape
functions determined by two polynomial degrees p and q
(see Figure 6). All shape functions of higher order in the $\xi$
and $\eta$ directions are associated with the polynomial degree
p. These shape functions correspond to the edges 1, 2, 3,
4, 9, 10, 11, 12, to the faces 1 and 6 and to all internal
modes. Shape functions for faces 1 and 6 are equal to
those of the trunk space $S_{\mathrm{ts}}^{p_\xi,p_\eta,p_\zeta}(\Omega_{\mathrm{st}}^{h})$ with $p_\xi = p_\eta = p$.
q defines the degree of all shape functions of higher order
in the $\zeta$-direction that are associated with the edges 5, 6, 7,
8, with the faces 2, 3, 4, 5, and with all internal modes.
The modes corresponding to the faces 2, 3, 4, 5 are equal
to those of the tensor product space $S_{\mathrm{ps}}^{p_\xi,p_\eta,p_\zeta}(\Omega_{\mathrm{st}}^{h})$ with
$p = p_\xi = p_\eta$ and $q = p_\zeta$. Considering a polynomial degree
$p = q = p_\xi = p_\eta = p_\zeta$, one observes that the number of
internal modes of $S^{p,p,q}(\Omega_{\mathrm{st}}^{h})$ is larger than that of the
trunk space $S_{\mathrm{ts}}^{p_\xi,p_\eta,p_\zeta}(\Omega_{\mathrm{st}}^{h})$ but smaller than that of the tensor
product space $S_{\mathrm{ps}}^{p_\xi,p_\eta,p_\zeta}(\Omega_{\mathrm{st}}^{h})$.

Owing to the built-in anisotropic behavior of the trial
space $S^{p,p,q}(\Omega_{\mathrm{st}}^{h})$, it is important to consider the orientation
of the local coordinates of a hexahedral element. Figure 7
shows how hexahedral elements should be oriented when
three-dimensional, thin-walled structures are discretized.
The local coordinate $\zeta$ of the hexahedral element corresponds
to the thickness direction. If the orientation of all
elements is the same, then it is possible to construct
discretizations where the polynomial degree for the in-plane
and thickness directions of thin-walled structures can be
treated differently.


2.4 Mapping 


In low-order finite element analysis (FEA), the most fre- 
quently used mapping technique for the geometric descrip- 
tion of the domain of computation is the application of 
isoparametric elements where the standard shape functions 
are used for the geometric description of elements. The
same shape functions are used for the approximation of 
the unknown solution and for the shape of the elements. 


Figure 7. Modelling thin-walled structures with hexahedral 
elements. 


Using elements of order p = 1 or p = 2, the boundary of 
the domain is approximated by a polygonal or by a piece- 
wise parabolic curve, respectively. As the mesh is refined, 
the boundary of the domain is approximated more and more 
accurately. When using the p-version, on the other hand, the 
mesh remains fixed. It is therefore important to model the 
geometry of the structure accurately with the fixed number 
of elements. This calls for a method that is able to describe 
complex geometries using only a few elements. Gordon
and Hall (1973a,b) proposed the blending function method 
that is usually applied when describing curved boundaries 
of p-version finite elements; see, for example, Szabó and 
Babuška (1991) and Düster, Bröker and Rank (2001). After 
introducing blending function mapping, an example will 
compare polynomial interpolation versus exact blending 
mapping and demonstrate the necessity of a precise descrip- 
tion of geometry. 


2.4.1 The blending function method 


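Consider a quadrilateral element as shown in Figure 8
where edge $E_2$ is assumed to be part of a curved boundary.
The shape of edge $E_2$ is assumed to be defined by
a parametric function $\mathbf{E}_2 = [E_{2x}(\eta), E_{2y}(\eta)]^T$, where $\eta$ is
the local coordinate of the element. The transformation
of the local coordinates $\boldsymbol{\xi} = [\xi, \eta]^T$ into the global
coordinates $\mathbf{x} = [x, y]^T = \mathbf{Q}^e = [Q_x^e(\xi,\eta), Q_y^e(\xi,\eta)]^T$ can be
formulated by the two functions

$$x = Q_x^e(\xi,\eta) = \sum_{i=1}^{4} N_{1,1}^{N_i}(\xi,\eta)\,X_i + \left(E_{2x}(\eta) - \left(\frac{1-\eta}{2}X_2 + \frac{1+\eta}{2}X_3\right)\right)\frac{1+\xi}{2}$$

$$y = Q_y^e(\xi,\eta) = \sum_{i=1}^{4} N_{1,1}^{N_i}(\xi,\eta)\,Y_i + \left(E_{2y}(\eta) - \left(\frac{1-\eta}{2}Y_2 + \frac{1+\eta}{2}Y_3\right)\right)\frac{1+\xi}{2} \tag{24}$$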
Figure 8. Blending function method for quadrilateral elements.


where the first term corresponds to the standard bilinear
mapping that is familiar from the isoparametric concept
for quadrilateral elements with p = 1. The second term
takes the curved edge $E_2$ into account. Therefore, the
bilinear mapping is augmented by the blended difference
between the curve $\mathbf{E}_2 = [E_{2x}(\eta), E_{2y}(\eta)]^T$ and the straight
line connecting the nodes $N_2$ and $N_3$. The blending term
$(1+\xi)/2$ ensures that the opposite edge $E_4$, where
$(1+\xi)/2 = 0$, is not affected by the curvilinear description of
edge $E_2$.

If a quadrilateral in which all edges are curved is to be
considered, the blending function method can be expanded 
such that the mapping reads 


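$$x = Q_x^e(\xi,\eta) = \frac{1}{2}(1-\eta)E_{1x}(\xi) + \frac{1}{2}(1+\xi)E_{2x}(\eta) + \frac{1}{2}(1+\eta)E_{3x}(\xi) + \frac{1}{2}(1-\xi)E_{4x}(\eta) - \sum_{i=1}^{4} N_{1,1}^{N_i}(\xi,\eta)\,X_i$$

$$y = Q_y^e(\xi,\eta) = \frac{1}{2}(1-\eta)E_{1y}(\xi) + \frac{1}{2}(1+\xi)E_{2y}(\eta) + \frac{1}{2}(1+\eta)E_{3y}(\xi) + \frac{1}{2}(1-\xi)E_{4y}(\eta) - \sum_{i=1}^{4} N_{1,1}^{N_i}(\xi,\eta)\,Y_i \tag{25}$$

where

$$E_{ix}(\xi),\; E_{iy}(\xi) \quad \text{for } i = 1, 3; \qquad E_{ix}(\eta),\; E_{iy}(\eta) \quad \text{for } i = 2, 4 \tag{26}$$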


are parametric functions describing the shape of the edges
$E_i$, i = 1, 2, 3, 4. Therefore the blending function method
allows arbitrary parametric descriptions of the edges of
elements.
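The four-edge blending map can be illustrated with a small sketch. It assumes the usual counterclockwise numbering (edge 1 at $\eta=-1$, edge 2 at $\xi=+1$, edge 3 at $\eta=+1$, edge 4 at $\xi=-1$), with edges 1 and 3 parametrized by $\xi$ and edges 2 and 4 by $\eta$; the annular test element and the helper names `arc` and `line` are hypothetical and serve only to exercise the formula.

```python
import math

def bilinear(xi, eta):
    """Standard bilinear shape functions of the four vertices."""
    return [0.25 * (1 - xi) * (1 - eta), 0.25 * (1 + xi) * (1 - eta),
            0.25 * (1 + xi) * (1 + eta), 0.25 * (1 - xi) * (1 + eta)]

def blend_map(xi, eta, nodes, edges):
    """Blending function map for a quadrilateral with four curved edges.

    edges = (E1, E2, E3, E4), each a callable returning (x, y);
    E1 and E3 take xi, E2 and E4 take eta.
    """
    e1, e2, e3, e4 = edges[0](xi), edges[1](eta), edges[2](xi), edges[3](eta)
    n = bilinear(xi, eta)
    point = []
    for c in range(2):  # c = 0: x-component, c = 1: y-component
        corners = sum(n[i] * nodes[i][c] for i in range(4))
        point.append(0.5 * (1 - eta) * e1[c] + 0.5 * (1 + xi) * e2[c]
                     + 0.5 * (1 + eta) * e3[c] + 0.5 * (1 - xi) * e4[c]
                     - corners)
    return tuple(point)

def arc(radius):
    """Quarter circle from angle 0 (t = -1) to angle pi/2 (t = +1)."""
    return lambda t: (radius * math.cos(math.pi / 4 * (1 + t)),
                      radius * math.sin(math.pi / 4 * (1 + t)))

def line(a, b):
    """Straight edge from point a (t = -1) to point b (t = +1)."""
    return lambda t: (0.5 * (1 - t) * a[0] + 0.5 * (1 + t) * b[0],
                      0.5 * (1 - t) * a[1] + 0.5 * (1 + t) * b[1])

# Quarter of an annulus: inner radius 1 (edge 4), outer radius 2 (edge 2).
nodes = [(1.0, 0.0), (2.0, 0.0), (0.0, 2.0), (0.0, 1.0)]
edges = (line(nodes[0], nodes[1]), arc(2.0), line(nodes[3], nodes[2]), arc(1.0))

x, y = blend_map(1.0, 0.3, nodes, edges)
print(math.hypot(x, y))  # a point of edge 2 lies on the outer circle
```

In this example every image of the edge $\xi = 1$ lies exactly on the circle of radius 2 and every image of $\xi = -1$ on the circle of radius 1, which is the property that makes the geometric description independent of the polynomial degree of the element.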


2.4.2 Accuracy of mapping versus polynomial 
interpolation 


The following numerical example demonstrates the impor- 
tance of accurate representation of geometry when a p- 
extension is to be applied in order to find a finite element 
approximation. A quarter of a linear elastic square plate
with a central circular hole and unit thickness (1 mm)
is loaded by a traction $T_n$ = 100 MPa (see Figure 9).
The dimensions are chosen to be b = h = 100 mm and
R = 10 mm. At the lower and right side of the plate,
symmetry conditions are imposed. The isotropic linear elastic
material behavior is characterized by Young's modulus
E = 206900 MPa, Poisson's ratio $\nu$ = 0.29, and plane
Figure 9. Perforated square plate under uniform tension.


stress assumptions. The strain energy of the plate, obtained
by an 'overkill' finite element approximation, amounts to
247.521396 Nmm. The plate is discretized by four quadrilateral
elements and the circle with radius R = 10 mm is
represented by applying


1. exact blending: that is, the exact parametric description
of a circle is applied;

2. parabolic description: two parabolas are used to
interpolate the circle with a corresponding relative error
$(|R - \tilde{R}|/R)\cdot 100\,\% < 0.0725\,\%$, where $\tilde{R}$ denotes
the radius of the interpolated circle.
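The size of this geometric error can be checked with a short calculation. The sketch below assumes that each of the two parabolas covers a 45° arc of the quarter circle and interpolates it at both end points and at the mid-angle (quadratic Lagrange interpolation in the local coordinate); this construction is an assumption, since the text does not state how the parabolas were obtained. The function name is hypothetical.

```python
import math

def parabola_radius_error(half_angle, samples=2001):
    """Maximum relative radial deviation of a parabola from a unit circle.

    The parabola interpolates the arc [-half_angle, +half_angle] at both
    end points and at the mid-angle; t in [-1, 1] is the local coordinate.
    """
    c, s = math.cos(half_angle), math.sin(half_angle)
    worst = 0.0
    for k in range(samples):
        t = -1.0 + 2.0 * k / (samples - 1)
        x = 1.0 + (c - 1.0) * t * t   # through (c, -s), (1, 0), (c, s)
        y = s * t
        worst = max(worst, abs(1.0 - math.hypot(x, y)))
    return worst

# Two parabolas on a quarter circle: each spans 45 degrees (half angle pi/8).
err_percent = 100.0 * parabola_radius_error(math.pi / 8)
print(f"{err_percent:.4f} %")
```

Under this assumption the maximum deviation is about 0.0725 % of the radius, consistent with the bound quoted in item 2, and it is independent of R because the error scales with the radius.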


A p-extension based on the tensor product space $S_{\mathrm{ps}}^{p,p}(\Omega_{\mathrm{st}}^{q})$,
p = 1, ..., 8 is performed and the relative error in energy
norm for both the exact blending and the parabolic boundary
interpolation is plotted versus the degrees of freedom
on a log-log scale in Figure 10. Owing to the smoothness
of the exact solution of the problem, the p-extension
in conjunction with the exact blending shows an exponential
rate of convergence (see equation (29) in Section 3.2). In
the case of the parabolic boundary interpolation, the
convergence rate of the p-extension deteriorates for p > 3
and the strain energy finally converges to an incorrect
value. Considering the stresses, for instance, the stress
component $\sigma_{yy}$ at point 2, we observe that the p-extension with
p = 1, ..., 20 and exact blending converges rapidly while
the stress obtained with parabolic boundary interpolation
diverges (see Figure 11).
Although the relative error of the parabolic geometric 
interpolation seems to be very small, it has a strong 
influence on the accuracy of the p-extension. The strain 
energy of the approximation converges to an incorrect 
value and the stress component $\sigma_{yy}$ at point 2 even
diverges. The reason for this is that an artificial stress 
singularity is introduced. Considering the first derivatives 
at the interelement node 6, and at symmetry nodes 2 
Figure 10. Influence of the blending on the relative error in
energy norm.

Figure 11. Influence of the blending on the stress component $\sigma_{yy}$
at point 2.


and 3, discontinuities are observed. They lead to stress 
singularities similar to stress concentrations at corners. One 
way of avoiding these stress singularities is to use the 
exact blending or to apply the so-called quasi-regional 
mapping described in Királyfalvi and Szabó (1997). The
idea of the quasi-regional mapping is to combine the
blending function method with a polynomial interpolation
of geometry, using optimal collocation points; see Chen
and Babuška (1995, 1996). An example of the effectiveness
of this kind of mapping is given in connection with a 
geometrically nonlinear problem in Section 5.2. A detailed 
comparison of exact and polynomial blending is given by 
Bröker (2001). 


3 CONVERGENCE CHARACTERISTICS 


In this section, some key theoretical results that establish
relationships between the error in energy norm and the
number of degrees of freedom associated with hierarchic
sequences of finite element spaces $S_1 \subset S_2 \subset \cdots$ are
presented.

In the early implementations of the finite element method,
the polynomial degrees were restricted to p = 1 or
p = 2 only. Finite element spaces were enlarged by mesh
refinement, that is, by reducing the diameter of the largest 
element, denoted by h. Subsequently, this limitation was 
removed, allowing enlargement of finite element spaces by 
increasing the polynomial degree of elements, denoted by 
p, while keeping the mesh fixed. To distinguish between 
the two approaches, the terms ‘h-version’ and ‘p-version’ 
gained currency. We will consider three strategies for 
constructing finite element spaces: 


(a) h-Extension: The polynomial degree of elements is
fixed, typically at some low number, such as p = 1
or p = 2, and the number of elements is increased
such that h is progressively reduced.

(b) p-Extension: The mesh is fixed and the polynomial
degree of elements is increased.

(c) hp-Extension: The mesh is refined and the polynomial
degrees of elements are concurrently increased.


A fourth strategy, not considered here, introduces basis 
functions, other than the mapped polynomial basis functions 
described in Section 2, to represent some local character- 
istics of the exact solution. This is variously known as the 
space enrichment method, partition of unity method, and 
meshless method. 

It is of considerable practical interest to know how the 
first space S, should be constructed and when and how 
h-extension, p-extension, or hp-extension should be used. 
The underlying principles and practical considerations are 
summarized in the following. 


3.1 Classification 


It is useful to establish a simple classification for the exact 
solution based on a priori information available concerning 
its regularity. The exact solution, denoted by $u_{EX}$ in the
following, may be a scalar function or a vector function. 


Category A: $u_{EX}$ is analytic everywhere on the solution
domain including the boundaries. By definition, a
function is analytic in a point if it can be expanded
into a Taylor series about that point. The solution
is in category A also when analytical continuation is
applicable.

Category B: $u_{EX}$ is analytic everywhere on the solution
domain including the boundaries, with the exception
of a finite number of points (or in 3D, a finite number
of points and edges). The locations where the
exact solution is not analytic are called singular points
or singular edges. The great majority of practical
problems in solid mechanics belong in this category.
Problems in category B are characterized by piecewise
analytic data, that is, the domain is bounded by piecewise
analytic functions and/or the boundary conditions
are piecewise analytic.

Category C: $u_{EX}$ is neither in category A nor in
category B.


At corner singularities and at intersections of material
interfaces in two-dimensional problems, the exact solution
typically can be written in the form

$$u_{EX} = \sum_{i=1}^{\infty} A_i\, r^{\lambda_i}\, F_i(\theta), \qquad r < \rho, \quad \lambda_{\min} > 0 \tag{27}$$

where $r$, $\theta$ are polar coordinates centered on the singular
point, $A_i$, $\lambda_i$ are real numbers, $F_i$ is an analytic (or
piecewise analytic) vector function, and $\rho$ is the radius of
convergence. Additional details can be found in Grisvard
(1985). This is known as an asymptotic expansion of the
solution in the neighborhood of a singular point. Analogous
expressions can be written for one and three dimensions
with $\lambda_{\min} > 1 - d/2$ where d is the number of spatial
dimensions. The minimum value of $\lambda_i$ corresponding to
a nonzero coefficient $A_i$ characterizes the regularity (also
called 'smoothness') of the exact solution. In the following
section, the key theorems concerning the asymptotic rates
of convergence of the various extension processes are
summarized.


3.2 A priori estimates 


A priori estimates of the rates of convergence are available 
for solutions in categories A, B, and C. Convergence is 
either algebraic or exponential. The algebraic estimate is of 
the form 


$$\|u_{EX} - u_{FE}\|_{E(\Omega)} \le \frac{k}{N^{\beta}} \tag{28}$$

and the exponential estimate is of the form

$$\|u_{EX} - u_{FE}\|_{E(\Omega)} \le \frac{k}{\exp(\gamma N^{\theta})} \tag{29}$$

These estimates should be understood to mean that there
exists some positive constant k, and a positive constant $\beta$
(resp. $\gamma$ and $\theta$) that depend on $u_{EX}$, such that the error will
be bounded by the algebraic (resp. exponential) estimate as
the number of degrees of freedom N is increased. These
estimates are sharp for sufficiently large N.
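In practice the algebraic rate is read off from computed errors. A brief worked step, assuming that the algebraic estimate holds with equality in the asymptotic range and writing $e_1$, $e_2$ for the energy-norm errors obtained with $N_1 < N_2$ degrees of freedom (symbols introduced here only for illustration):

$$\log e \approx \log k - \beta \log N \quad\Longrightarrow\quad \beta \approx \frac{\log(e_1/e_2)}{\log(N_2/N_1)}$$

so that $\beta$ is the negative slope of the convergence curve on a log-log plot. For the exponential estimate, $\log e \approx \log k - \gamma N^{\theta}$, and the curve bends downward on the same plot instead of approaching a straight line.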


The asymptotic rates of convergence for two-dimensional
problems are summarized in Table 1 and for three-dimensional
problems in Table 2. In these tables, p (resp. $\lambda$)
represents the minimum polynomial degree assigned to
the elements of a finite element mesh (resp. $\lambda_{\min}$ in
equation (27)) (see Chapter 4, this Volume).


3.3 The choice of finite element spaces 


The theoretical results described in Section 3.2 provide an 
important conceptual framework for the construction of 
finite element spaces (see Chapter 3, this Volume). 


3.3.1 Problems in category A 


Referring to Tables 1 and 2, it is seen that for problems in 
category A, exponential rates of convergence are possible 
through p- and hp-extensions. These convergence rates can 
be realized provided that all singular points lie on element 
vertices and edges. For both the p- and hp-extensions, the
optimal mesh consists of the smallest number of elements 
required to partition the solution domain into triangular 


Table 1. Asymptotic rates of convergence in two dimensions.


Category   Type of extension
           h                                p                   hp

A          Algebraic                        Exponential         Exponential
           $\beta = p/2$                    $\theta \ge 1/2$    $\theta \ge 1/2$

B          Algebraic, Note 1                Algebraic           Exponential
           $\beta = (1/2)\min(p,\lambda)$   $\beta = \lambda$   $\theta \ge 1/3$

C          Algebraic                        Algebraic           Note 2
           $\beta > 0$                      $\beta > 0$


Note 1: Uniform or quasi-uniform mesh refinement is assumed. In the
case of optimal mesh refinement, $\beta_{\max} = p/2$.

Note 2: When $u_{EX}$ has a recognizable structure, then it is possible
to achieve faster than algebraic rates of convergence with hp-adaptive
methods.


Table 2. Asymptotic rates of convergence in three dimensions.


Category   Type of extension
           h                   p                   hp

A          Algebraic           Exponential         Exponential
           $\beta = p/3$       $\theta \ge 1/3$    $\theta \ge 1/3$

B          Note 3              Note 3              Exponential
                                                   $\theta \ge 1/5$

C          Algebraic           Algebraic           Note 2
           $\beta > 0$         $\beta > 0$


Note 3: In three dimensions, $u_{EX}$ cannot be characterized by a single
parameter. Nevertheless, the rate of p-convergence is at least twice the
rate of h-convergence.
Figure 12. Example of a geometric mesh (detail).


and quadrilateral elements in two dimensions; tetrahedral, 
pentahedral, and hexahedral elements in three dimensions. 
When h-extensions are used, the optimal rate of convergence
in 2D is algebraic with $\beta = p/2$. The optimal mesh
grading depends on both p and the exact solution.


3.3.2 Problems in category B 


When the exact solution can be written in the form of 
equation (27), there is an optimal design of the discretiza- 
tion in the neighborhood of the singular point. The finite 
elements should be laid out so that the sizes of elements 
decrease in geometric progression toward the singular point 
(located at x = 0) and the polynomial degrees of elements 
increase away from the singular point. The optimal grading
is $q = (\sqrt{2} - 1)^2 \approx 0.17$, which is independent of $\lambda_{\min}$.
In practice, q = 0.15 is used. These are called geometric
meshes. An example of a geometric mesh in two dimensions 
is given in Figure 12. 

The ideal distribution of polynomial degrees is that the 
lowest polynomial degree is associated with the small- 
est element and the polynomial degrees increase linearly 
away from the singular points. This is because the errors 
in the vicinity of singular points depend primarily on 
the size of elements, whereas errors associated with ele- 
ments farther from singular points, where the solution is 
smooth, depend mainly on the polynomial degree of ele- 
ments. In practice, uniform p-distribution is used, which 
yields very nearly optimal results in the sense that con- 
vergence is exponential, and the work penalty associated 
with using uniform polynomial degree distribution is not 
substantial. 


3.4 A simple 1D model problem 


In this section, we will consider an axially loaded linear 
elastic bar as depicted in Figure 13. 

Although the solution of the underlying simple model 
problem (30)-(32) can be stated in a closed form, it 


Figure 13. Linear elastic bar. 


is worth studying because it implies many of the fea- 
tures that also appear in more complex models. Fur- 
thermore, the general concept of the p-version can be 
readily represented when considering the simple one- 
dimensional model problem. A tutorial program for a one- 
dimensional h-, p-, and Ap-version of the finite element 
method, where the following problems can be investi- 
gated in detail, has been implemented in Maple [1]. The 
solution u(x)(length) of the ordinary differential equa- 
tion (30) describes the displacement of the bar in x- 
direction, being loaded by a traction f (x) (force/length) and 
a load F (force). E (force/length”) denotes Young’s modu- 
lus, A(length”) the cross-sectional area, and L(length) the 
length of the bar. 


$$-(EA\,u'(x))' = f(x) \quad \text{on } \Omega = \{x \mid 0 < x < L\} \tag{30}$$
$$u = 0 \quad \text{at } x = 0 \tag{31}$$
$$EA\,u' = F \quad \text{at } x = L \tag{32}$$


For the sake of simplicity, it is assumed that the
displacement u(x) and strain $\varepsilon = du/dx$ are small and that
the bar exhibits a linear elastic stress-strain relationship,
that is, $\sigma = E\varepsilon$ with $\sigma$ being uniformly distributed
over the cross-sectional area A. Equation (31) defines
a Dirichlet boundary condition at x = 0 and equation
(32), a Neumann boundary condition at x = L. For a
detailed study of this model problem, see Szabó and
Babuška (1991). The variational or weak formulation
of the model problem (30)-(32), which is the basis
for a finite element approximation, can be stated as
follows:

Find $u \in X$ satisfying (homogeneous) Dirichlet boundary
conditions, such that

$$B(u, v) = F(v) \quad \text{for all } v \in Y \tag{33}$$

where

$$B(u, v) = \int_0^L EA\,u'v'\,dx \quad \text{and} \quad F(v) = \int_0^L f\,v\,dx + F\,v(L) \tag{34}$$


3.4.1 A numerical example with a smooth solution 


Figure 13 shows an elastic bar where it is assumed
that EA = L = 1, f(x) = −sin(8x) and F = 0. The
p-version discretizations consist of one element with p =
1, 2, 3, ..., 8, whereas the h-version is based on a
uniformly refined mesh with up to eight linear (p = 1)
elements.

First, we will consider the p-version discretization. The
exact solution $u_{EX}(x) = -(1/64)\sin(8x) + (1/8)\cos(8)\,x$
of the problem (33)-(34) is approximated by a polynomial
expression on the basis of the hierarchic shape functions
(11)-(13)

$$u_{FE}(\xi) = N_1(\xi)\,U_1 + N_2(\xi)\,U_2 + \sum_{p=2}^{p_{\max}} N_{p+1}(\xi)\,a_{p+1} \tag{35}$$

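where $p_{\max} = 8$. $U_1$ and $U_2$ denote the nodal displacements,
whereas $a_3, \ldots, a_9$ are coefficients determining the
higher-order terms of the approximation $u_{FE}(\xi)$. Owing to
the orthonormality property (16) of the higher-order
shape functions, the element stiffness matrix,
$K_{ij}^e = (2/L)\int_{-1}^{1} EA\,(dN_i(\xi)/d\xi)\,(dN_j(\xi)/d\xi)\,d\xi$, i, j = 1, 2, 3, ..., 9, is
almost perfectly diagonal:

$$\mathbf{K}^e = \begin{bmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ -1 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 2 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 2 \end{bmatrix} \tag{36}$$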


Computing the element load vector, $F_i^e = (L/2)\int_{-1}^{1} N_i(\xi)\,f(x(\xi))\,d\xi$,
i = 1, 2, 3, ..., 9, one finds

$$\mathbf{F}^e = [-0.1095,\ -0.0336,\ -0.0269,\ -0.0714,\ 0.0811,\ 0.0433,\ -0.0230,\ -0.0073,\ 0.0026]^T \tag{37}$$
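The structure of the stiffness matrix and the leading entries of the load vector can be reproduced numerically. The following sketch assumes that the hierarchic shape functions referred to as (11)-(13) are $N_1 = (1-\xi)/2$, $N_2 = (1+\xi)/2$ and $N_i = \phi_{i-1}(\xi)$ for $i \ge 3$, with $\phi_j(\xi) = (L_j(\xi) - L_{j-2}(\xi))/\sqrt{4j-2}$ (integrated Legendre polynomials), and it uses a hand-written Gauss-Legendre rule; all function names are hypothetical.

```python
import math

def legendre(n, x):
    """Legendre polynomial L_n(x) via the three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule (Newton iteration)."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # initial guess
        for _ in range(50):
            dp = n * (x * legendre(n, x) - legendre(n - 1, x)) / (x * x - 1.0)
            dx = legendre(n, x) / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        dp = n * (x * legendre(n, x) - legendre(n - 1, x)) / (x * x - 1.0)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def shape(i, xi):
    """Assumed hierarchic shape function N_i, i = 1, 2, 3, ..."""
    if i == 1:
        return 0.5 * (1.0 - xi)
    if i == 2:
        return 0.5 * (1.0 + xi)
    j = i - 1
    return (legendre(j, xi) - legendre(j - 2, xi)) / math.sqrt(4 * j - 2)

def dshape(i, xi):
    """Derivative dN_i/dxi; for i >= 3 this is sqrt((2j-1)/2) * L_{j-1}."""
    if i == 1:
        return -0.5
    if i == 2:
        return 0.5
    j = i - 1
    return math.sqrt((2 * j - 1) / 2.0) * legendre(j - 1, xi)

EA, L, N_MODES = 1.0, 1.0, 9
f = lambda x: -math.sin(8.0 * x)             # distributed load of the example
xs, ws = gauss_legendre(30)

K = [[(2.0 / L) * sum(w * EA * dshape(i, x) * dshape(j, x) for x, w in zip(xs, ws))
      for j in range(1, N_MODES + 1)] for i in range(1, N_MODES + 1)]
F = [(L / 2.0) * sum(w * shape(i, x) * f(0.5 * L * (1.0 + x)) for x, w in zip(xs, ws))
     for i in range(1, N_MODES + 1)]

print([round(v, 4) for v in F[:3]])
```

With these assumptions the nodal block of K is [[1, −1], [−1, 1]], the higher-order block is 2 times the identity, and the first three load entries evaluate to approximately −0.1095, −0.0336 and −0.0269.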


Because of the homogeneous Dirichlet boundary condition
($u(0) = u_{FE}(0) = 0 \Rightarrow U_1 = 0$), the solution of the resulting
diagonal equation system is trivial in this case. In
Figure 14, the p-version approximation $u_{FE}(x)$ for p =
1, 2, 3, ..., 8 is plotted together with the exact solution
of the problem. For a first comparison of the accuracy,
the same problem is solved by applying the h-version with
p = 1 based on a uniformly refined mesh with decreasing
element size $h_i = 1/i$, i = 1, ..., 8. Again, the approximation
and the exact solution are plotted (see Figure 15).
In Figure 16, the relative error in energy norm

$$(e_r)_E = \frac{\|u_{EX} - u_{FE}\|_{E(\Omega)}}{\|u_{EX}\|_{E(\Omega)}} \tag{38}$$
Figure 14. p-version solution $u_{FE}(x)$ based on one element with
p = 1, 2, 3, ..., 8. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm

Figure 15. h-version solution $u_{FE}(x)$ based on a uniformly refined
mesh with p = 1. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


is plotted versus the number of degrees of freedom on
a log-log scale. By the classification given in
Section 3.1, this problem is in category A, where the
p-version exhibits exponential convergence (29), whereas
the asymptotic rate of convergence of the h-extension is
algebraic (28). For category A problems in one dimension,
the parameter $\beta$ in equation (28) is $\beta = p$. Since in this case
p = 1, the asymptotic rate of convergence is 1, as shown
in Figure 16.
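The algebraic rate of the h-extension can be confirmed with a few lines of code. The sketch below relies on the well-known property that, for this one-dimensional problem with constant EA, the linear finite element solution coincides with the exact solution at the nodes, so the finite element error equals the interpolation error; the energy norm is taken as the square root of one half of B(e, e), and the quadrature and function names are illustrative choices.

```python
import math

def u_exact(x):
    """Exact solution for EA = L = 1, f(x) = -sin(8x), F = 0."""
    return -math.sin(8.0 * x) / 64.0 + math.cos(8.0) * x / 8.0

def du_exact(x):
    """First derivative of the exact solution."""
    return -math.cos(8.0 * x) / 8.0 + math.cos(8.0) / 8.0

def energy_error_h(n_el, sub=200):
    """Energy-norm error of the nodally exact piecewise linear solution."""
    h = 1.0 / n_el
    total = 0.0
    for e in range(n_el):
        a = e * h
        slope = (u_exact(a + h) - u_exact(a)) / h   # constant u_h' on the element
        for k in range(sub):                         # composite midpoint rule
            x = a + (k + 0.5) * h / sub
            total += 0.5 * (du_exact(x) - slope) ** 2 * h / sub
    return math.sqrt(total)

def observed_rate(n1, n2):
    """Slope of the error curve between meshes with n1 and n2 elements."""
    return math.log(energy_error_h(n1) / energy_error_h(n2)) / math.log(n2 / n1)

beta = observed_rate(32, 64)
print(round(beta, 3))
```

Because u(0) is prescribed, the number of degrees of freedom equals the number of elements here, so the slope measured between two uniform meshes is directly comparable with $\beta$ in (28); for sufficiently fine meshes it approaches 1.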
Figure 16. Comparison of the h- and p-version: relative error in
energy norm.


3.4.2 A numerical example with a nonsmooth
solution


In the following example, we will again consider the weak
formulation (33)-(34) of the model problem (30)-(32)
where f(x) is now chosen such that the exact solution is
nonsmooth. We define $f(x) = \lambda(\lambda - 1)x^{\lambda-2}$, F = 0 and
EA = L = 1, resulting in an exact solution $u_{EX} = -x^{\lambda} + \lambda x$,
where $\lambda$ is the parameter controlling the smoothness
of the solution. If $\lambda < 1.0$, then the first derivative of the
exact solution will exhibit a singularity at x = 0 and the
given problem will be in category B. Note that $\lambda > 1/2$ is
a necessary condition for obtaining a finite strain energy of
the exact solution. For the following numerical example, $\lambda$
is chosen to be 0.65.
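The condition on the smoothness parameter follows from a direct evaluation of the strain energy. A short worked step, assuming the strain energy is taken as one half of B(u, u) with EA = L = 1:

$$u_{EX}'(x) = \lambda\,(1 - x^{\lambda-1}), \qquad \frac{1}{2}\int_0^1 (u_{EX}')^2\,dx = \frac{\lambda^2}{2}\int_0^1 \left(1 - 2x^{\lambda-1} + x^{2\lambda-2}\right)dx = \frac{\lambda^2}{2}\left(1 - \frac{2}{\lambda} + \frac{1}{2\lambda-1}\right) = \frac{\lambda(\lambda-1)^2}{2\lambda-1}$$

The integral of $x^{2\lambda-2}$ converges at x = 0 only if $2\lambda - 2 > -1$, that is, $\lambda > 1/2$. For $\lambda = 0.65$ the expression evaluates to approximately 0.2654.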

In Figure 17, the relative error in energy norm (38) is
plotted versus the number of degrees of freedom on a
log-log scale. p-Extension was performed on one element
with p = 1, ..., 50, whereas the h-extension was
performed on meshes with equal sized elements h =
1, ..., 1/50 with p = 1. Since the given problem is in
category B, both extensions show algebraic convergence of type
(28). The asymptotic rate of convergence of the h-extension
is given by

$$\beta = \min\left(p,\ \lambda - \tfrac{1}{2}\right) = 0.15 \tag{39}$$

and can be clearly observed in Figure 17. The rate of
convergence of the uniform p-extension is twice the rate
of the uniform h-extension. This is due to the fact that the
point where the exact solution exhibits singular behavior
coincides with a node.

Figure 17. Comparison of the h- and p-version: relative error in
energy norm.


When combining mesh refinement with an increase
in polynomial degree, exponential convergence in energy
norm (29) can be achieved with an hp-extension, even when
the exact solution $u_{EX}$ has singularities. The mesh is refined
towards the singular points by geometric progression using
the common factor q = 0.15. The location of the nodal
points $x_i$ is given by

$$x_i = \begin{cases} 0 & \text{for } i = 0 \\ L\,q^{\,n_{el}-i} & \text{for } i = 1, 2, \ldots, n_{el} \end{cases} \tag{40}$$

A polynomial degree $p_{\min} = 1$ is assigned to the element
at the singularity, and increases linearly away from the
singular point to the maximum degree

$$p_{\max} = (2\lambda - 1)(n_{el} - 1) \tag{41}$$


where $\lambda$ denotes the smoothness of the solution and $n_{el}$ the
total number of elements of the corresponding mesh. With
this hp-extension, one obtains an exponential convergence
in energy norm as shown in Figure 18 (hp-version, q =
0.15, $\lambda$ = 0.65). Using about 100 degrees of freedom, the
error is several orders of magnitude smaller than that
of a uniform p-version with one element or of a uniform
h-version with p = 1.
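The geometric node distribution is simple to generate. The sketch below assumes the node formula in the form $x_0 = 0$, $x_i = L\,q^{\,n_{el}-i}$ for $i = 1, \ldots, n_{el}$, with the singular point at x = 0; the function name is hypothetical. It also evaluates the theoretical grading factor $(\sqrt{2}-1)^2$ quoted in Section 3.3.2.

```python
import math

def geometric_nodes(n_el, q=0.15, length=1.0):
    """Node positions of a mesh graded geometrically toward x = 0."""
    return [0.0] + [length * q ** (n_el - i) for i in range(1, n_el + 1)]

q_opt = (math.sqrt(2.0) - 1.0) ** 2   # theoretical optimum, about 0.17
nodes = geometric_nodes(4)            # q = 0.15, as used in practice
sizes = [b - a for a, b in zip(nodes, nodes[1:])]

print(round(q_opt, 4))
print(nodes)
print(sizes)
```

For four elements and q = 0.15 the nodes are 0, 0.15³, 0.15², 0.15 and 1, so each element beyond the first is a fixed multiple of its neighbor on the side of the singular point.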

Figure 18 also shows the results of uniform p-extensions
obtained on geometrically refined meshes with q = 0.15.
These extensions are performed on meshes with $n_{el}$ =
4, 8, 12, 16, 20, 24 elements with p being uniformly
increased from 1 to 8. In the preasymptotic range, the
p-extension on fixed, geometrically graded meshes shows
an exponential convergence rate. In the asymptotic range,
the exponential convergence decreases to an algebraic rate,
being limited by the smoothness of the exact solution. If
proper meshes are used, that is, if the number of refinements
corresponds to the polynomial degree, then any required
accuracy is readily obtained.
Figure 18. Comparison of the h-, p-, and hp-version: relative
error in energy norm.


3.5 Model problem: The L-shaped domain 


In order to illustrate the convergence characteristics of the 
h-, p-, and hp-extensions for category B problems, we 
consider an L-shaped domain in two-dimensional elasticity, 
under the assumption of plane strain conditions using 
Poisson’s ratio 0.3. The reentrant edges are stress-free. In 
the xy coordinates system shown in Figure 12, the exact 
solution (up to rigid body displacement and rotation terms) 
corresponding to the first term of the asymptotic expansion 
is 
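$$u_x = \frac{A_1}{2G}\, r^{\lambda_1} \left[ (\kappa - Q_1(\lambda_1 + 1)) \cos\lambda_1\theta - \lambda_1 \cos(\lambda_1 - 2)\theta \right] \tag{42}$$

$$u_y = \frac{A_1}{2G}\, r^{\lambda_1} \left[ (\kappa + Q_1(\lambda_1 + 1)) \sin\lambda_1\theta + \lambda_1 \sin(\lambda_1 - 2)\theta \right] \tag{43}$$

where G is the shear modulus, $\lambda_1 = 0.544483737$, $Q_1 =
0.543075579$, and $\kappa = 1.8$. The coefficient $A_1$ is called
a generalized stress intensity factor. Details are available
in Szabó and Babuška (1991). The corresponding stress
components are

$$\sigma_x = A_1 \lambda_1 r^{\lambda_1 - 1} \left[ (2 - Q_1(\lambda_1 + 1)) \cos(\lambda_1 - 1)\theta - (\lambda_1 - 1) \cos(\lambda_1 - 3)\theta \right] \tag{44}$$

$$\sigma_y = A_1 \lambda_1 r^{\lambda_1 - 1} \left[ (2 + Q_1(\lambda_1 + 1)) \cos(\lambda_1 - 1)\theta + (\lambda_1 - 1) \cos(\lambda_1 - 3)\theta \right] \tag{45}$$

$$\tau_{xy} = A_1 \lambda_1 r^{\lambda_1 - 1} \left[ (\lambda_1 - 1) \sin(\lambda_1 - 3)\theta + Q_1(\lambda_1 + 1) \sin(\lambda_1 - 1)\theta \right] \tag{46}$$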

This model problem is representative of an important class
of problems. The reentrant edges are stress-free; the other
boundaries are loaded by the tractions that correspond to
the exact stress distribution given by equations (44) to
(46). Since the exact solution is known, it is possible
to compute the exact value of the potential energy from
the definition of $\Pi(u_{EX})$ given by equation (3) and using
$B(u_{EX}, u_{EX}) = F(u_{EX})$ from equation (1):

$$\Pi(u_{EX}) = -\frac{1}{2}\oint \left[ u_x(\sigma_x n_x + \tau_{xy} n_y) + u_y(\tau_{xy} n_x + \sigma_y n_y) \right] ds = -4.15454423\,\frac{A_1^2\, a^{2\lambda_1}}{E} \tag{47}$$

where $n_x$, $n_y$ are the components of the unit normal to
the boundary and a is the dimension shown in the inset in
Figure 12. The convergence paths for h- and p-extensions
are shown in Figure 19.
Figure 19. Convergence paths for the L-shaped domain. (From
Finite-Element Analysis, B. Szabó and I. Babuška; Copyright 
(1991) John Wiley & Sons, Inc. This material is used by 
permission of John Wiley & Sons, Inc.) 


It is seen that the asymptotic rates of convergence are 
exactly as predicted by the estimate (28). However, when 
p-extension is used on a geometric mesh, the preasymptotic 
rate is exponential or nearly so. This can be explained by 
observing that the geometric mesh shown in Figure 12 is 
overrefined for low polynomial degrees, hence the dominant 
source of the error is that part of the domain where the exact 
solution is smooth and hence the rate of convergence is 
exponential, as predicted by the estimate (29). Convergence 
slows to the algebraic rate for small errors, where the 
dominant source of error is the immediate vicinity of the 
singular point. 

The error estimate frequently used in conjunction with 
p-extensions is based on the equation (6) and the use of 
Richardson extrapolation utilizing the a priori estimate (28). 
When hp-adaptivity has to be considered, local-based error 
estimators have to be applied; see, for example, Ainsworth 
and Oden (2000) and Melenk and Wohlmuth (2001). By 
definition, the effectivity index $\theta$ is the estimated error
divided by the true error. The estimated and true errors and 
the effectivity indices are shown in Table 3. The parameter 
B is the same as that in equation (28). 
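The relation between the tabulated energies, the relative errors and the effectivity index can be made explicit. The sketch below assumes the standard identity that the squared energy norm of the error equals the difference $\Pi(u_{FE}) - \Pi(u_{EX})$ (presumably the relation cited as equation (6)), so that the relative error is $\sqrt{(\Pi(u_{FE}) - \Pi_{ref})/|\Pi_{ref}|}$. The normalized energies are those listed in Table 3; the estimated error uses the extrapolated value from the last row of the table, and the true error uses the exact value from equation (47). Variable names are hypothetical.

```python
import math

PI_EXACT = -4.15454423         # normalized exact potential energy, equation (47)
PI_EXTRAPOLATED = -4.154470    # extrapolated value, last row of Table 3

# Normalized potential energies for p = 1, ..., 8 as listed in Table 3.
PI_FE = {1: -3.886332, 2: -4.124867, 3: -4.148121, 4: -4.152651,
         5: -4.153636, 6: -4.153975, 7: -4.154139, 8: -4.154238}

def relative_error_percent(pi_fe, pi_ref):
    """Relative error in energy norm (percent) from potential energies."""
    return 100.0 * math.sqrt((pi_fe - pi_ref) / abs(pi_ref))

rows = []
for p, pi_fe in sorted(PI_FE.items()):
    estimated = relative_error_percent(pi_fe, PI_EXTRAPOLATED)
    true = relative_error_percent(pi_fe, PI_EXACT)
    rows.append((p, estimated, true, estimated / true))

for p, estimated, true, theta in rows:
    print(f"p={p}  est={estimated:6.2f}%  true={true:6.2f}%  theta={theta:.2f}")
```

The values obtained this way closely follow the error columns of Table 3; differences in the last printed digit can occur because the tabulated energies are rounded. The effectivity index stays close to 1 at low p and drops to about 0.87 at p = 8, where the remaining error is comparable to the accuracy of the extrapolated energy.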


4 PERFORMANCE CHARACTERISTICS 


We have seen in Figure 19 that for a fixed accuracy (say 
1%) there is a very substantial reduction in the number 
of degrees of freedom when p-extension is performed on 
properly designed meshes. From a practical point of view, 
the important consideration is the cost of computational 
Table 3. L-shaped domain. Geometric mesh, 18 elements, trunk
space. Plane strain, $\nu$ = 0.3. Estimated and true relative errors in
energy norm and effectivity index $\theta$.


p    N      $\Pi(u_{FE})E/(A_1^2 a^{2\lambda_1} t_z)$   $\beta$ Est'd   $\beta$ True   $(e_r)_E$ (%) Est'd   $(e_r)_E$ (%) True   $\theta$

1    41     -3.886332                                   -               -              25.42                 25.41                1.00
2    119    -4.124867                                   1.03            1.03           8.44                  8.46                 1.00
3    209    -4.148121                                   1.37            1.36           3.91                  3.93                 0.99
4    335    -4.152651                                   1.33            1.30           2.09                  2.14                 0.98
5           -4.153636                                   0.99            0.94           1.42                  1.48                 0.96
6    695    -4.153975                                   0.78            0.68           1.09                  1.17                 0.93
7    929    -4.154139                                   0.69            0.60           0.89                  0.99                 0.89
8    1199   -4.154238                                   0.69            0.56           0.75                  0.86                 0.87
∞    ∞      -4.154470                                   0.54                           0
resources rather than the number of degrees of freedom.
The proper basis for comparing the performance character- 
istics of various implementations of the h- and p-versions 
of the finite-element method is the cost of computation. 
The cost has to be evaluated with respect to representative 
model problems, such as the L-shaped domain problem dis- 
cussed in Section 3.5, given specific goals of computation, 
the required accuracies, and the requirement that a reason- 
ably close estimate of the accuracy of the computed data 
of interest be provided. It is essential that comparisons of 
performance include a verification process, that is, a pro- 
cess by which it is ascertained that the relative errors in 
the data of interest are within prespecified error tolerances. 
Verification is understood in relation to the exact solution 
of the mathematical model, not in relation to some physical 
reality that the model is supposed to represent. The conse- 
quences of wrong engineering decisions based on erroneous 
information usually far outweigh the costs of verification. 

Comparative performance characteristics of the h- and 
p-versions were first addressed in Babuska and Scapolla 
(1987) and Babuska and Elman (1989) through analyses of 
computational complexity and theoretical error estimates as 
well as computer timing of specific benchmark problems. 
The main conclusions are summarized as follows: 


1. Only for the uncommon cases of very low accuracy 
requirements and very irregular exact solutions are low- 
order elements preferable to high-order elements. High- 
order elements typically require smaller computational 
effort for the same level of accuracy. 

2. High-order elements are more robust than low-order 
elements. This point is discussed further in Section 4.1 
below. 

3. The most effective error control procedures combine 
proper mesh design coupled with progressive increase 
in p. For details, we refer to Rank and Babuska (1987), 
Babuska and Suri (1990), and Rank (1992). 


4. Accuracies normally required in engineering computa- 
tion can be achieved with elements of degree 8 or less 
for most practical problems. 

5. Computation of a sequence of solutions corresponding
to a hierarchic sequence of finite element spaces
$S_1 \subset S_2 \subset \cdots$ provides for simple and effective
estimation and control of error for all data of interest, based
on various types of extrapolation and extraction procedures;
see, for example, Szabó and Babuska (1988),
Szabó (1990), and Yosibash and Szabó (1994).
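
The extrapolation mentioned in item 5 can be illustrated with a minimal sketch. It is not taken from the text: it assumes that the error in energy decays algebraically, $\Pi_p - \Pi_\infty \approx k^2/N_p^{2\beta}$, so that three successive solutions determine the limit $\Pi_\infty$ through one scalar equation, which is solved here by bisection. The function name `extrapolate_energy` is hypothetical; the input values are the p = 6, 7, 8 rows of Table 3.

```python
import math

def extrapolate_energy(N, Pi, tol=1e-14):
    """Estimate the limit energy from three successive (N, Pi) pairs.

    Assumption (not from the text): Pi_p - Pi_inf ~ k**2 / N_p**(2*beta).
    Eliminating k and beta gives, with x = Pi_p - Pi_inf > 0,
        log(x / (x + a)) = Q * log((x + a) / (x + b)),
    where a = Pi_{p-1} - Pi_p, b = Pi_{p-2} - Pi_p and
    Q = log(N_{p-1}/N_p) / log(N_{p-2}/N_{p-1}).
    """
    (n0, n1, n2), (e0, e1, e2) = N, Pi
    Q = math.log(n1 / n2) / math.log(n0 / n1)
    a, b = e1 - e2, e0 - e2

    def residual(x):
        return math.log(x / (x + a)) - Q * math.log((x + a) / (x + b))

    lo, hi = 1e-12 * abs(e2), abs(e2)
    if residual(lo) * residual(hi) > 0:
        raise ValueError("no sign change: data may not be in the asymptotic range")
    while hi - lo > tol * abs(e2):          # plain bisection on x
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return e2 - 0.5 * (lo + hi)

# Rows p = 6, 7, 8 of Table 3 (degrees of freedom and normalized energy)
N = (695, 929, 1199)
Pi = (-4.153975, -4.154139, -4.154238)
Pi_inf = extrapolate_energy(N, Pi)
# Estimated relative error in energy norm of the p = 8 solution
rel_err = math.sqrt((Pi[2] - Pi_inf) / abs(Pi_inf))
print(f"estimated limit: {Pi_inf:.6f}, estimated error: {100 * rel_err:.2f}%")
```

The estimate uses only quantities that are by-products of a p-extension, which is why a hierarchic sequence of solutions makes this kind of error control inexpensive.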


As a general rule, for problems in categories A and B 
(defined in Section 3.1), which include the vast majority 
of practical problems in solid mechanics, p-extension on 
properly designed meshes is the most efficient general solution
strategy. The performance of p-extensions in solving problems
in category C is discussed in Section 5.1.1.

In the p-version, the elemental matrices are large and 
their computation is time-consuming. On the other hand, 
these operations lend themselves to parallel computation; 
see, for example, Rank et al. (2001). Furthermore, it has 
been shown that a substantial reduction in time can be 
achieved if special integration techniques are used (see 
Nübel, Düster and Rank, 2001), or if the hierarchic structure
is sacrificed (see Melenk, Gerdes and Schwab, 2001). 


4.1 Robustness 


A numerical method is said to be robust when it performs
well for a broad class of admissible data. For example,
in the displacement formulation of linear elasticity, letting
Poisson’s ratio ν approach 1/2 causes the volumetric strain
(div u) to approach zero. This introduces constraints among
the variables, effectively reducing the number of degrees
of freedom, and hence causing the rate of convergence in
energy norm to decrease, in some cases very substantially.
This phenomenon is called locking. Locking also causes
problems in the recovery of the first stress invariant
from the finite element solution. A similar situation exists
when the thickness approaches zero in plate models based
on the Reissner formulation. For a precise definition of
robustness, we refer to Babuska and Suri (1992). It was
shown in Vogelius (1983) that the rate of convergence
in p-extensions is not influenced by ν → 1/2 on
straight-sided triangles. It is also known that the h-version using
straight triangles does not exhibit locking, provided that
p ≥ 4. For curvilinear elements, the rate of p-convergence
is slower, and for the h-version the locking problem is
generally much more severe. Although the p-version is
affected by membrane locking, in the range of typical plate
and shell thicknesses that occur in practical engineering
problems, locking effects are generally not substantial. For
an investigation of membrane locking in cylindrical shells,
we refer to Pitkäranta (1992).
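
The origin of the constraint can be made concrete with a small sketch. It uses the standard isotropic relations between (E, ν) and the Lamé parameters, which are not stated in the text; the function name and the normalized modulus E = 1 are illustrative assumptions. The ratio λ/μ multiplies the (div u)² term in the strain energy relative to the shear term, and it grows without bound as ν → 1/2.

```python
def lame_parameters(E, nu):
    """Return (lambda, mu, kappa) for isotropic linear elasticity."""
    mu = E / (2.0 * (1.0 + nu))                       # shear modulus
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))    # first Lame parameter
    kappa = lam + 2.0 * mu / 3.0                      # bulk modulus
    return lam, mu, kappa

E = 1.0  # hypothetical normalized modulus of elasticity
ratios = {}
for nu in (0.3, 0.49, 0.499, 0.4999):
    lam, mu, kappa = lame_parameters(E, nu)
    ratios[nu] = lam / mu          # equals 2*nu / (1 - 2*nu)
    print(f"nu = {nu:<7} lambda/mu = {ratios[nu]:10.1f}")
```

A discretization that cannot represent nearly divergence-free displacement fields well pays an increasingly large energy penalty as this ratio grows, which is the mechanism behind the loss of convergence rate described above.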


4.2 Example 


The following example is representative of shell intersection
problems. Specifically, the intersection of two cylindrical
shells is considered. Referring to Figure 20, the outside
radius of shell A is $R_A$ = 140 mm, the outside radius of
shell B is $R_B$ = 70 mm. The wall thickness of shell A (resp.
shell B) is $t_A$ = 8.5 mm (resp. $t_B$ = 7.5 mm). The axes of
the shells intersect at α = 65°. The length of shell A is 800
mm, the length of shell B, measured from the point of
intersection of the axes of the shells, is 300 mm. The modulus
of elasticity is E = 72.4 MPa, Poisson’s ratio is ν = 0.3.

The intersection of the outer surfaces of the shells
is filleted by a ‘rolling ball fillet’, that is, the fillet
surface is generated as if rolling a sphere of radius $r_f$ =
10.0 mm along the intersection line. The mesh consists
of 34 hexahedral elements. The shell intersection region,
comprised of 16 elements, is the darker region shown in
Figure 20. The complement is the shell region.
Quasi-regional mapping utilizing 6 × 6 collocation points per
curved face was employed.

The inside surface is loaded by a pressure p. In order
to establish equilibrium, a normal traction $T_n$ is applied on
the surface $S_B$, which is the surface of intersection between
shell B and a plane perpendicular to its axis:

$$T_n = \frac{p\,(R_B - t_B)^2}{t_B\,(2R_B - t_B)} \tag{48}$$

The yz plane is a plane of symmetry. The other surfaces
are traction-free. Appropriate rigid body constraints were
imposed in order to prevent motion parallel to the plane of
symmetry.
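
As a quick numerical check of the equilibrium requirement, the sketch below computes the ratio of the end traction to the applied pressure for shell B. It assumes that the traction is uniform over the annular cut surface and balances the axial resultant of the pressure acting on the closed end; the function name is hypothetical, and the dimensions are those given above.

```python
def axial_traction_ratio(R_out, t):
    """T_n / p for a pressurized cylinder with outside radius R_out, wall t.

    Assumes a uniform traction on the annular cross section that balances
    the pressure resultant p * pi * r_in**2 on the closed end.
    """
    r_in = R_out - t
    return r_in**2 / (R_out**2 - r_in**2)

ratio_B = axial_traction_ratio(70.0, 7.5)   # shell B: R_B = 70 mm, t_B = 7.5 mm
print(f"T_n / p = {ratio_B:.4f}")
```

Note that $R_B^2 - (R_B - t_B)^2 = t_B(2R_B - t_B)$, so the same ratio can be written with the wall thickness factored out.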

The objective is to estimate the magnitude of the maximal
von Mises stress to within 5% relative error. In the
shell intersection region, the solution varies substantially
over distances comparable to the thickness. Therefore,
dimensional reduction cannot be justified for this region.
Fully three-dimensional elements, that is, elements based
on the trunk spaces $S_{ts}^{p,p,p}(\Omega_{st}^{(h)})$ with p = 1, 2, ..., 8 were
used in the shell intersection region, whereas the anisotropic
spaces $S_{ts}^{p,p,q}(\Omega_{st}^{(h)})$ were used in the shell region. The
computations were performed with StressCheck [2].

Figure 20. Example: Shell intersection problem. The darker
region, comprised of 16 elements, is the shell intersection region.

Figure 21. Example: Shell intersection problem. Convergence of
the maximum von Mises stress normalized with respect to the
applied pressure p. The estimated limit is 64.7. The maximum
occurs in the shell intersection region.

The results are shown in Figure 21. Very strong convergence
to the estimated limit value of 64.7 is observed
when the isotropic spaces $S_{ts}^{p,p,p}(\Omega_{st}^{(h)})$ are employed. This
is true also for the anisotropic spaces $S_{ts}^{p,p,q}(\Omega_{st}^{(h)})$ for q = 2
but not for q = 1. The reason is that q = 1 implies kinematic
assumptions similar to those of the Naghdi shell
theory. This introduces an artificial singularity along the
faces where q changes abruptly from 8 to 1. Essentially, this
is a modeling error in the sense that it pertains to the question of
whether and where a particular shell model is applicable, 
given that the goal is to approximate some functional of 
the exact solution of the underlying fully three-dimensional 
problem. Some aspects of this problem have been addressed 
in Schwab (1996) and Actis, Szabó and Schwab (1999). 
This example illustrates the importance of convergence tests 
on the data of interest, including tests on the choice of 
dimensionally reduced models. 


5 APPLICATIONS TO NONLINEAR 
PROBLEMS 


5.1 Elastoplasticity 


The p- and hp-versions of the finite element method have
been widely accepted as efficient, accurate, and flexible
methods for analyzing linear problems in computational
mechanics. On the other hand, applications of the p- and
hp-versions to nonlinear problems are relatively recent
and hence less well known. Considering, for instance, the
$J_2$ flow theory of elastoplasticity, a loss of regularity is
expected along the boundary of the plastic zone. Following
the classification of Section 3.1, this problem is of category C,
that is, it has an unknown line (in 2D) or surface (in 3D) of
singular behavior in the interior of the domain. Therefore,
only an algebraic rate of convergence can be expected.
However, this asymptotic rate does not give information 
on the preasymptotic behavior, that is, on the accuracy of a 
p-extension for a finite number of degrees of freedom, and 
especially on the question of computational investment for 
a desired accuracy of quantities of engineering interest. 

To shed some light on this question, we will investigate 
the deformation theory of plasticity, first proposed by 
Hencky (1924), as a very simple model problem for 
elastoplasticity. For a detailed description and numerical 
investigation of this model problem, see Szabó, Actis and
Holzer (1995) and Düster and Rank (2001). We refer to
Holzer and Yosibash (1996), Düster and Rank (2002), and
Düster et al. (2002) for a study of the more complex and
physically more realistic flow theory of plasticity, where 
each load integration step in an incremental analysis can be 
considered equivalent to the model problem investigated in 
the following section. 


5.1.1 A benchmark problem 


As a numerical example, we again use the structure of
Figure 9 in Section 2.4.2 showing a quarter of a square
plate with central hole and unit thickness, loaded now
by a uniform tension of magnitude $T_n$ = 450 MPa. The
dimensions of the plate are chosen to be b = h = 10 mm
and the radius is set to R = 1 mm. The material is
now assumed to be elastic, perfectly plastic, and plane
strain conditions are assumed. The shear modulus is μ =
80193.8 MPa, the bulk modulus is κ = 164206.0 MPa,
and the yield stress is $\sigma_0$ = 450.0 MPa. This problem was
defined by Stein (2002) as a benchmark problem for the
German research project ‘Adaptive finite-element methods
in applied mechanics’.
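
Since the benchmark is specified by the shear and bulk moduli, it can be convenient to recover the modulus of elasticity and Poisson's ratio. The following sketch uses the standard isotropic conversion formulas, which are not given in the text; the function name is hypothetical.

```python
def young_and_poisson(mu, kappa):
    """Convert shear modulus mu and bulk modulus kappa to (E, nu)."""
    E = 9.0 * kappa * mu / (3.0 * kappa + mu)
    nu = (3.0 * kappa - 2.0 * mu) / (2.0 * (3.0 * kappa + mu))
    return E, nu

# Benchmark data from the text, in MPa
E, nu = young_and_poisson(80193.8, 164206.0)
print(f"E = {E:.1f} MPa, nu = {nu:.4f}")
```

With the moduli above, the conversion gives a modulus of elasticity close to 206 900 MPa and a Poisson's ratio close to 0.29, that is, typical values for steel.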

To find an approximate solution for the given benchmark,
we use the p-version based on the tensor product
space $S_{ps}^{p,p}(\Omega_{st}^{(q)})$, taking advantage of the blending function
method. Three different meshes with 2, 4, and 10 p-elements
have been chosen (see Figure 22). A series of computations
for polynomial degrees p ≤ 17 for the mesh with 2
elements and p ≤ 9 for the meshes with 4 and 10 elements
was performed.

Figure 22. Three meshes with 2, 4, and 10 p-elements. (Reprinted
from Comput. Methods Appl. Mech. Engng., 190, A. Düster and
E. Rank, The p-version of the finite-element method compared
to an adaptive h-version for the deformation theory of plasticity,
1925-1935, Copyright (2001), with permission from Elsevier.)

In order to make a comparison with an adaptive h-version,
we refer to the results of Barthold, Schmidt and Stein
(1997, 1998) and Stein et al. (1997). The computations there
were performed with the Q1-P0 element, which differs from the
well-known bilinear quadrilateral element by including an
additional, elementwise constant pressure degree of freedom.
A mesh consisting of 64 Q1-P0 elements was refined in 10 steps
using the equilibrium criterion, yielding 875 elements with
1816 degrees of freedom (see Figure 23). In Barthold, Schmidt
and Stein (1997, 1998) and Stein et al. (1997), the results
of a sequence of graded meshes and a reference solution
obtained with 24 200 Q1-P0 elements and 49 062 degrees of
freedom are also given. Comparing the results of the uniform
p-version with those of the h-version based on a sequence of
graded meshes, we observe that the efficiency of the p-version
is superior (see Figures 24, 25). The discretization with
4 elements, p = 9, and 684 degrees of freedom provides an
accuracy that cannot be reached by the h-version, even when
using 4096 Q1-P0 elements with 8320 degrees of freedom. Even
compared to an h-refinement resulting in an adapted mesh with
875 Q1-P0 elements, it can be seen that a uniform p-version
is much more accurate. Although the p-version is significantly
more elaborate than the h-version when comparing the
computational effort per degree of freedom, investigations
on the computational cost to obtain highly accurate
results have clearly shown a superiority of high-order
elements. For further information, including three-dimensional
examples of the $J_2$ flow theory with nonlinear isotropic
hardening, we refer to Düster and Rank (2001, 2002),
Düster (2002), Düster et al. (2002), Rank et al. (2002), and
Stein (2002).


5.1.2 An industrial application 


The following example is concerned with a structural
component of moderate complexity, called a dragbrace
fitting, shown in Figure 26. This part is representative of
structural components used in the aerospace sector in that
relatively thin plate-like regions are reinforced by integrally
machined stiffeners. The overall dimensions are length L =
219.6 mm and width w = 115 mm. The material is typically
aluminum or titanium, which exhibit strain hardening. For
the purposes of this example, an elastic-perfectly plastic
material was chosen because it poses a more challenging
problem from the numerical point of view. The material
properties are those of an ASTM A-36 steel; the yield point
is 248 MPa, the modulus of elasticity is E = 200 GPa, and
Poisson’s ratio is ν = 0.295. The mathematical model is
based on the deformation theory of plasticity.

Figure 23. Initial mesh with 64 Q1-P0 elements and adapted
mesh with 875 Q1-P0 elements (see Barthold, Schmidt and
Stein, 1997). (Reprinted from Comput. Methods Appl. Mech.
Engng., 190, A. Düster and E. Rank, The p-version of the
finite-element method compared to an adaptive h-version for the
deformation theory of plasticity, 1925-1935, Copyright (2001),
with permission from Elsevier.)

Figure 24. Displacement u at node 4. (Reprinted from Comput.
Methods Appl. Mech. Engng., 190, A. Düster and E. Rank, The
p-version of the finite-element method compared to an adaptive
h-version for the deformation theory of plasticity, 1925-1935,
Copyright (2001), with permission from Elsevier.)

The lugs A and B are fully constrained and sinusoidally
distributed normal tractions are applied through lugs C
and D. The resultants of the tractions are F and 2F,
respectively, acting in the negative x direction as
shown schematically in Figure 26. The goal of the
computation is to determine the extent of the plastic zone,
given that F = 5.5 kN. The mesh consists of 2 tetrahedral
elements, 22 pentahedral elements, and 182 hexahedral
elements.

Figure 25. Displacement u at node 5. (Reprinted from Comput.
Methods Appl. Mech. Engng., 190, A. Düster and E. Rank, The
p-version of the finite-element method compared to an adaptive
h-version for the deformation theory of plasticity, 1925-1935,
Copyright (2001), with permission from Elsevier.)

Figure 26. Example: Dragbrace fitting. Elastic-plastic solution,
p = 7, trunk space, N = 49 894. In the dark region, the equivalent
strain exceeds the yield strain.


The region of primary interest is the neighborhood of the 
loaded lugs. The results of linear analysis indicate that the 
maximal von Mises stress in this region is 1040 MPa, that 
is, 4.2 times the yield stress. Therefore, nonlinear analysis 
has to be performed. The region where the equivalent 
strain exceeds the yield strain is shown in Figure 26. The 
computations were performed with StressCheck. 


5.2 Geometric nonlinearity


The following example illustrates an application of the
p-version to a geometrically nonlinear problem. In geometrically
nonlinear problems, equilibrium is satisfied in the
deformed configuration. The constitutive laws establish a
relationship either between the Piola-Kirchhoff stress tensor
and the Euler-Lagrange strain tensor or the Cauchy
stress tensor and the Almansi strain tensor. The formulation
in this example is based on the Cauchy stress and the
Almansi strain; see Noël and Szabó (1997). The mapping
functions given by equation (25) are updated iteratively by
the displacement vector components. For example, at the
ith iteration, the x-coordinate is mapped by


$$x = Q_x^{(i)}(\xi, \eta, \zeta) + u_x^{(i)}(\xi, \eta, \zeta) \tag{49}$$


It is known that when a thin elastic strip is subjected to
pure bending, it deforms so that the curvature is constant
and proportional to the bending moment:

$$\frac{1}{\rho} = \frac{M}{EI} \tag{50}$$

where ρ is the radius of curvature, M is the bending
moment, E is the modulus of elasticity, and I is the moment
of inertia. Poisson’s ratio ν is zero. In this example, a thin
strip of length L = 100 mm, thickness t = 0.5 mm, and
width b = 5 mm is subjected to normal tractions on Face A
shown in Figure 27, which correspond to M chosen so that
ρ = L/(2π):

$$T_n = \frac{2\pi E}{L}\, y \tag{51}$$

where y is measured from the mid surface in the direction
of the normal in the current configuration. The three
displacement vector components are set to zero on Face B.
Three hexahedral elements were used. The anisotropic
space $S_{ts}^{p,p,1}(\Omega_{st}^{(h)})$ described in Section 2.3 was used.
Mapping was by the blending function method using 6 × 6
collocation points in the quasi-regional mapping procedure
described by Királyfalvi and Szabó (1997). The computations
were performed with StressCheck. The load $T_n$ was
applied in 20 equal increments. The final deformed
configuration, a nearly perfect cylindrical body, is shown in
Figure 27. The exact solution of a perfectly cylindrical middle
surface (the elastica) is the limiting case with respect
to the thickness approaching zero.
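
The numbers implied by this setup can be checked with a short sketch. It only evaluates the pure-bending relation 1/ρ = M/(EI) for the stated dimensions; because the modulus of elasticity is not given in the text, the moment is reported divided by E. The element aspect ratio assumes three elements of equal length, which is an assumption made here for illustration.

```python
import math

L, t, b = 100.0, 0.5, 5.0        # strip length, thickness, width in mm (from the text)

rho = L / (2.0 * math.pi)        # radius at which the strip closes into a full circle
I = b * t**3 / 12.0              # moment of inertia of the b x t cross section, mm^4
M_over_E = I / rho               # from 1/rho = M/(E I); E itself is not given
max_strain = (t / 2.0) / rho     # bending strain at the outer fiber, y = t/2
aspect_ratio = (L / 3.0) / t     # assumes three elements of equal length

print(f"rho = {rho:.3f} mm, M/E = {M_over_E:.6f} mm^3")
print(f"outer-fiber strain = {max_strain:.4f}, element aspect ratio = {aspect_ratio:.1f}")
```

The strains remain small (under 2%) even though the rotations are very large, which is the regime in which the elastica is the appropriate limiting solution.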

This example illustrates the following: (a) In the p- 
version, very large aspect ratios can be used. (b) Quasi- 
regional mapping, which is an extension of isoparametric 
mapping combined with the blending function method, is 
capable of providing a highly accurate representation of 
the geometrical description with very few elements over 
large deformations. In this example, Face A was rotated 
360 degrees relative to its reference position. 


Figure 27. Example: Thin elastic strip. Geometrically nonlinear
solution. Three hexahedral elements, anisotropic space
$S_{ts}^{8,8,1}(\Omega_{st}^{(h)})$, N = 684.


6 OUTLOOK 


Although implementations of the p-version are available in 
a number of commercial finite element computer codes, 
widespread applications of the p-version in professional 
practice have been limited by three factors: 


1. The infrastructure of the most widely used FEA software
products was designed for the h-version, and cannot be
readily adapted to meet the technical requirements of
the p-version.

2. In typical industrial problems, finite element meshes are
generated by automatic mesh generators that produce
very large numbers of tetrahedral elements mapped
by low-order (linear or quadratic) polynomial mapping
functions. When the mapping functions are of
low degree, the use of high-order elements is generally
not justified. This point was illustrated in
Section 2.4.2. Nevertheless, numerous computational
experiments have shown that p-extension performed
on tetrahedral meshes up to p = 4 or p = 5 provides
efficient means of verification for the computed data
when the mappings are proper, that is, the Jacobian
determinant is positive over every element. Experience
has shown that many commercial mesh generators
produce improperly mapped elements. As mesh
generators improve and produce fewer elements and
more accurate mappings, this obstacle will be gradually
removed.

3. The demand for verified information in industrial
applications of FEMs has been generally weak; however,
as computed information is becoming an increasingly
important part of the engineering decision-making
process, the demand for verified data, and
hence the importance of the p-version, is likely to
increase.




At present, the p-version is employed in industrial appli- 
cations mainly where it provides unique technical capa- 
bilities. Some examples are: (a) Analysis of mechanical 
and structural components comprised of plate- and shell- 
like regions where dimensional reduction is applicable, 
and solid regions where fully three-dimensional represen- 
tation is necessary. An example of this kind of domain 
is shown in Figure 26 where it would not be feasible 
to employ fully automatic mesh generators because the 
fillets would cause the creation of an excessive number 
of tetrahedral elements. On the other hand, if the fillets 
were omitted, then the stresses could not be determined in 
the most critical regions. (b) Two- and three-dimensional 
linear elastic fracture mechanics where p-extensions on 
geometric meshes, combined with advanced extraction pro- 
cedures, provide verified data very efficiently; see, for 
example, Szabó and Babuska (1988) and Andersson, Falk 
and Babuska (1990). (c) Plate and shell models where the 
robustness of the p-version and its ability to resolve bound- 
ary layer effects are important; see, for example, Babuska, 
Szabó and Actis (1992), Actis, Szabó and Schwab (1999), 
and Rank, Krause and Preusch (1998). (d) Analysis of 
structural components made of composite materials where
special care must be exercised in choosing the mathematical
model; large aspect ratios must be used and geometric
as well as material nonlinear effects may have to be
considered; see Engelstad and Actis (2003). (e) Interpretation
of experimental data where strict control of the errors
of discretization (as well as the experimental errors) is
essential for proper interpretation of the results of physical
experiments.

The p-version continues to be a subject of research aimed
at broadening its application to new areas. Only a few
of the many important recent and ongoing research activities
can be mentioned here. Application of the p- and
hp-versions to mechanical contact is discussed in Páczelt
and Szabó (2002) and the references listed therein. The
problem of hp-adaptivity was addressed in the papers
Demkowicz, Oden and Rachowicz (1989), Oden et al.
(1989), Rachowicz, Oden and Demkowicz (1989), and
Demkowicz, Rachowicz and Devloo (2002). The design
of p-adaptive methods for elliptic problems was addressed
in Bertóti and Szabó (1998). The problem of combining
p- and hp-methods with boundary element methods
(BEMs) for the solution of elastic scattering problems
is discussed in Demkowicz and Oden (1996). Further
information on coupling of FEM and BEM can be
found in this encyclopedia (see Chapter 13, this Volume).
Application of hp-adaptive methods to Maxwell
equations was reported in Rachowicz and Demkowicz
(2002).


NOTES


[1] Waterloo Maple Inc., 57 Erb Street West, Waterloo,
Ontario, Canada (www.maplesoft.com). The worksheet
can be obtained from the Lehrstuhl für Bauinformatik,
Technische Universität München, Germany
(www.inf.by.tum.de/~duester).

[2] StressCheck is a trademark of Engineering Software
Research and Development, Inc., St. Louis, Missouri,
USA (www.esrd.com).


1 INTRODUCTION


In the past three decades, spectral methods have evolved 
from their noble ancestor, the Fourier method based 
on trigonometric expansions, through the more flexible 
Galerkin method with Gaussian integration, all the way 
maintaining their most distinguished feature: the very high 
rate of convergence. 

They are numerical methods for solving boundary-value 
problems for partial differential equations. 

For the reader’s convenience, we will gradually approach
this subject by first addressing the case of periodic problems,
where the so-called Fourier methods are used. Then
we turn to nonperiodic problems and address collocation
approximations based on algebraic polynomial expansions.
The different concepts are first explained on one-dimensional
intervals. Then we address the case of a square
or a cube or a simplex, and finally the case of more complex
geometrical domains. We illustrate the case of elliptic
equations, Stokes and Navier-Stokes equations, and then
advection equations and conservation laws.


2 FOURIER METHODS 


In their early stage, spectral methods were designed to
approximate the periodic solution of partial differential
equations by a truncated Fourier series. If

$$u(x) = \sum_{k=-\infty}^{+\infty} \hat{u}_k \varphi_k(x), \qquad \varphi_k(x) = \frac{1}{\sqrt{2\pi}}\, e^{ikx}$$

is the unknown solution, the numerical solution is sought
in the form

$$u_N(x) = \sum_{k=-N/2}^{(N/2)-1} u_{N,k}\, \varphi_k(x)$$

where N is an (even) integer that dictates the size of
the approximate problem. Note that the unknowns are
represented by the Fourier coefficients $\{u_{N,k}\}$. Potentially,
this approximation has a tremendously high quality, since
for all $0 \le k \le s$, there exists a positive constant $C_{k,s}$ such
that

$$\inf_{v_N \in S_N} \|u - v_N\|_{H^k(0,2\pi)} \le C_{k,s}\, N^{k-s}\, \|u\|_{H^s_p(0,2\pi)} \tag{1}$$

provided u belongs to the Sobolev space $H^s_p(0, 2\pi)$ of
periodic functions having s derivatives in $L^2(0, 2\pi)$. Here,
$S_N = \mathrm{span}\{\varphi_k \mid -N/2 \le k \le (N/2) - 1\}$ denotes the space
of the finite Fourier series of order N. This abstract
approximation property is reflected by a corresponding error
estimate for the difference $u - u_N$. Actually, in the most
classical approach, the spectral Fourier method consists of
approximating a given PDE, say

$$Lu(x) = f(x), \qquad x \in \Omega \tag{2}$$

with $\Omega = (0, 2\pi)$, where L is a differential operator and
f is a given $2\pi$-periodic function, by requiring the $L^2$-projection
of the residual upon the subspace $S_N$ to vanish,
that is,

$$\text{find } u_N \in S_N \ \text{ s.t. } \ (Lu_N - f, \varphi) = 0 \quad \forall \varphi \in S_N \tag{3}$$

Here, $(v, w) = \int_\Omega v \bar{w}$ denotes the $L^2(\Omega)$ inner product.
If L is a constant coefficient operator, this yields an
embarrassingly simple problem. As a matter of fact,
owing to the $L^2$-orthogonality of the functions $\{\varphi_k\}$, that
is, $(\varphi_k, \varphi_m) = \delta_{km}$ $\forall k, m \in \mathbb{Z}$, equation (3) yields, after
Fourier-transforming the residual $Lu_N - f$, the following
set of explicit equations for the unknowns:

$$\lambda_k u_{N,k} = \hat{f}_k, \qquad -\frac{N}{2} \le k \le \frac{N}{2} - 1 \tag{4}$$

where $\hat{f}_k = (f, \varphi_k)$ is the kth Fourier coefficient of f, while
$\lambda_k$ is the kth eigenvalue of L. For instance, if

$$L = -D(\alpha D) + \beta D + \gamma I \quad (\text{with } \alpha, \beta, \gamma \text{ constants}) \tag{5}$$

where D denotes differentiation with respect to x and I the
identity, then $\lambda_k = \alpha k^2 + i\beta k + \gamma$.

Moreover, in this special case, $u_N$ indeed coincides with
the truncated Fourier series of order N of the exact solution
u, thus the bound (1) (with $v_N = u_N$) provides an error
estimate.
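
The diagonal structure of equation (4) can be illustrated with a minimal sketch in plain Python. It is an illustration under stated assumptions rather than the method as implemented anywhere: the Fourier coefficients of f are approximated by a discrete Fourier sum over N equally spaced samples instead of exact integrals, the basis exp(ikx) is used without the 1/√(2π) normalization (the eigenvalues are unaffected), and the function name `solve_periodic` and the test problem are hypothetical.

```python
import cmath
import math

def solve_periodic(alpha, beta, gamma, f, N):
    """Solve -alpha*u'' + beta*u' + gamma*u = f on (0, 2*pi), periodic, N even."""
    x = [2.0 * math.pi * j / N for j in range(N)]
    modes = range(-N // 2, N // 2)
    # Approximate Fourier coefficients of f from N nodal samples
    f_hat = {k: sum(f(xj) * cmath.exp(-1j * k * xj) for xj in x) / N
             for k in modes}
    # One division per mode: lambda_k = alpha*k^2 + i*beta*k + gamma
    u_hat = {k: f_hat[k] / (alpha * k * k + 1j * beta * k + gamma)
             for k in modes}

    def u_N(t):
        # Real part is taken because the data are assumed real-valued
        return sum(u_hat[k] * cmath.exp(1j * k * t) for k in modes).real

    return u_N

# Manufactured test problem with exact solution u(x) = exp(sin x)
alpha, beta, gamma = 1.0, 0.5, 2.0
u_exact = lambda t: math.exp(math.sin(t))

def f(t):
    u = u_exact(t)
    du = math.cos(t) * u
    d2u = (math.cos(t) ** 2 - math.sin(t)) * u
    return -alpha * d2u + beta * du + gamma * u

u_N = solve_periodic(alpha, beta, gamma, f, 32)
err = max(abs(u_N(t) - u_exact(t)) for t in (0.3, 1.7, 4.0))
print(f"max error at sample points: {err:.2e}")
```

No linear system is assembled: each unknown is obtained by a single division, which is what makes the constant-coefficient case so simple.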

However, the situation just described is an overly
fortunate circumstance. Should some of the coefficients
α, β, or γ be functions of x (or, even worse, of
u, yielding a nonlinear equation), then convolution sums
between the unknown frequency coefficients $\{u_{N,k}\}$ and
the Fourier coefficients of α, β, γ will arise, and the diagonal
structure of equation (4) would be lost. A variant of
the projection approach (3) can be based on evaluating
the convolution sums by discrete Fourier transform. This
requires introducing equally spaced nodes, $x_j = 2\pi j/N$,
$j = 0, \dots, N - 1$, then replacing the exact integrals in (3)
by numerical integration; the resulting scheme is

$$\text{find } u_N^c \in S_N \ \text{ s.t. } \ (Lu_N^c - f, \varphi)_N = 0 \quad \forall \varphi \in S_N \tag{6}$$

where $(v, w)_N = (2\pi/N) \sum_{j=0}^{N-1} v(x_j)\bar{w}(x_j)$ is the Gaussian
approximation of the scalar product (v, w). The exactness
of the Gaussian approximation on $S_N$, namely, the
property that $(v, w)_N = (v, w)$ $\forall v, w \in S_N$, enables us to
recover from (6) a collocation formulation $L_N u_N^c = f$ at
all nodes $x_j$, where $L_N$ is obtained from L by replacing
each derivative by the corresponding so-called pseudospectral
derivative. This means that for any smooth function v,
Dv is replaced by $D(I_N v)$, where

$$I_N v \in S_N, \qquad I_N v(x) = \sum_{k=-N/2}^{(N/2)-1} v_k^* \varphi_k(x), \qquad v_k^* = (v, \varphi_k)_N \tag{7}$$

is the interpolant of v at the nodes $\{x_j\}$.
The interpolation error satisfies, for $0 \le k \le s$, $s \ge 1$,

$$\|v - I_N v\|_{H^k(0,2\pi)} \le C_s\, N^{k-s}\, \|v\|_{H^s_p(0,2\pi)} \tag{8}$$

and so does the collocation error $u - u_N^c$. A consequence
of (8) (when k = 1) is that the error on the pseudospectral
derivative $\|v' - (I_N v)'\|_{L^2(0,2\pi)}$ decreases like a constant
times $N^{1-s}$, provided that $v \in H^s_p(0, 2\pi)$ for some $s \ge 1$.
Indeed, one can even prove that

$$\|Dv - D(I_N v)\|_{L^2(0,2\pi)} \le C(\eta)\, N e^{-N\eta/2} \quad \forall\, 0 < \eta < \eta_0$$

provided that v is analytic in the strip $|\mathrm{Im}\, z| < \eta_0$. This
exponential rate of convergence is often referred to as
spectral convergence, as it is a distinguishing feature of
spectral methods.
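
Spectral convergence of the pseudospectral derivative can be observed numerically. The sketch below is an illustration only: it uses a direct O(N²) discrete Fourier sum in plain Python rather than an FFT, measures the error in the maximum norm at the nodes rather than in the L² norm of the estimate above, and takes the real part of the result, which for real data amounts to dropping the contribution of the unpaired mode k = -N/2. The test function exp(sin x) and all names are assumptions made here.

```python
import cmath
import math

def pseudospectral_derivative(v, N):
    """Nodal values of D(I_N v) at x_j = 2*pi*j/N, j = 0..N-1 (N even)."""
    x = [2.0 * math.pi * j / N for j in range(N)]
    modes = range(-N // 2, N // 2)
    # Discrete Fourier coefficients of the interpolant (basis exp(i k x))
    c = {k: sum(v(xj) * cmath.exp(-1j * k * xj) for xj in x) / N for k in modes}
    # Differentiate mode by mode; v is assumed real-valued
    dv = [sum(1j * k * c[k] * cmath.exp(1j * k * xj) for k in modes).real
          for xj in x]
    return x, dv

def max_error(N):
    v = lambda t: math.exp(math.sin(t))                    # analytic, 2*pi-periodic
    dv_exact = lambda t: math.cos(t) * math.exp(math.sin(t))
    x, dv = pseudospectral_derivative(v, N)
    return max(abs(a - dv_exact(xj)) for a, xj in zip(dv, x))

errors = {N: max_error(N) for N in (8, 16, 32)}
for N, e in errors.items():
    print(f"N = {N:2d}   max nodal error = {e:.2e}")
```

For an analytic function such as this one, doubling N reduces the error by many orders of magnitude until round-off is reached, in contrast with the fixed algebraic rate of a finite-difference formula.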

There is, however, a major difference between the collocation
approach and the $L^2$-projection approach (3). In the
latter, the unknowns are the frequency coefficients $\{u_{N,k}\}$
of $u_N$, whereas in the collocation approach one looks for
the nodal values $\{u_j = u_N^c(x_j)\}$ of $u_N^c$. These values may
be interpreted as the coefficients of $u_N^c$ with respect to the
trigonometric Lagrange basis associated with the nodes $x_j$;
indeed, observing that $u_N^c = I_N u_N^c$, using (7) and exchanging
summations over k and j, one gets

$$u_N^c(x) = \sum_{j=0}^{N-1} u_j\, \frac{2\pi}{N} \sum_{k=-N/2}^{(N/2)-1} \bar{\varphi}_k(x_j)\, \varphi_k(x) = \sum_{j=0}^{N-1} u_j \psi_j(x)$$

where $\psi_j \in S_N$ satisfies $\psi_j(x_m) = \delta_{jm}$, $0 \le j, m \le N - 1$.
A modal representation is used in the former case (Fourier),
whereas a nodal one is adopted in the latter (collocation).

The same approach can be pursued for boundary-value
problems set on multidimensional intervals $\Omega = (0, 2\pi)^d$,
d = 2, 3, by tensorizing basis functions and collocation
nodes.

Fourier methods represent the most classical approach in 
spectral methods. The interested reader can find a compre- 
hensive coverage of the subject in the monographs Gottlieb 
and Orszag (1977) and Canuto et al. (1988).


3 ALGEBRAIC POLYNOMIAL
EXPANSION


When a boundary-value problem with nonperiodic data (of
Dirichlet, Neumann, or mixed type) has to be solved numerically,
the trigonometric expansion is no longer adequate to
guarantee high order of accuracy. Then, Jacobi orthogonal
polynomials are used to provide orthogonal bases for the
approximation space.

The finite dimensional space $\mathbb{P}_N$ is now made of algebraic
polynomials of degrees less than or equal to N.

The historical approach, inspired by the Fourier method,
aimed at expanding the approximate solution with respect
to a basis of orthogonal polynomials

$$u_N(x) = \sum_{k=0}^{N} u_{N,k}\, p_k(x) \tag{9}$$

where $u_{N,k}$ now represent the unknown frequency coefficients.

The polynomials of choice were the Chebyshev polynomials,
$p_k(x) = T_k(x) = \cos(k\theta)$, $\theta = \cos^{-1}(x)$, $-1 \le x \le 1$,
owing to their analogy with trigonometric polynomials.
Since the Chebyshev basis does not necessarily match
the boundary requirement (as $T_k(1) = 1$, $T_k(-1) = (-1)^k$,
$\forall k \ge 0$), one device consists of projecting the equation
residual on the reduced space $\mathbb{P}_{N-2}$, enforcing the boundary
conditions afterward. For instance, for a Dirichlet
boundary-value problem like (2), where now $\Omega = (-1, 1)$,
and Dirichlet boundary conditions $u(-1) = u_-$, $u(+1) = u_+$,
the solution (9) is required to satisfy the following
equations:

$$(Lu_N - f, T_k)_\omega = 0, \quad 0 \le k \le N - 2, \qquad u_N(-1) = u_-, \quad u_N(+1) = u_+ \tag{10}$$

This modal approach was termed the Lanczos-Tau
method. The symbol $(u, v)_\omega = \int_{-1}^{1} u v\, \omega\, dx$ is the so-called
weighted scalar product with respect to the Chebyshev
weight function $\omega(x) = (1 - x^2)^{-1/2}$, $-1 < x < 1$. The
weighted scalar product is used, instead of the more traditional
one $(\cdot, \cdot)$, in order to take advantage (to the highest
possible extent) of the Chebyshev orthogonality,

$$(T_k, T_m)_\omega = 0 \ \text{ if } k \ne m, \qquad (T_0, T_0)_\omega = \pi, \qquad (T_k, T_k)_\omega = \frac{\pi}{2} \quad \forall k \ge 1$$


When $L$ has constant coefficients, the Lanczos–Tau problem
(10) yields an algebraic system for the frequency coefficients
$\{u_{N,k}\}$ with a structured matrix for which efficient
diagonalization algorithms can be devised, a circumstance that
is also featured by the multidimensional problems that are
generated by tensorization.

However, this is not general enough, as this structure gets
lost for more general differential operators. A more flexible
approach (in analogy with what was done in the Fourier case)
consists of adopting a nodal representation of $u_N$ at selected
Gauss–Lobatto nodes $x_j = \cos(\pi j/N)$, $j = 0, \dots, N$,
then looking for a standard Galerkin approximation with
integrals replaced by Gauss–Lobatto integration:

\[
(u, v)_N = \sum_{j=0}^{N} \alpha_j\, u(x_j)\, v(x_j) \tag{11}
\]

where $\alpha_j = \pi/N$ for $j = 1, \dots, N-1$ and
$\alpha_0 = \alpha_N = \pi/(2N)$ are the quadrature coefficients.
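The discrete scalar product (11) can be checked numerically. The following is a minimal Python sketch, assuming NumPy is available; the helper names `chebyshev_gauss_lobatto`, `discrete_product` and `T` are illustrative and do not come from the text. It assembles the matrix of the values $(T_k, T_m)_N$ for $k, m \le N-1$ and compares it with the continuous Chebyshev orthogonality relations.

```python
import numpy as np

def chebyshev_gauss_lobatto(N):
    """Nodes x_j = cos(pi j / N) and coefficients alpha_j of formula (11)."""
    j = np.arange(N + 1)
    x = np.cos(np.pi * j / N)
    alpha = np.full(N + 1, np.pi / N)
    alpha[0] = alpha[-1] = np.pi / (2 * N)
    return x, alpha

def discrete_product(u, v, N):
    """Discrete scalar product (u, v)_N for two callables u and v."""
    x, alpha = chebyshev_gauss_lobatto(N)
    return float(np.sum(alpha * u(x) * v(x)))

def T(k):
    """Chebyshev polynomial T_k(x) = cos(k arccos x) on [-1, 1]."""
    return lambda x: np.cos(k * np.arccos(np.clip(x, -1.0, 1.0)))

N = 8
gram = np.array([[discrete_product(T(k), T(m), N) for m in range(N)]
                 for k in range(N)])
expected = np.diag([np.pi] + [np.pi / 2] * (N - 1))
print(np.max(np.abs(gram - expected)))   # rounding-level discrepancy
```

For $k, m \le N-1$ the product $T_k T_m$ has degree at most $2N-2$, so the quadrature reproduces the weighted integrals up to rounding. For $k = m = N$ the degree is $2N$ and the discrete value is $\pi$ rather than $\pi/2$, since $T_N^2 = 1$ at every node.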

Should we still consider the model Dirichlet boundary-value
problem for the operator $L$ introduced in (5), the
corresponding discrete problem would read:

\[
\begin{aligned}
&\text{find } u_N \in \mathbb{P}_N,\ u_N(-1) = u_-,\ u_N(1) = u_+,\ \text{s.t.} \\
&(\alpha u_N', v_N')_N + (\beta u_N', v_N)_N + (\gamma u_N, v_N)_N = (f, v_N)_N
\quad \forall v_N \in \mathbb{P}_N^0
\end{aligned} \tag{12}
\]

where now $\mathbb{P}_N^0 = \{v_N \in \mathbb{P}_N \mid v_N(-1) = v_N(1) = 0\}$.
This time, however, the expansion is made in terms of the nodal
Lagrangian basis at Gauss–Lobatto nodes, that is, using
instead of (9)

\[
u_N(x) = \sum_{j=0}^{N} u_j\, \psi_j(x)
\]

where $\psi_j$ is the unique algebraic polynomial of degree $N$
such that $\psi_j(x_i) = \delta_{ij}$, $\forall i, j = 0, \dots, N$.
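One may show that

\[
\psi_j(x) = \frac{(-1)^{j+1}}{\bar{c}_j N^2}\,
\frac{(1 - x^2)\, T_N'(x)}{x - x_j}, \qquad j = 0, \dots, N \tag{13}
\]

with $\bar{c}_0 = \bar{c}_N = 2$ and $\bar{c}_j = 1$ otherwise.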

The same approximation framework can be set up by
replacing the Chebyshev polynomials with the Legendre
polynomials $\{L_k,\ k = 0, 1, \dots\}$, which are orthogonal
with respect to the traditional $L^2$-scalar product (that is,
with respect to the weight function $\omega = 1$).

The approximate problem still reads like (12); however,
this time the nodes $\{x_j\}$ and the coefficients $\{\alpha_j\}$
are those of the (Legendre) Gauss–Lobatto integration.
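The Legendre Gauss–Lobatto nodes and coefficients are not given in closed form in the text. As a hedged illustration, the sketch below computes them with NumPy, assuming the standard characterization (interior nodes are the zeros of $L_N'$, coefficients are $2/(N(N+1)L_N(x_j)^2)$); the function name `legendre_gauss_lobatto` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def legendre_gauss_lobatto(N):
    """Nodes and coefficients of the (N+1)-point Legendre Gauss-Lobatto rule."""
    cN = np.zeros(N + 1)
    cN[N] = 1.0                                   # L_N in the Legendre basis
    interior = leg.legroots(leg.legder(cN))       # zeros of L_N'
    x = np.concatenate(([-1.0], np.sort(interior.real), [1.0]))
    alpha = 2.0 / (N * (N + 1) * leg.legval(x, cN) ** 2)
    return x, alpha

N = 6
x, alpha = legendre_gauss_lobatto(N)

# Degree of exactness 2N - 1: monomials x^p are integrated exactly for p <= 2N - 1.
for p in range(2 * N + 1):
    exact = 2.0 / (p + 1) if p % 2 == 0 else 0.0
    print(p, abs(np.sum(alpha * x ** p) - exact))
```

The printed discrepancies should stay at rounding level up to $p = 2N - 1$ and become visible at $p = 2N$, which is the degree of exactness used later in the analysis.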

The approach described above is named G-NI (Galerkin
with Numerical Integration). A similar G-NI approach can
be undertaken in several dimensions. For instance, consider
a second-order elliptic boundary-value problem

\[
\begin{cases}
Lu = f & \text{in } \Omega = (-1, 1)^d,\ d = 2, 3 \\
u = 0 & \text{on } \partial\Omega
\end{cases} \tag{14}
\]

together with its weak form

\[
\text{find } u \in V = H_0^1(\Omega): \quad a(u, v) = (f, v) \quad \forall v \in V \tag{15}
\]

The bilinear form $a: V \times V \to \mathbb{R}$ is associated with the
operator $L$; for instance, if

\[
Lu = -\mathrm{div}(\alpha \nabla u) + \beta \cdot \nabla u + \gamma u,
\quad \text{with } \alpha \ge \alpha_0 > 0 \tag{16}
\]

then $a(u, v) = \int_\Omega (\alpha \nabla u \cdot \nabla v + \beta \cdot \nabla u\, v + \gamma u v)$.

Upon introducing the tensorized Legendre–Gauss–Lobatto
quadrature nodes and coefficients $x_k = (x_{k_1}, \dots, x_{k_d})$
and $\alpha_k = \alpha_{k_1} \cdots \alpha_{k_d}$ ($k_i = 0, \dots, N$), the
Legendre–Galerkin approximation of (15) with numerical
integration (G-NI) becomes

\[
\text{find } u_N \in V_N = \mathbb{P}_N^0: \quad
a_N(u_N, v_N) = (f, v_N)_N \quad \forall v_N \in V_N \tag{17}
\]

where $\mathbb{P}_N$ is now the set of polynomials of degree $\le N$
with respect to each of the independent variables, and
$\mathbb{P}_N^0$ is its subspace made of those polynomials vanishing
at $\partial\Omega$. Moreover,

\[
(u, v)_N = \sum_k \alpha_k\, u(x_k)\, v(x_k) \tag{18}
\]


is the Gauss–Lobatto quadrature formula that approximates
the scalar product $(u, v)$, while $a_N$ is the discrete bilinear
form that is obtained from $a$ by replacing each scalar
product $(\cdot,\cdot)$ with $(\cdot,\cdot)_N$. Owing to the property
that the quadrature formula (18) has degree of exactness $2N - 1$,
the Galerkin numerical integrated problem (17) can still
be interpreted as a collocation method. Indeed, it follows
from (17) that $L_N u_N = f$ at all internal nodes $x_k$, where
$L_N$ is the approximation of $L$ obtained by replacing each
exact derivative by the derivative of the interpolant $I_N$ at
the Gauss–Lobatto nodes. The interpolation operator $I_N$ is
defined as follows: $I_N v(x_k) = v(x_k)$, $I_N v \in \mathbb{P}_N$, for all
$v \in C^0(\bar\Omega)$. Then, the operator approximating (16) is

\[
L_N u_N = -\mathrm{div}[I_N(\alpha \nabla u_N)] + \beta \cdot \nabla u_N + \gamma u_N
\]

Existence and uniqueness of the solution of (17) follow
from the assumption that $a_N(\cdot,\cdot)$ is a uniformly coercive
form on the space $V_N \times V_N$, that is,

\[
\exists \alpha^* > 0 \text{ independent of } N \text{ s.t. } \quad
a_N(v_N, v_N) \ge \alpha^* \|v_N\|_{H^1(\Omega)}^2 \quad \forall v_N \in V_N \tag{19}
\]

This is the case for the problem at hand if, for example,
$\beta$ is constant and $\gamma$ is nonnegative.

The convergence analysis of the G-NI approximation can
be carried out by invoking the Strang Lemma for generalized
Galerkin approximation. Precisely, the following error
estimate holds:

\[
\begin{aligned}
\|u - u_N\| \le{} & \inf_{w_N \in V_N} \left[ \left(1 + \frac{M}{\alpha^*}\right) \|u - w_N\|
+ \frac{1}{\alpha^*} \sup_{v_N \in V_N \setminus \{0\}}
\frac{a(w_N, v_N) - a_N(w_N, v_N)}{\|v_N\|} \right] \\
& + \frac{1}{\alpha^*} \sup_{v_N \in V_N \setminus \{0\}}
\frac{(f, v_N) - (f, v_N)_N}{\|v_N\|}
\end{aligned}
\]

where $\|\cdot\|$ is the norm of $H^1(\Omega)$ and $M$ is the constant of
continuity of the bilinear form $a(\cdot,\cdot)$.
Three sources contribute to the approximation error:

– the best approximation error, which can be immediately
  bounded by taking $w_N = I_{N-1} u$:

\[
\inf_{w_N \in V_N} \|u - w_N\| \le \|u - I_{N-1} u\|
\]

– the error on the numerical quadrature, which can be
  bounded as follows:

\[
\sup_{v_N \in V_N \setminus \{0\}} \frac{(f, v_N) - (f, v_N)_N}{\|v_N\|}
\le C \left( \|f - I_N f\|_{L^2(\Omega)} + \|f - P_{N-1} f\|_{L^2(\Omega)} \right)
\]

  where $P_{N-1} f$ is the truncated Legendre series of $f$ of
  order $N - 1$;

– the error generated by the approximation of the bilinear
  form is, for its part, less immediate to estimate.
  However, having chosen $w_N = I_{N-1} u$, which is a
  polynomial of degree $N - 1$, using the degree of
  exactness of the quadrature formula and assuming that
  the coefficients of the operator are constant, one easily
  checks that $a(w_N, v_N) - a_N(w_N, v_N) = 0$, that is, this
  error is actually null. If the coefficients are nonconstant,
  one can control it in terms of the interpolation
  error measured in $H^1(\Omega)$.

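We can conclude by taking advantage of the optimality
of the truncation error in the $L^2$-norm and that of the
interpolation error in both the $L^2$- and $H^1$-norm:

\[
\begin{aligned}
&\forall f \in H^r(\Omega),\ r \ge 0, && \|f - P_N f\|_{L^2(\Omega)} \le C_1 N^{-r} \|f\|_{H^r(\Omega)} \\
&\forall g \in H^s(\Omega),\ s \ge 1, && N \|g - I_N g\|_{L^2(\Omega)} + \|g - I_N g\|_{H^1(\Omega)}
\le C_2 N^{1-s} \|g\|_{H^s(\Omega)}
\end{aligned}
\]

Thus, we obtain that

\[
\|u - u_N\| \le C_3 \left( N^{-r} \|f\|_{H^r(\Omega)} + N^{1-s} \|u\|_{H^s(\Omega)} \right)
\]

provided $u$ and $f$ have the requested regularity.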

A few comments on the implementation of the method
are in order. The algebraic system associated with (17)
reads $Au = f$, where $a_{ij} = a_N(\psi_j, \psi_i)$,
$f_i = (f, \psi_i)_N$, $u = (u_j)$, $u_j = u_N(x_j)$, and $\{\psi_j\}$
denote the Lagrangian basis functions of $V_N$ associated
with the nodal points $\{x_j\}$. The matrix $A$, which is
nonsingular whenever (19) is fulfilled, is ill conditioned:
indeed, there exist two constants $C_1$, $C_2$ such that

\[
C_1 N^3 \le \mathrm{cond}(A) \le C_2 N^3
\]

where $\mathrm{cond}(A)$ is the (spectral) condition number of $A$.
The use of a preconditioned iterative procedure (e.g., the
conjugate gradient when $\beta = 0$, or a Krylov iteration
otherwise) is mandatory. A possible preconditioner is given
by the diagonal of $A$. This yields a preconditioned system
whose condition number behaves like a constant times
$N^2$. A more drastic improvement would be achieved by
taking as a preconditioner the matrix associated with the
(piecewise-linear) finite element discretization of the
operator (16) at the same Legendre–Gauss–Lobatto nodes.
This is an optimal preconditioner as the condition number
of the preconditioned system becomes independent
of $N$.
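To make the algebraic system $Au = f$ concrete, here is a minimal one-dimensional Python sketch under simplifying assumptions that are not in the text: the operator is $L = -d^2/dx^2$ on $(-1, 1)$ with homogeneous Dirichlet data, the manufactured solution is $u = \sin(\pi x)$, and the derivative of the interpolant is built from a generic barycentric formula. All function names (`lgl`, `diff_matrix`, `gni_stiffness`, `solve_poisson`) are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def lgl(N):
    """Legendre Gauss-Lobatto nodes and coefficients on [-1, 1]."""
    cN = np.zeros(N + 1)
    cN[N] = 1.0
    roots = np.sort(leg.legroots(leg.legder(cN)).real)
    x = np.concatenate(([-1.0], roots, [1.0]))
    return x, 2.0 / (N * (N + 1) * leg.legval(x, cN) ** 2)

def diff_matrix(x):
    """D[k, i] = psi_i'(x_k): derivative of the Lagrange interpolant at the nodes."""
    n = len(x)
    dx = x[:, None] - x[None, :] + np.eye(n)      # unit diagonal avoids 0/0
    w = 1.0 / np.prod(dx, axis=1)                 # barycentric weights
    D = (w[None, :] / w[:, None]) / dx
    D[np.diag_indices(n)] = 0.0
    D[np.diag_indices(n)] = -D.sum(axis=1)        # each row sums to zero
    return D

def gni_stiffness(N):
    """Interior block of a_N(psi_j, psi_i) = (psi_j', psi_i')_N."""
    x, alpha = lgl(N)
    D = diff_matrix(x)
    A = D.T @ (alpha[:, None] * D)
    return x, alpha, A[1:-1, 1:-1]

def solve_poisson(N, f):
    """G-NI solution of -u'' = f with u(-1) = u(1) = 0."""
    x, alpha, A = gni_stiffness(N)
    u = np.zeros(N + 1)
    u[1:-1] = np.linalg.solve(A, alpha[1:-1] * f(x[1:-1]))
    return x, u

x, u = solve_poisson(16, lambda t: np.pi ** 2 * np.sin(np.pi * t))
err = np.max(np.abs(u - np.sin(np.pi * x)))
conds = {N: np.linalg.cond(gni_stiffness(N)[2]) for N in (8, 16, 32)}
print(err, conds)
```

The nodal error should decay very quickly with $N$ for this smooth solution, while the printed condition numbers should grow markedly each time $N$ is doubled, in line with the ill conditioning discussed above. A dense direct solve is used here only for brevity; it is not the preconditioned iteration recommended in the text.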

Spectral methods based on algebraic polynomials have
been discussed and analyzed in Canuto et al. (1988),
Bernardi and Maday (1992), Bernardi and Maday (1997)
and Guo (1998) (see Chapter 3, this Volume).


4 ALGEBRAIC EXPANSIONS ON 
TRIANGLES 


Spectral methods for multidimensional problems owe their
efficiency to the tensor product structure of the expansions
they use. This feature naturally suggests the setting of 
the methods on patches of Cartesian products of intervals, 
such as squares or cubes, possibly after applying a smooth 
mapping. On the other hand, triangles, tetrahedra, prisms, 
and similar figures allow one to handle complex geometries 
in a more flexible way. So, a natural question arises: Can 
one match the advantages of a tensor product structure with 
those of a triangular geometry? 

A positive answer to this question was given by
Dubiner (1991), who introduced the concepts of collapsed
Cartesian coordinate systems and warped tensor products.
The method was further developed by Karniadakis and
Sherwin (1999). We now describe this approach in
2D, pointing to the latter reference for the 3D
extensions. Let us introduce the reference triangle
$T = \{(x_1, x_2) \in \mathbb{R}^2 : -1 < x_1, x_2;\ x_1 + x_2 < 0\}$, as well
as the reference square
$Q = \{(\xi_1, \xi_2) \in \mathbb{R}^2 : -1 < \xi_1, \xi_2 < 1\}$.
The mapping

\[
(x_1, x_2) \mapsto (\xi_1, \xi_2), \qquad
\xi_1 = 2\,\frac{1 + x_1}{1 - x_2} - 1, \qquad \xi_2 = x_2 \tag{20}
\]

is a bijection between $T$ and $Q$. Its inverse is given by

\[
(\xi_1, \xi_2) \mapsto (x_1, x_2), \qquad
x_1 = \frac{1}{2}(1 + \xi_1)(1 - \xi_2) - 1, \qquad x_2 = \xi_2
\]


Note that the mapping $(x_1, x_2) \mapsto (\xi_1, \xi_2)$ sends the ray
in $T$ issuing from the upper vertex $(-1, 1)$ and passing
through the point $(x_1, -1)$ into the vertical segment in
$Q$ of equation $\xi_1 = x_1$. Consequently, the transformation
becomes singular at the upper vertex, although it stays
bounded therein. The Jacobian of the inverse transformation
is given by $\partial(x_1, x_2)/\partial(\xi_1, \xi_2) = (1/2)(1 - \xi_2)$. We term
$(\xi_1, \xi_2)$ the collapsed Cartesian coordinates of the point
on the triangle whose regular Cartesian coordinates are
$(x_1, x_2)$.
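The collapsed coordinate transformation is easy to exercise numerically. The short Python sketch below (NumPy assumed; the function names `to_square`, `to_triangle` and `inverse_jacobian` are illustrative) maps random points of $Q$ into $T$ with the inverse of (20), maps them back, and evaluates the Jacobian $(1/2)(1 - \xi_2)$.

```python
import numpy as np

def to_square(x1, x2):
    """Mapping (20): triangle T -> square Q (undefined at the vertex x2 = 1)."""
    return 2.0 * (1.0 + x1) / (1.0 - x2) - 1.0, x2

def to_triangle(xi1, xi2):
    """Inverse of (20): square Q -> triangle T."""
    return 0.5 * (1.0 + xi1) * (1.0 - xi2) - 1.0, xi2

def inverse_jacobian(xi1, xi2):
    """Determinant d(x1, x2)/d(xi1, xi2) = (1 - xi2) / 2."""
    return 0.5 * (1.0 - xi2)

rng = np.random.default_rng(0)
xi1, xi2 = rng.uniform(-0.999, 0.999, size=(2, 1000))   # stay away from xi2 = 1
x1, x2 = to_triangle(xi1, xi2)
back1, back2 = to_square(x1, x2)

inside = (x1 > -1.0) & (x2 > -1.0) & (x1 + x2 < 0.0)
print(inside.all(), np.max(np.abs(back1 - xi1)), inverse_jacobian(xi1, xi2).min())
```

The sample deliberately avoids $\xi_2 = 1$, where the whole upper edge of $Q$ collapses onto the vertex $(-1, 1)$ and the Jacobian vanishes.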

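Denote by $\{P_k^{(\alpha,\beta)}(\xi)\}$ the family of Jacobi polynomials
of increasing degree $k \ge 0$, which form an orthogonal
system with respect to the measure $(1 - \xi)^\alpha (1 + \xi)^\beta\, d\xi$ in
$(-1, 1)$ (note that $P_k^{(0,0)}(\xi)$ is the Legendre polynomial
$L_k(\xi)$ introduced in the previous section). For $k = (k_1, k_2)$,
define the warped tensor product function on $Q$

\[
\Phi_k(\xi_1, \xi_2) := \Psi_{k_1}(\xi_1)\, \Psi_{k_1, k_2}(\xi_2) \tag{21}
\]

where

\[
\Psi_{k_1}(\xi_1) := P_{k_1}^{(0,0)}(\xi_1), \qquad
\Psi_{k_1, k_2}(\xi_2) := (1 - \xi_2)^{k_1}\, P_{k_2}^{(2k_1+1,\,0)}(\xi_2) \tag{22}
\]

which is a polynomial of degree $k_1$ in $\xi_1$ and $k_1 + k_2$ in
$\xi_2$. By applying the mapping (20), one obtains the function
defined on $T$

\[
\varphi_k(x_1, x_2) := \Phi_k(\xi_1, \xi_2)
= P_{k_1}^{(0,0)}\!\left( 2\,\frac{1 + x_1}{1 - x_2} - 1 \right) (1 - x_2)^{k_1}\,
P_{k_2}^{(2k_1+1,\,0)}(x_2) \tag{23}
\]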


It is easily seen that $\varphi_k$ is a polynomial of global
degree $k_1 + k_2$ in the variables $x_1, x_2$. Furthermore, owing
to the orthogonality of Jacobi polynomials, one has for
$k \ne h$

\[
\begin{aligned}
\int_T \varphi_k(x_1, x_2)\, \varphi_h(x_1, x_2)\, dx_1\, dx_2
={} & \frac{1}{2} \int_{-1}^{1} P_{k_1}^{(0,0)}(\xi_1)\, P_{h_1}^{(0,0)}(\xi_1)\, d\xi_1 \\
& \times \int_{-1}^{1} P_{k_2}^{(2k_1+1,\,0)}(\xi_2)\, P_{h_2}^{(2h_1+1,\,0)}(\xi_2)\,
(1 - \xi_2)^{k_1 + h_1 + 1}\, d\xi_2 = 0
\end{aligned}
\]

We conclude that the set $\{\varphi_k : 0 \le k_1, k_2;\ k_1 + k_2 \le N\}$ is
an orthogonal basis of modal type of the space $\mathbb{P}_N(T)$
of the polynomials of total degree $\le N$ in the variables
$x_1, x_2$.
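The orthogonality of the warped tensor product basis can be verified numerically. The sketch below is only an illustration: it evaluates the Jacobi polynomials $P_n^{(a,0)}$ through their explicit binomial sum (valid for integer $a \ge 0$), integrates over $Q$ with a tensorized Gauss–Legendre rule rather than the Lobatto/Radau rule discussed in the text, and uses hypothetical names (`jacobi`, `warped_product`, `gram_matrix`).

```python
import numpy as np
from math import comb
from numpy.polynomial.legendre import leggauss

def jacobi(n, a, x):
    """Jacobi polynomial P_n^{(a,0)}(x) for integer a >= 0 (explicit sum)."""
    x = np.asarray(x, dtype=float)
    return sum(comb(n + a, n - s) * comb(n, s)
               * ((x - 1.0) / 2.0) ** s * ((x + 1.0) / 2.0) ** (n - s)
               for s in range(n + 1))

def warped_product(k1, k2, xi1, xi2):
    """Phi_k(xi1, xi2) of (21)-(22) in collapsed coordinates."""
    return jacobi(k1, 0, xi1) * (1.0 - xi2) ** k1 * jacobi(k2, 2 * k1 + 1, xi2)

def gram_matrix(N, nq=None):
    """Gram matrix of {phi_k : k1 + k2 <= N} in L^2(T), integrated over Q."""
    nq = nq or N + 2                         # exact for the degrees involved
    t, w = leggauss(nq)
    xi1, xi2 = np.meshgrid(t, t, indexing="ij")
    weight = 0.5 * np.outer(w, w) * (1.0 - xi2)   # Jacobian of the inverse map
    idx = [(k1, k2) for k1 in range(N + 1) for k2 in range(N + 1 - k1)]
    vals = [warped_product(k1, k2, xi1, xi2) for k1, k2 in idx]
    G = np.array([[np.sum(weight * p * q) for q in vals] for p in vals])
    return idx, G

idx, G = gram_matrix(4)
off_diagonal = G - np.diag(np.diag(G))
print(len(idx), np.max(np.abs(off_diagonal)))
```

For $N = 4$ the basis has $(1/2)(N+1)(N+2) = 15$ functions, and the off-diagonal entries of the Gram matrix should vanish up to rounding, so that the mass matrix in this basis is diagonal.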

While orthogonality simplifies the structure of mass and
stiffness matrices, it makes the enforcement of boundary
conditions, or matching conditions between elements,
awkward. To overcome this difficulty, it is possible to modify
the previous construction by building a new modal basis
made of boundary functions (3 vertex functions
plus $3(N - 1)$ edge functions) and internal functions (bubbles).
Each basis function retains the same ‘warped tensor
product’ structure as above. Indeed, it is enough to replace
in one dimension the Jacobi basis $P_k^{(\alpha,0)}(\xi)$ (with $\alpha = 0$ or
$2k + 1$) with the modified basis given by the two boundary
functions $(1 + \xi)/2$ and $(1 - \xi)/2$ and the $N - 1$ bubbles
obtained by multiplying $\frac{1 + \xi}{2}\,\frac{1 - \xi}{2}$ by a Jacobi polynomial
of degree $k - 1$, $k = 1, \dots, N - 1$. These univariate
functions are then combined as in (21) to form the
two-dimensional basis.

With such a basis at hand, one can discretize a boundary-value
problem by the Galerkin method with numerical
integration (G-NI). To this end, one needs a high-precision
quadrature formula on $T$. Since

\[
\int_T f(x_1, x_2)\, dx_1\, dx_2
= \frac{1}{2} \int_{-1}^{1} d\xi_1 \int_{-1}^{1} F(\xi_1, \xi_2)\, (1 - \xi_2)\, d\xi_2
\]

it is natural to use a tensor product Gaussian formula in
$Q$ for the measure $d\xi_1\, (1 - \xi_2)\, d\xi_2$. This is obtained by
tensorizing an $(N + 1)$-point Gauss–Lobatto formula for the
measure $d\xi_1$ with an $N$-point Gauss–Radau formula for
the measure $(1 - \xi_2)\, d\xi_2$ with $\xi_2 = -1$ as integration knot
(excluding the singular point $\xi_2 = 1$ from the integration
knots makes the construction of the matrices easier). The
resulting formula is exact for all polynomials in $Q$ of degree
$\le 2N - 1$ in each variable $\xi_1, \xi_2$; hence, in particular, it is
exact for all polynomials in $T$ of total degree $\le 2N - 1$
in the variables $x_1, x_2$. Note, however, that the number of
quadrature nodes in $T$ is $N(N + 1)$, whereas the dimension
of $\mathbb{P}_N(T)$ is $(1/2)(N + 1)(N + 2)$; thus, no basis in
$\mathbb{P}_N(T)$ can be the Lagrange basis associated with the
quadrature nodes. This means that the G-NI method based on the
quadrature formula described above cannot be equivalent
to a collocation method at the quadrature points.

Finally, we observe that the G-NI mass and stiffness
matrices on $T$ can be efficiently built by exploiting the
tensor product structure of both the basis functions and the 
quadrature points through the sum-factorization technique. 


5 STOKES AND NAVIER-STOKES 
EQUATIONS 


Spectral methods are very popular among the community 
of fluid-dynamicists. Owing to their excellent approxima- 
tion properties, spectral methods can, in fact, provide very 
accurate simulations of complex flow patterns. However, 
special care is needed for the treatment of the incompress- 
ibility constraint. With the aim of simplification, let us first 
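address the linear Stokes equations

\[
\begin{cases}
-\nu \Delta u + \mathrm{grad}\, p = f & \text{in } \Omega\ (\subset \mathbb{R}^d,\ d = 2, 3) \\
\mathrm{div}\, u = 0 & \text{in } \Omega \\
u = 0 & \text{on } \partial\Omega
\end{cases} \tag{24}
\]

where $\nu > 0$ is the kinematic fluid viscosity, $u$ the fluid
velocity, $p$ the fluid pressure, and $f$ the volume forces.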
A natural spectral G-NI discretization reads as follows:
find $u_N \in V_N$, $p_N \in Q_N$ such that

\[
\begin{cases}
((\nu \nabla u_N, \nabla v_N))_N - (p_N, \mathrm{div}\, v_N)_N = ((f, v_N))_N & \forall v_N \in V_N \\
-(q_N, \mathrm{div}\, u_N)_N = 0 & \forall q_N \in Q_N
\end{cases} \tag{25}
\]

where $(\cdot,\cdot)_N$ is the discrete Gauss–Lobatto scalar product
(18), while $((\cdot,\cdot))_N$ denotes its generalization to the case
of vector functions. Moreover, $V_N = (\mathbb{P}_N^0)^d$ while $Q_N$ is a
polynomial space that needs to be chosen conveniently so
as to satisfy the following Brezzi condition:

\[
\exists \beta_N > 0: \ \forall q_N \in Q_N,\ \exists v_N \in V_N \ \text{s.t.} \quad
(q_N, \mathrm{div}\, v_N)_N \ge \beta_N \|q_N\|_{L^2(\Omega)} \|v_N\|_{H^1(\Omega)} \tag{26}
\]

The violation of this condition, that is, the existence of
nonconstant pressures $q_N \in Q_N$ such that $(q_N, \mathrm{div}\, v_N)_N = 0$,
$\forall v_N \in V_N$, implies the existence of spurious pressure
modes, which pollute the computed pressure $p_N$.

The largest constant $\beta_N$, called the inf-sup constant,
depends on the way $Q_N$ is chosen, and has a special role in
the analysis of the spectral approximation (25). Two choices
are commonly proposed in practice. The first one is
$Q_N = \mathbb{P}_{N-2} \cap L_0^2(\Omega)$, that is, the space of polynomials of
degree $N - 2$ with zero average. In that case,
$\beta_N = C N^{(1-d)/2}$.

An alternative approach consists of choosing
$Q_N = \mathbb{P}_{[\lambda N]} \cap \mathbb{P}_{N-2} \cap L_0^2(\Omega)$, for some
$\lambda$: $0 < \lambda < 1$, where $[\lambda N]$ denotes the largest
integer $\le \lambda N$; in this case, $\beta_N \ge \beta > 0$.

The latter approach allows one to derive uniform stability
and optimal error bounds for the approximate solution. In
general, this occurs when $\beta_N$ is uniformly bounded from
below as $N$ increases. In fact, using (26), one can obtain
that

\[
\nu \|\nabla u_N\|_{L^2(\Omega)} + \beta_N \|p_N\|_{L^2(\Omega)} \le C \|f\|_{C^0(\bar\Omega)}
\]

where $C$ is a constant independent of $N$.

As a consequence, under the assumption (26), the error
estimate on the velocity field is optimal, whereas the error
on the pressure undergoes a loss of accuracy of order $\beta_N^{-1}$.
For instance, in the case where $Q_N = \mathbb{P}_{N-2} \cap L_0^2(\Omega)$, the
following error bound can be proven, provided the assumed
regularity for the exact solution $u$, $p$, and the forcing term
$f$ holds for suitable values of $s \ge 1$ and $t \ge 0$:

\[
\|u - u_N\|_{H^1(\Omega)} + N^{(1-d)/2} \|p - p_N\|_{L^2(\Omega)}
\le C \left[ N^{1-s} \left( \|u\|_{H^s(\Omega)} + \|p\|_{H^{s-1}(\Omega)} \right) + N^{-t} \|f\|_{H^t(\Omega)} \right]
\]


Note that the $(N + 1)^2$ Gauss–Lobatto nodes are used to
interpolate the discrete velocity components, while the
subset made of the $(N - 1)^2$ interior Gauss–Lobatto nodes can
be used to interpolate the discrete pressure. Alternatively,
one could use a staggered grid made of the $(N - 1)^2$ Gauss
nodes for the pressure (and change in (25) the discrete
integrals $(p_N, \mathrm{div}\, v_N)_N$ and $(q_N, \mathrm{div}\, u_N)_N$ accordingly). This,
however, would require interpolation between meshes, as,
in this case, velocity and pressure feature nodal
representations with respect to different sets of nodes.

The algebraic formulation of the discrete Stokes problem
(25) yields the classical block structure matrix form

\[
\begin{pmatrix} A & D^T \\ D & 0 \end{pmatrix}
\begin{pmatrix} u \\ p \end{pmatrix}
= \begin{pmatrix} f \\ 0 \end{pmatrix} \tag{27}
\]

where we have used test functions $v_N$ based on the
Lagrangian polynomials of degree $N$ of the velocity
approximation, and test functions $q_N$ based on the Lagrangian
polynomials of degree $N - 2$ (at the interior nodes) of the
pressure approximation.

Upon eliminating (although only formally!) the $u$ vector,
one obtains from (27) the reduced pressure system

\[
Sp = g, \quad \text{with } S = DA^{-1}D^T \text{ and } g = DA^{-1}f \tag{28}
\]

The pressure matrix $S$ has $(N - 1)^2$ rows and columns. It is
symmetric; moreover, it is positive definite iff $\mathrm{Ker}\, D^T = 0$,
a condition that is equivalent to (26).
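The formal elimination leading to (28) can be illustrated on a small dense example. The following Python sketch uses random stand-ins for the discrete operators (a symmetric positive definite `A` and a full-row-rank `D`); these matrices and the sizes `n_u`, `n_p` are assumptions made for illustration and are not spectral Stokes matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_u, n_p = 12, 4                       # illustrative sizes, not tied to any N

# Stand-ins for the discrete operators: A symmetric positive definite,
# D with full row rank (so that Ker D^T = {0}).
B = rng.standard_normal((n_u, n_u))
A = B @ B.T + n_u * np.eye(n_u)
D = rng.standard_normal((n_p, n_u))
f = rng.standard_normal(n_u)

# Reduced pressure system (28): S p = g.
A_inv_Dt = np.linalg.solve(A, D.T)
A_inv_f = np.linalg.solve(A, f)
S = D @ A_inv_Dt
g = D @ A_inv_f
p = np.linalg.solve(S, g)

# Back-substitution for the velocity: A u = f - D^T p.
u = A_inv_f - A_inv_Dt @ p

# Reference: solve the full block system (27) directly.
K = np.block([[A, D.T], [D, np.zeros((n_p, n_p))]])
full = np.linalg.solve(K, np.concatenate([f, np.zeros(n_p)]))
print(np.max(np.abs(np.concatenate([u, p]) - full)), np.max(np.abs(D @ u)))
```

In an actual spectral code $A^{-1}$ is never formed; each application of $S$ inside an iterative solver requires the solution of a system with $A$, which is why the elimination is described as only formal.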


If we consider the generalized eigenvalue problem
$Sw = \lambda Mw$, where $M$ is the pressure mass matrix
$(\psi_j, \psi_i)_N$ ($\{\psi_j\}$ being the Lagrangian polynomials (of
degree $\le N - 2$) associated with the interior Gauss–Lobatto
nodes), then the maximum generalized eigenvalue $\lambda_{\max}$ is
uniformly bounded (from above) by the coercivity constant
$\alpha$ of the discrete bilinear form $((\nabla u_N, \nabla v_N))_N$ (we can
assume $\alpha = 1$ in the case at hand), whereas the minimum
one $\lambda_{\min}$ is proportional to $\beta_N^2$. As a consequence, the
condition number of the matrix $M^{-1}S$ is
$\mathrm{cond}(M^{-1}S) \sim \beta_N^{-2}$, thus $\sim N^{d-1}$ in the case of the
$\mathbb{P}_N - \mathbb{P}_{N-2}$ discretization.

Since $S$ is close to $M$ (the discrete variational equivalent
of the identity operator), $M$ can serve as preconditioner for
a conjugate gradient solution of (28). The corresponding
PCG (Preconditioned Conjugate Gradient) method will
converge in $O(N^{1/2})$ iterations for 2D problems and in $O(N)$
for 3D ones. In practice, however, the convergence is faster,
as the previous estimate on the asymptotic behavior of $\beta_N$
is too pessimistic.

Several kinds of generalizations are in order.

First of all, we mention that the Stokes system (24)
could be reduced to a single (vector) equation by
$L^2$-projection upon the divergence-free subspace
$V_{\mathrm{div}} = \{v \in (H_0^1(\Omega))^d \mid \mathrm{div}\, v = 0\}$:

\[
\text{find } u \in V_{\mathrm{div}}: \quad
\int_\Omega \nu \nabla u \cdot \nabla v = \int_\Omega f \cdot v \quad \forall v \in V_{\mathrm{div}}
\]

Since this is a well-posed elliptic problem, a unique velocity
field can be obtained and, afterward, a unique pressure $p$
can be recovered in $L_0^2(\Omega)$.

The simple structure of the reduced problem calls for
a Galerkin (or G-NI) discretization. However, a computer
implementation is far from being trivial, as one should
construct a set of polynomial basis functions that are
inherently divergence-free. This task has been successfully
accomplished only for some specific boundary-value problems,
for instance, when $\Omega$ is a cylindrical domain and Fourier
expansion in the angular direction is combined with an
expansion in terms of Chebyshev polynomials in both the
longitudinal and the radial direction. A similar idea is behind
the approach by Batcho and Karniadakis (1994) to generate
eigenfunctions of a generalized Stokes operator and use
them as polynomial divergence-free functions.

A different kind of generalization consists of using
equal-order interpolation $\mathbb{P}_N - \mathbb{P}_N$ for both discrete
velocity and pressure fields. However, this choice would give
rise to a couple of subspaces, $V_N$ and $Q_N$, which violate the
Brezzi condition (26), yielding spurious pressure modes
that swamp the physically relevant pressure. In line with
what is nowadays common practice in the finite element
community, Canuto and van Kemenade (1996) have
proposed and analyzed a stabilization by bubble functions. The
idea consists in adding to $(\mathbb{P}_N^0)^d$ a supplementary space
spanned by local polynomial functions having support in
one small element called cell. In 2D, a cell is a quadrilateral
whose four vertices are four neighboring Gauss–Lobatto
points, whereas in 3D, it is a brick whose eight vertices
are eight such points. The new velocity space is now
given by $V_N = (\mathbb{P}_N^0)^d \oplus \mathbb{B}_N^d$, where $\mathbb{B}_N^d$ denotes the space
of bubble functions, while the pressure space is simply
$Q_N = \mathbb{P}_N \cap L_0^2(\Omega)$.

After a careful analysis of the effect of the interaction
of the local bubble functions with the global polynomials,
and upon eliminating the bubble functions' contribution by
static condensation, it is proven that the new stabilized
discrete problem can be regarded as a Galerkin problem
like (25); however, the continuity equation is modified by
the presence of the additional term

\[
\sum_C \tau_C\, (J_h r_N, J_h(\nabla q_h))_C
\]

which plays the role of a stabilizing term to damp the
oscillatory pressure modes. Here, $C$ is a generic cell and
$(\cdot,\cdot)_C$ is the $L^2(C)$ scalar product. Moreover, $q_h$ is the
(piecewise-linear) finite element interpolant of the test
function $q_N$ at the Gauss–Lobatto nodes,
$r_N := -\nu \Delta u_N + \nabla p_N - f$ is the residual, and $J_h$ is the
$L^2$-projection operator into the space of piecewise constant
functions on the cells. Finally, $\tau_C$ is the cell-stabilization
parameter, which can be expressed in terms of the cell size
$h_C$, the magnitude of the velocity field on $C$, and the fluid
viscosity. Several expressions for $\tau_C$ are actually available
based on alternative approaches that are residual-free.
The Navier–Stokes equations

\[
\begin{cases}
\partial_t u - \nu \Delta u + C(u) + \mathrm{grad}\, p = f & \text{in } \Omega\ (\subset \mathbb{R}^d,\ d = 2, 3) \\
\mathrm{div}\, u = 0 & \text{in } \Omega \\
u = 0 & \text{on } \partial\Omega
\end{cases} \tag{29}
\]

differ from (24) because of the presence of the acceleration
term $\partial_t u$ and the convective term $C(u)$. The latter
can take the standard convective form $u \cdot \nabla u$; however,
other expressions are used as well, such as the conservative
form $\mathrm{div}(uu)$ or the skew-symmetric form
$(1/2)(u \cdot \nabla u + \mathrm{div}(uu))$. The three forms are all equivalent
for the continuous equations (with homogeneous Dirichlet
boundary conditions) because of the incompressibility
condition. However, this is no longer true at the discrete level.
Indeed, the G-NI spectral discretization of the Navier–Stokes
equations has different stability properties depending upon
which form of $C(u)$ is employed.

For the time discretization of (29), fully implicit
methods would produce a nonsymmetric, nonlinear system.
To avoid that, the convective term must be treated
explicitly. One way is to combine backward-difference
(BDF) discretization of linear terms with Adams–Bashforth
(AB) discretization of the convective one. A classical recipe
is the so-called BDF2/AB3, that is, the combination of the
second-order BDF discretization with the third-order AB
discretization:

\[
\begin{aligned}
\left( \frac{3}{2\Delta t} M + A \right) u^{n+1} + D^T p^{n+1}
={} & \frac{1}{\Delta t} M \left( 2u^n - \frac{1}{2} u^{n-1} \right) + M f^{n+1} \\
& - \left( \frac{23}{12} C(u^n) - \frac{4}{3} C(u^{n-1}) + \frac{5}{12} C(u^{n-2}) \right) \\
D u^{n+1} ={} & 0
\end{aligned}
\]


where $M$ is now the velocity mass matrix, while $A$, $D^T$,
and $D$ are the matrices introduced before. To increase time
accuracy, a BDF3 discretization is coupled with an
extrapolation of the nonlinear term. This gives (Karniadakis,
Israeli and Orszag, 1991)

\[
\begin{aligned}
\left( \frac{11}{6\Delta t} M + A \right) u^{n+1} + D^T p^{n+1}
={} & \frac{1}{\Delta t} M \left( 3u^n - \frac{3}{2} u^{n-1} + \frac{1}{3} u^{n-2} \right) + M f^{n+1} \\
& - \left( 3C(u^n) - 3C(u^{n-1}) + C(u^{n-2}) \right) \\
D u^{n+1} ={} & 0
\end{aligned}
\]

This scheme is third-order accurate with respect to $\Delta t$.
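The third-order accuracy claimed for the BDF3/extrapolation pair can be checked on scalar functions, independently of any spatial discretization. The Python sketch below is a simple consistency test under that simplification; the coefficient arrays are read off the scheme above, while the function names and the test function $\sin t$ are illustrative choices.

```python
import numpy as np

# Coefficients of the scheme: BDF3 on levels n+1, n, n-1, n-2 and
# third-order extrapolation on levels n, n-1, n-2.
bdf3 = np.array([11.0 / 6.0, -3.0, 3.0 / 2.0, -1.0 / 3.0])
extrap3 = np.array([3.0, -3.0, 1.0])

def bdf3_derivative(g, t, dt):
    """Approximate g'(t) from g(t), g(t - dt), g(t - 2 dt), g(t - 3 dt)."""
    levels = t - dt * np.arange(4)
    return np.dot(bdf3, g(levels)) / dt

def extrapolate(g, t, dt):
    """Approximate g(t) from g(t - dt), g(t - 2 dt), g(t - 3 dt)."""
    levels = t - dt * np.arange(1, 4)
    return np.dot(extrap3, g(levels))

def observed_order(approx, exact, g, t=0.7, dt=0.05):
    """Estimate the convergence order by halving the step once."""
    e1 = abs(approx(g, t, dt) - exact)
    e2 = abs(approx(g, t, dt / 2) - exact)
    return np.log2(e1 / e2)

order_bdf = observed_order(bdf3_derivative, np.cos(0.7), np.sin)
order_ext = observed_order(extrapolate, np.sin(0.7), np.sin)
print(order_bdf, order_ext)
```

Both observed orders should be close to 3. The BDF3 coefficients sum to zero and differentiate cubic polynomials exactly, while the extrapolation coefficients sum to one and reproduce quadratic polynomials exactly.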

An extensive coverage of the spectral method for
Navier–Stokes equations can be found in the books Canuto
et al. (1988), Deville, Fischer and Mund (2002), and Peyret
(2002). For their analysis, see also Bernardi and Maday
(1992) and Bernardi and Maday (1997) (see Chapter 3,
Volume 3, Chapter 9, this Volume).


6 ADVECTION EQUATIONS AND 
CONSERVATION LAWS 


In order to illustrate spectral approximations to hyperbolic
problems, we consider the linear and nonlinear 1D model
equations $u_t + a u_x = 0$ and $u_t + f(u)_x = 0$, supplemented
by initial and appropriate boundary conditions. In addition
to the standard issues related to spectral discretizations
(efficient implementation, imposition of boundary conditions,
stability, accuracy for smooth solutions), here we face a
new problem. Indeed, the equation may propagate singularities
along characteristics, or even (in the nonlinear case)
generate singularities from smooth initial data. So, the
question arises: what is the interest of using high-order methods
in such cases? We will answer this question in the second
part of the present section.

For periodic problems, say in $(0, 2\pi)$, the Fourier–Galerkin
method is the conceptually simplest choice: find
$u_N = u_N(t) \in S_N$ such that

\[
(u_{N,t} + a\, u_{N,x}, \varphi) = 0 \quad \text{or} \quad
(u_{N,t} + f(u_N)_x, \varphi) = 0, \qquad \forall \varphi \in S_N
\]

Taking $\varphi = u_N$, integrating by parts, and using
periodicity, one obtains
$(d/dt) \|u_N(t)\|_{L^2(0,2\pi)}^2 \le K \|u_N(t)\|_{L^2(0,2\pi)}^2$
(with $K = \max_{[0,2\pi]} |a_x|$) for the linear advection
equation, and
$(d/dt) \|u_N(t)\|_{L^2(0,2\pi)}^2 = 2\,[F(u_N(t))]_0^{2\pi} = 0$ (where
$F$ denotes any primitive of $f$) for the conservation law.
This proves the $L^2$-stability of the approximation.

In terms of Fourier coefficients, the Galerkin method for
the advection equation is equivalent to the set of ordinary
differential equations

\[
u_{N,k}' + (a\, u_{N,x})_k = 0, \qquad k = -\frac{N}{2}, \dots, \frac{N}{2} - 1
\]

Setting for simplicity $b = u_{N,x}$, we have
$(ab)_k = \sum_{m=-N/2}^{N/2-1} a_{k-m}\, b_m$. This is a family of
convolution sums, which can be computed in $O(N^2)$ operations.
A more efficient scheme consists of transforming $a$ and $b$
back to physical space, taking the pointwise product at the
nodes $x_j = 2\pi j/N$, $j = 0, \dots, N-1$, and returning to Fourier
space. Using the FFT, the full process costs $O(N \log N)$
operations. This is the pseudospectral evaluation of
convolution sums. There is an error involved, since one
replaces the exact projection $P_N(ab)$ of $ab$ upon $S_N$ by
its interpolant $I_N(ab)$ at the nodes. Such error, termed
the aliasing error, is negligible if $N$ is so large that the
essential features of $u$ are resolved. Otherwise, appropriate
de-aliasing techniques can be applied, such as increasing
the number of interpolation nodes.
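The difference between the exact convolution sums, their pseudospectral evaluation, and a de-aliased evaluation on a finer grid can be seen in a few lines. The Python sketch below assumes $N$ even, Fourier coefficients stored for $k = -N/2, \dots, N/2 - 1$, and uses zero padding to $M = 3N/2$ nodes as one possible way of "increasing the number of interpolation nodes"; that specific padding factor and the function names are assumptions, not taken from the text.

```python
import numpy as np

def direct_convolution(a_hat, b_hat):
    """Truncated convolution sums (ab)_k = sum_m a_{k-m} b_m, O(N^2)."""
    N = len(a_hat)
    c_hat = np.zeros(N, dtype=complex)
    for i in range(N):                  # i <-> k = i - N/2
        for j in range(N):              # j <-> m = j - N/2
            l = i - j + N // 2          # storage index of a_{k-m}
            if 0 <= l < N:
                c_hat[i] += a_hat[l] * b_hat[j]
    return c_hat

def pseudospectral_convolution(a_hat, b_hat, M):
    """Same sums via pointwise products on M equispaced nodes (M >= N)."""
    N = len(a_hat)
    k = np.arange(-(N // 2), N // 2)

    def to_nodes(c_hat):
        padded = np.zeros(M, dtype=complex)
        padded[k % M] = c_hat                   # FFT ordering, zero padding
        return np.fft.ifft(padded) * M          # values at x_j = 2 pi j / M

    product = to_nodes(a_hat) * to_nodes(b_hat)
    return (np.fft.fft(product) / M)[k % M]

rng = np.random.default_rng(1)
N = 16
a_hat = rng.standard_normal(N) + 1j * rng.standard_normal(N)
b_hat = rng.standard_normal(N) + 1j * rng.standard_normal(N)

exact = direct_convolution(a_hat, b_hat)
aliased = pseudospectral_convolution(a_hat, b_hat, N)
dealiased = pseudospectral_convolution(a_hat, b_hat, 3 * N // 2)
print(np.max(np.abs(aliased - exact)), np.max(np.abs(dealiased - exact)))
```

With $M = N$ the modes of the product outside the retained range fold back onto the retained ones, which is the aliasing error; with $M = 3N/2$ the folded modes fall outside the retained range and the direct sums are recovered up to rounding. Random coefficients are used here precisely because they are not "resolved", so the aliasing is clearly visible.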

This process applies to the conservation law as well,
provided the nonlinearity is polynomial (as for Burgers's
equation, $f(u_N) = (1/2) u_N^2$, or for the convective term
$u_N \nabla u_N$ in the Navier–Stokes equations). It can be extended
to the nonperiodic case by using the Chebyshev nodes
$x_j = \cos(\pi j/N)$, $j = 0, \dots, N$.

The Fourier–Galerkin method with the pseudospectral
evaluation of convolution sums is nothing but the Galerkin
method with numerical integration described in (6), or
equivalently, the collocation method at the quadrature
points

\[
u_{N,t}(x_j) + a(x_j)\, u_{N,x}(x_j) = 0, \qquad j = 0, \dots, N-1
\]


Unless $a(x) \ge \alpha > 0$ for all $x$, this scheme is (weakly)
unstable due to the aliasing error. Writing the convective
term in the skew-symmetric form

\[
a u_x = \frac{1}{2} (a u)_x + \frac{1}{2} a u_x - \frac{1}{2} a_x u \tag{30}
\]

and applying pseudospectral derivatives, that is, the
derivative of the interpolant, one recovers the same stability
estimates as for the pure Galerkin method (in practice, such
an expensive form is rarely necessary). Again, similar
considerations apply in the nonlinear case as well.

We now turn to the discretization of nonperiodic problems,
in the framework of Legendre methods. The advection
equation is well-posed, provided we prescribe the solution,
say $u(x_b) = g_b$, at the inflow points $x_b \in B_-$, where
$B_\pm = \{x_b \in \{-1, 1\} : (\pm 1)\, a(x_b)\, n_b > 0\}$ with $n_b = x_b$. The
most obvious way to account for the boundary conditions
is to enforce them exactly (or strongly) in the discrete
solution: $u_N \in \mathbb{P}_N$ satisfies $u_N(x_b) = g_b$, $\forall x_b \in B_-$. The
corresponding Galerkin method is $L^2$-stable. Indeed, assuming
for simplicity $g_b = 0$, we take $u_N$ itself as the test function
and after integration by parts we get

\[
\frac{1}{2} \frac{d}{dt} \|u_N\|_{L^2(-1,1)}^2 - \frac{1}{2} K \|u_N\|_{L^2(-1,1)}^2
+ \frac{1}{2} \sum_{x_b \in B_+} a(x_b)\, n_b\, u_N^2(x_b) \le 0
\]

whence stability easily follows. A similar result holds for
the Galerkin method with numerical integration (G-NI) at
the Legendre–Gauss–Lobatto points, provided we use the
skew-symmetric form (30) of the convective term. The
G-NI scheme is equivalent to enforcing the equation at
the internal nodes and at the noninflow boundary points
($x_b \notin B_-$).

A more flexible way to handle the boundary conditions,
useful, for example, in domain decomposition and for
systems of equations, is to enforce them in a weak sense.
The rationale is that if stability holds then accuracy is
assured, provided the boundary conditions are matched to
within the same consistency error as for the equation in the
interior. Thus, we seek $u_N = u_N(t) \in \mathbb{P}_N$ satisfying, for all
$v_N \in \mathbb{P}_N$,

\[
(u_{N,t}, v_N)_N - (a u_N, v_{N,x})_N
+ \sum_{x_b \in B_+} a(x_b)\, n_b\, u_N(x_b)\, v_N(x_b)
= \sum_{x_b \in B_-} |a(x_b)\, n_b|\, g_b\, v_N(x_b) \tag{31}
\]


This G-NI formulation follows by integrating by parts
the convective term (for simplicity, we assume $a$ to be
constant: otherwise, we use the skew-symmetric form (30)).
Choosing as $v_N$ the polynomial Lagrange characteristic at
each quadrature node, we see that the advection equation
is enforced at all internal and noninflow nodes, whereas at
the inflow nodes we have

\[
u_{N,t}(x_b) + a\, u_{N,x}(x_b) + \frac{1}{w_b} |a(x_b)\, n_b|\, (u_N(x_b) - g_b) = 0
\]

Since $1/w_b \sim cN^2$ as $N \to +\infty$ ($w_b$ being the quadrature
weight at $x_b$), this shows that the
boundary condition is indeed enforced by a penalty method.
The stability of the scheme (31) immediately follows by
taking $v_N = u_N$. Stability is actually guaranteed even if
we multiply each boundary term in (31) by any constant
$\tau_b \ge 1/2$, thus enhancing the flexibility of the penalty
method.

Spectral methods for linear advection equations are 
addressed by Gottlieb and Hesthaven (2001), Funaro 
(1997), and Fornberg (1996). 

Let us now consider the nonlinear conservation law
$u_t + f(u)_x = 0$. The stability (and convergence) of
spectral discretizations is a much more delicate issue than that
for the linear advection equation. Indeed, the equation may
develop singular solutions at a finite time, which correspond
to the accumulation of energy in the high-frequency
modes or, equivalently, to the onset of oscillations around
discontinuities (Gibbs phenomenon). The nonlinear mechanism
may amplify the high-frequency components, leading
to destructive instabilities (in stronger norms than $L^2$). On
the other hand, oscillations should not be brutally suppressed:
they are inherent to the high-order representation
of discontinuous functions, and they may hide the correct
information that allows the reconstruction of the exact
solution. Thus, a good spectral discretization should guarantee
enough stability while preserving enough accuracy.
Furthermore, the discrete solution should converge to the
physically relevant exact solution by fulfilling an appropriate
entropy condition.

The mathematically most rigorous discretization that
matches these requirements is the spectral viscosity method
(see, e.g., Tadmor, 1998). In the Fourier–Galerkin context,
it amounts to considering the modified equation

\[
u_{N,t} + (P_N f(u_N))_x = \varepsilon_N (-1)^{s+1} D_x^s (Q_m D_x^s u_N)
\]

where $\varepsilon_N \sim c N^{1-2s}$, $m = m_N \sim N^{\vartheta}$ for some
$\vartheta < 1 - 1/(2s)$, and the Fourier coefficients of $Q_m$ satisfy
$\hat{Q}_{m,k} = 0$ if $|k| \le m$, $\hat{Q}_{m,k} = 1 - (m/|k|)^{(2s-1)/\vartheta}$ if
$|k| > m$. Thus, the $s$th order artificial viscosity is applied
only to sufficiently high-frequency modes. For $s = 1$,
one can prove that the solution is bounded in
$L^\infty(0, 2\pi)$, it satisfies the estimate
$\|u_N(t)\|_{L^2(0,2\pi)} + \sqrt{\varepsilon_N}\, \|u_{N,x}(t)\|_{L^2(0,2\pi)} \le C \|u_N(0)\|_{L^2(0,2\pi)}$,
and it converges to the correct entropy solution.


A computationally simpler and widely used road to
stabilization consists of filtering the spectral solution when
advancing in time,

\[
u_N(t) \mapsto \mathcal{F}_N u_N(t) = \sum_{k=-N/2}^{N/2-1}
\sigma\!\left( \frac{2k}{N} \right) u_{N,k}(t)\, \exp(ikx)
\]

where $\sigma = \sigma(\eta)$ is a smooth, even function satisfying
$\sigma(0) = 1$, $\sigma^{(j)}(0) = 0$ for all $j$ with $1 \le j \le$ some $s$,
monotonically decreasing for $\eta > 0$ and vanishing (or
being exponentially small) for $\eta > 1$. A popular choice is
the exponential filter $\sigma(\eta) = \exp(-\alpha \eta^{2s})$. Interestingly, the
effect of the spectral viscosity correction described above
can be closely mimicked by applying the exponential filter
with $\sigma(2k/N) = \exp(-\varepsilon_N \hat{Q}_{m,k} k^2)$.
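Applying the filter $\mathcal{F}_N$ amounts to one forward FFT, a multiplication of the coefficients by $\sigma(2k/N)$, and one inverse FFT. The Python sketch below illustrates this for the exponential filter; the values $\alpha = 36$ (so that $\sigma(1)$ is near machine precision) and $s = 4$, as well as the function names, are illustrative assumptions and are not prescribed by the text.

```python
import numpy as np

def exponential_filter(eta, alpha=36.0, s=4):
    """sigma(eta) = exp(-alpha * eta^(2s)); alpha and s are illustrative choices."""
    return np.exp(-alpha * np.abs(eta) ** (2 * s))

def filter_samples(u, alpha=36.0, s=4):
    """Apply F_N to nodal values u_j = u(2 pi j / N), j = 0..N-1 (N even)."""
    N = len(u)
    k = np.fft.fftfreq(N, d=1.0 / N)      # wavenumbers -N/2 .. N/2-1 in FFT order
    u_hat = np.fft.fft(u)
    sigma = exponential_filter(2.0 * k / N, alpha, s)
    return np.real(np.fft.ifft(sigma * u_hat))

N = 64
x = 2.0 * np.pi * np.arange(N) / N
smooth = np.sin(3 * x)
step = np.where(x < np.pi, 1.0, -1.0)     # discontinuous periodic data
filtered_smooth = filter_samples(smooth)
filtered_step = filter_samples(step)
print(np.max(np.abs(filtered_smooth - smooth)), filtered_step.mean() - step.mean())
```

Well-resolved low-frequency content is left essentially untouched because $\sigma$ is very flat near $\eta = 0$, the mean is preserved exactly since $\sigma(0) = 1$, and only the modes close to the cut-off $|k| = N/2$ are damped.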

If the solution of the conservation law is piecewise analytic
but discontinuous, its truncation $P_N u$ or its interpolation
$I_N u$ are highly oscillatory around the singularities, and
converge slowly ($O(N^{-1})$) to $u$ away from them. However,
they contain enough information to allow the reconstruction
of the exact solution with exponential accuracy, away from
the singularities, by a postprocessing as described below. It
follows that the crucial feature of the discretization scheme
is the capability of producing an approximation $u_N$, which
is spectrally close to $P_N u$ or to $I_N u$. This is precisely what
is obtained by the spectral viscosity method or by the
equivalent filtering procedure.

Given $P_N u$ (similar considerations apply to $I_N u$), the
postprocessing reconstruction may be local or global. In
the former case, a spectrally accurate approximation of
$u$ at a point $x_0$ of analyticity is given by
$u_N^*(x_0) = \int_\Omega K_\nu(x_0, y)\,\varrho(x_0 - y)\,P_N u(y)\,dy$, where $\nu = [\beta N]$ for
some $\beta \in (0, 1)$, $K_\nu(x, y)$ is, for each $x$, a $\nu$-degree
polynomial approximation of the delta at $x$ (e.g., for
Fourier, $K_\nu(x, y) = 1 + 2\sum_{k=1}^{\nu} \cos k(x - y)$ is the Dirichlet
kernel), whereas $\varrho(\eta)$ is a $C^\infty$-localizer around $\eta = 0$. In
the latter case, a spectrally accurate approximation of $u$ on
an interval $[a, b]$ of analyticity is given (Gottlieb and Shu,
1997) by the orthogonal projection of $P_N u$ upon $\mathbb{P}_\nu([a, b])$
(again $\nu = [\beta N]$) with respect to the weighted inner product
$\int_a^b u(x) v(x) \omega_\nu(x)\,dx$, with $\omega_\nu(x) = ((x - a)(b - x))^{\nu - 1/2}$,
which varies with $N$. The projection is computed via
the Gegenbauer polynomials (i.e., the Jacobi polynomials
$\{P_k^{(\nu - 1/2, \nu - 1/2)}\}$) translated and scaled to $[a, b]$.

The reader can refer, for example, to Gottlieb and Shu 
(1997), Gottlieb and Tadmor (1984), and Tadmor (1998). 


7 THE SPECTRAL ELEMENT METHOD 


The spectral element method (SEM) represents another
example of the Galerkin method. However, the finite
dimensional space is now made of piecewise algebraic
polynomials of high degree on each element of a fixed partition
of the computational domain. For a one-dimensional
problem, such as, for example, (2), we split $\Omega = (a, b)$ into a
set of $M$ disjoint intervals $\Omega_e$, $e = 1, \ldots, M$, whose end
points are $a = x_0 < x_1 < \cdots < x_M = b$. Then we set


$$V_{N,M} := \{v \in C^0(\bar\Omega) \mid v_{|\Omega_e} \in \mathbb{P}_N \ \forall e = 1, \ldots, M, \ v(a) = v(b) = 0\}$$


The approximation of (2) by SEM reads


$$\text{find } u_{N,M} \in V_{N,M}: \quad a(u_{N,M}, v) = (f, v) \quad \forall v \in V_{N,M} \qquad (32)$$
This approach shares the same structure as the p-version
of the finite element method (FEM). As in the
latter, the number $M$ of subintervals is frozen, while
the local polynomial degree (that is indicated by $N$ in
the SEM context and by $p$ in the FEM context) is
increased to improve accuracy. More precisely, if
$h = (b - a)/M$ denotes the constant length of each subinterval,
one has


$$\|u' - (\Pi_{N,M} u)'\|_{L^2(a,b)} + \frac{N}{h}\,\|u - \Pi_{N,M} u\|_{L^2(a,b)} \le C(s)\, h^{\min(N,s)} N^{-s}\, \|u^{(s+1)}\|_{L^2(a,b)}, \quad s \ge 0 \qquad (33)$$


where $\Pi_{N,M}$ is the SEM interpolant.

If $u$ is arbitrarily smooth ($s$ large), it is advantageous to
keep $h$ fixed and let $N \to \infty$.

Should the different degree of smoothness suggest the use
of a nonuniform polynomial degree, another upper bound
for the left-hand side of (33) is


$$\sum_{e=1}^{M} C(s_e)\, h^{\min(N_e, s_e)} N_e^{-s_e}\, \|u^{(s_e+1)}\|_{L^2(\Omega_e)}, \quad s_e \ge 0 \quad \forall e = 1, \ldots, M$$


where $N_e$ is the polynomial degree used in the $e$-th element
$\Omega_e$, and $H^{s_e+1}(\Omega_e)$ is the local smoothness of $u$ in $\Omega_e$.

SEM was first introduced by Patera (1984) for Cheby- 
shev expansions, then generalized to the Legendre case by 
Y. Maday and A. Patera. 

Both approaches (SEM and p-version of FEM) make
use of a parental element, say $\hat\Omega = (-1, 1)$, on which
the basis functions are constructed. However, the main
difference lies in the way the basis functions are chosen
(and therefore in the structure of the corresponding stiffness
matrix).

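FEMs of p-type are defined in terms of the Legendre
polynomials $L_k(\xi)$ of degree $k$ ($k = 2, \ldots, p$), $\xi \in \hat\Omega$.
Precisely, the $p + 1$ modal basis functions on $\hat\Omega$ are
defined by


$$\varphi_0(\xi) = \frac{1 - \xi}{2}, \qquad \varphi_1(\xi) = \frac{1 + \xi}{2},$$

$$\varphi_k(\xi) = \sqrt{\frac{2k - 1}{2}} \int_{-1}^{\xi} L_{k-1}(s)\,ds = \frac{1}{\sqrt{2(2k - 1)}}\,(L_k(\xi) - L_{k-2}(\xi)), \quad k = 2, \ldots, p$$


The first two terms ensure $C^0$ continuity of the trial
functions.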
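
The closed form of the modal functions can be checked numerically. The sketch below evaluates them through the three-term Legendre recurrence; the function names and the indexing of the two linear functions as $k = 0$ and $k = 1$ are assumptions made for illustration.

```python
import math


def legendre(k, x):
    """Legendre polynomial L_k(x) via (n+1) L_{n+1} = (2n+1) x L_n - n L_{n-1}."""
    if k == 0:
        return 1.0
    prev, curr = 1.0, x
    for n in range(1, k):
        prev, curr = curr, ((2 * n + 1) * x * curr - n * prev) / (n + 1)
    return curr


def modal_basis(k, xi):
    """k-th modal basis function on the parental element (-1, 1).

    k = 0, 1 are the two linear (vertex) functions; k >= 2 are the
    internal (bubble) functions (L_k - L_{k-2}) / sqrt(2 (2k - 1)).
    """
    if k == 0:
        return 0.5 * (1.0 - xi)
    if k == 1:
        return 0.5 * (1.0 + xi)
    return (legendre(k, xi) - legendre(k - 2, xi)) / math.sqrt(2.0 * (2 * k - 1))
```

Since $L_k(\pm 1) = (\pm 1)^k$, every function with $k \ge 2$ vanishes at both end points, so only the two linear functions carry the interelement continuity.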

For the algebraic realization of SEM, nodal basis functions
are those introduced in (13). Being associated with the
special set of Legendre-Gauss-Lobatto (LGL) nodes, once they
are mapped onto the current element $\Omega_e$, $e = 1, \ldots, M$,
they can be used to generate shape functions, and they allow
us to use LGL quadrature formulas for the evaluation of
the entries of the stiffness and other matrices and of the
right-hand side. This is reflected by replacing (32) with the more
interesting SEM-NI version:


$$\text{find } u_{N,M} \in V_{N,M}: \quad \sum_{e=1}^{M} a_{N,\Omega_e}(u_{N,M}, v) = \sum_{e=1}^{M} (f, v)_{N,\Omega_e} \quad \forall v \in V_{N,M} \qquad (34)$$


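where $(u, v)_{N,\Omega_e}$ is the Legendre-Gauss-Lobatto inner
product (11) in $\Omega_e$, $(u, v)_{N,\Omega_e} = \sum_{j=0}^{N} \alpha_j^e u(x_j^e) v(x_j^e)$, with
$\alpha_j^e = \alpha_j (x_e - x_{e-1})/2$, and $x_j^e$ is the correspondent of $x_j$ in $\Omega_e$.
Moreover, $a_{N,\Omega_e}(u, v)$ is the elemental bilinear form.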
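
The elemental LGL inner product can be sketched as follows. The reference nodes are computed by a Newton iteration on $x L_N(x) - L_{N-1}(x)$ started from Chebyshev-Gauss-Lobatto points, which is one common way to obtain them and is not taken from the text; all function names are hypothetical.

```python
import math


def legendre_pair(n, x):
    """Return (L_n(x), L_{n-1}(x)) by the three-term recurrence, n >= 1."""
    prev, curr = 1.0, x
    for k in range(1, n):
        prev, curr = curr, ((2 * k + 1) * x * curr - k * prev) / (k + 1)
    return curr, prev


def lgl_nodes_weights(n):
    """LGL nodes and weights on (-1, 1) for polynomial degree n >= 1.

    Nodes are the zeros of x L_n(x) - L_{n-1}(x), whose derivative is
    (n + 1) L_n(x); weights are 2 / (n (n + 1) L_n(x_j)^2).
    """
    nodes, weights = [], []
    for j in range(n + 1):
        x = -math.cos(math.pi * j / n)  # initial guess (an assumption)
        for _ in range(100):
            ln, lnm1 = legendre_pair(n, x)
            step = (x * ln - lnm1) / ((n + 1) * ln)
            x -= step
            if abs(step) < 1e-15:
                break
        ln, _ = legendre_pair(n, x)
        nodes.append(x)
        weights.append(2.0 / (n * (n + 1) * ln * ln))
    return nodes, weights


def elemental_lgl(n, x_left, x_right):
    """Map nodes and weights from (-1, 1) to the element (x_left, x_right)."""
    nodes, weights = lgl_nodes_weights(n)
    half = 0.5 * (x_right - x_left)
    return ([x_left + half * (xi + 1.0) for xi in nodes],
            [half * w for w in weights])


def lgl_inner_product(u, v, n, x_left, x_right):
    """Discrete inner product: sum_j alpha_j^e u(x_j^e) v(x_j^e)."""
    xs, ws = elemental_lgl(n, x_left, x_right)
    return sum(w * u(x) * v(x) for x, w in zip(xs, ws))
```

The discrete inner product coincides with the exact $L^2$ inner product whenever the product $uv$ is a polynomial of degree at most $2N - 1$; beyond that it is only an approximation, which is the source of the quadrature error in the SEM-NI formulation.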

Still considering the case of the differential operator (5)
as an instance, we end up with the following form:


$$a_{N,\Omega_e}(u, v) = (\alpha u', v')_{N,\Omega_e} + (\beta u', v)_{N,\Omega_e} + (\gamma u, v)_{N,\Omega_e}$$


in analogy with the left-hand side of (12).

The multidimensional case can be addressed by first
introducing the tensorized basis functions on the parental
element $\hat\Omega = (-1, 1)^d$ ($d = 2, 3$), then mapping basis
functions and nodal points on every element $\Omega_e$ (now a
quadrilateral or parallelepipedal structure, possibly with curved
edges or surfaces). The functional structure of our problem
remains formally the same as in (34), and the kind of
error estimate that can be achieved is similar. Obviously,
this time $V_{N,M}$ is made of globally continuous functions
that satisfy homogeneous Dirichlet boundary data (if any).
They are obtained by joining the elemental functions that
are the mapping of the nodal basis functions according to
the transformation $T_e: \hat\Omega \to \Omega_e$ that maps the parental
element $\hat\Omega$ into the current element $\Omega_e$.

We refer to the seminal paper by Patera (1984) and
to the books by Bernardi and Maday (1997), Deville,
Fisher and Mund (2002), Karniadakis and Sherwin (1999),
and Schwab (1998).


8 THE MORTAR METHOD 


This method has been introduced by Bernardi, Maday, and 
Patera (1994) with the aim of allowing spectral elements 
having different polynomial degrees or being geometrically 
nonconforming, and also to allow the coupling of the 
spectral (element) method with the finite element method. 
Its generality, however, goes beyond these two specific 
examples. Consider, for the sake of illustration, the Poisson 
problem with homogeneous Dirichlet conditions. The idea
is to approximate its weak form (13) by the following
discrete problem:


$$\text{find } u_\delta \in V_\delta: \quad \sum_{i=1}^{M} \int_{\Omega_i} \nabla u_\delta \cdot \nabla v_\delta = \sum_{i=1}^{M} \int_{\Omega_i} f v_\delta \quad \forall v_\delta \in V_\delta \qquad (35)$$


Here, $\delta > 0$ is a parameter describing the quality of the
discretization, and $V_\delta$ is a finite dimensional space that
approximates $H_0^1(\Omega)$ without being contained in $C^0(\bar\Omega)$.
More precisely, $V_\delta$ is a subspace of the following space:


$$Y_\delta := \{v_\delta \in L^2(\Omega) \mid v_{\delta|\Omega_i} \in Y_{i,\delta}, \ i = 1, \ldots, M\} \qquad (36)$$


where, for each $i = 1, \ldots, M$, $Y_{i,\delta}$ is a finite dimensional
subspace of $H^1(\Omega_i)$: it can be either a finite element space,
or a polynomial spectral (elements) space. In any case, no
requirement of compatibility is made for the restriction of
the functions of $Y_\delta$ on the element interface $\Gamma$.

Heuristically, the space $V_\delta$ will be made up of functions
belonging to $Y_\delta$ that satisfy some kind of matching across
$\Gamma$. Precisely, assuming for simplicity that there are only
two elements, if $v_\delta \in V_\delta$ and $v_\delta^{(1)} \in Y_{1,\delta}$, $v_\delta^{(2)} \in Y_{2,\delta}$ denote
its restrictions to $\Omega_1$ and $\Omega_2$, respectively, then for a certain fixed
index $i$, the following integral matching conditions should
be satisfied:


$$\int_\Gamma (v_\delta^{(1)} - v_\delta^{(2)})\,\mu_\delta^{(i)} = 0 \quad \forall \mu_\delta^{(i)} \in \Lambda_\delta^{(i)} \qquad (37)$$


where $\Lambda_\delta^{(i)}$ denotes the restriction to $\Gamma$ of the functions of
$Y_{i,\delta}$.

If we take $i = 2$ in (37), this amounts to letting $\Omega_1$ play
the role of master and $\Omega_2$ that of slave, and (37) is to be
understood as the way of generating the value of $v_{\delta|\Gamma}^{(2)}$ once
$v_{\delta|\Gamma}^{(1)}$ is available. The alternative way, that is, taking $i = 1$
in (37), is also admissible. Depending upon the choice of
index $i$ made in (37), the method will produce different
solutions.


The mathematical rationale behind the choice of the
matching condition (37) (rather than a more 'natural'
condition of pointwise continuity at one set of grid nodes on $\Gamma$)
becomes clear from the convergence analysis for problem
(35).

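With this aim, we introduce


$$\|v\|_* := \left(\|v\|_{0,\Omega}^2 + \|\nabla v_{|\Omega_1}\|_{0,\Omega_1}^2 + \|\nabla v_{|\Omega_2}\|_{0,\Omega_2}^2\right)^{1/2} \qquad (38)$$

which is a norm (the 'graph' norm) for the Hilbert space

$$H_* := \{v \in L^2(\Omega) \mid v_{|\Omega_1} \in H^1(\Omega_1), \ v_{|\Omega_2} \in H^1(\Omega_2)\} \qquad (39)$$

Owing to the Poincaré inequality, we have that

$$\sum_{i=1}^{2} \int_{\Omega_i} |\nabla v_\delta|^2 \ge \alpha_* \|v_\delta\|_*^2 \quad \forall v_\delta \in V_\delta \qquad (40)$$

whence the discrete problem (35) admits a unique solution
by a straightforward application of the Lax-Milgram
lemma.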

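For any $v_\delta \in V_\delta$, we now have


$$\begin{aligned}
\alpha_* \|u_\delta - v_\delta\|_*^2 &\le \sum_{i=1}^{2} \int_{\Omega_i} |\nabla(u_\delta - v_\delta)|^2 \\
&= \sum_{i=1}^{2} \int_{\Omega_i} \nabla u_\delta \cdot \nabla(u_\delta - v_\delta) - \sum_{i=1}^{2} \int_{\Omega_i} \nabla v_\delta \cdot \nabla(u_\delta - v_\delta) \\
&= \sum_{i=1}^{2} \int_{\Omega_i} f (u_\delta - v_\delta) - \sum_{i=1}^{2} \int_{\Omega_i} \nabla v_\delta \cdot \nabla(u_\delta - v_\delta)
\end{aligned} \qquad (41)$$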


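Replacing $f$ by $-\Delta u$ and integrating by parts on each
$\Omega_i$, we obtain


$$\sum_{i=1}^{2} \int_{\Omega_i} f (u_\delta - v_\delta) = \sum_{i=1}^{2} \int_{\Omega_i} \nabla u \cdot \nabla(u_\delta - v_\delta) - \int_\Gamma \frac{\partial u}{\partial n} \left[(u_\delta - v_\delta)^{(1)} - (u_\delta - v_\delta)^{(2)}\right] \qquad (42)$$


(here, $\partial/\partial n$ is the normal derivative on $\Gamma$ pointing into
$\Omega_2$).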
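Denoting by


$$[v_\delta]_\Gamma := v_{\delta|\Gamma}^{(1)} - v_{\delta|\Gamma}^{(2)}$$


the jump across $\Gamma$ of a function $v_\delta \in V_\delta$, from (41) and
(42), we have that

$$\alpha_* \|u_\delta - v_\delta\|_*^2 \le \|u - v_\delta\|_* \, \|u_\delta - v_\delta\|_* + \left|\int_\Gamma \frac{\partial u}{\partial n} [u_\delta - v_\delta]_\Gamma\right|$$

and also

$$\|u_\delta - v_\delta\|_* \le \frac{1}{\alpha_*} \left(\|u - v_\delta\|_* + \sup_{w_\delta \in V_\delta} \frac{\left|\int_\Gamma \frac{\partial u}{\partial n} [w_\delta]_\Gamma\right|}{\|w_\delta\|_*}\right)$$

By the triangle inequality

$$\|u - u_\delta\|_* \le \|u - v_\delta\|_* + \|v_\delta - u_\delta\|_*$$

we then obtain the following inequality for the error $u - u_\delta$:

$$\|u - u_\delta\|_* \le \left(1 + \frac{1}{\alpha_*}\right) \inf_{v_\delta \in V_\delta} \|u - v_\delta\|_* + \frac{1}{\alpha_*} \sup_{w_\delta \in V_\delta} \frac{\left|\int_\Gamma \frac{\partial u}{\partial n} [w_\delta]_\Gamma\right|}{\|w_\delta\|_*} \qquad (43)$$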


The approximation error of (35) is therefore bounded
(up to a multiplicative constant) by the best approximation
error (i.e., the distance between the exact solution $u$ and the
finite dimensional space $V_\delta$) plus an extra error involving
interface jumps. The latter would not appear in the
framework of classical Galerkin approximation (like the SEM),
and is the price to pay for the violation of the conforming
property; that is, for the fact that $V_\delta \not\subset H_0^1(\Omega)$.

The error estimate (43) is optimal if each one of the
two terms on the right can be bounded by the norm of
local errors arising from the approximations in $\Omega_1$ and
$\Omega_2$, without the presence of terms that combine them in a
multiplicative fashion. In this way, we can take advantage
of the local regularity of the exact solution as well as the
approximation properties enjoyed by the local subspaces
$Y_{i,\delta}$ of $H^1(\Omega_i)$.

To generate a nodal basis for the finite dimensional space
$V_\delta$, we can proceed as follows. For $i = 1, 2$, let us denote
by $\mathcal{N}_i$ the set of nodes in the interior of $\Omega_i$ and by $\mathcal{N}_\Gamma^{(i)}$ the
set of nodes on $\Gamma$, whose cardinality will be indicated by
$N_i$ and $N_\Gamma^{(i)}$, respectively. Note that, in general, $N_\Gamma^{(1)}$ and
$N_\Gamma^{(2)}$ can be totally unrelated.

Now, denote by $\{\varphi_{k'}^{(1)}\}$, $k' = 1, \ldots, N_1$, the Lagrange
functions associated with the nodes of $\mathcal{N}_1$; since they vanish
on $\Gamma$, they can be extended by 0 in $\Omega_2$. These extended
functions are denoted by $\{\tilde\varphi_{k'}^{(1)}\}$, and can be taken as a first
set of basis functions for $V_\delta$.

Symmetrically, we can generate as many basis functions
for $V_\delta$ as the number of nodes of $\mathcal{N}_2$ by extending by 0
in $\Omega_1$ the Lagrange functions associated with these nodes.
These new functions are denoted by $\{\tilde\varphi_{k''}^{(2)}\}$, $k'' = 1, \ldots, N_2$.

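Finally, always supposing that $\Omega_1$ is the master domain
and $\Omega_2$ its slave, for every Lagrange function $\varphi_{m,\Gamma}^{(1)}$ in
$\Omega_1$, $m = 1, \ldots, N_\Gamma^{(1)}$, we obtain a basis function $\tilde\varphi_{m,\Gamma}$ as
follows:


$$\tilde\varphi_{m,\Gamma} := \begin{cases} \varphi_{m,\Gamma}^{(1)} & \text{in } \Omega_1 \\ \tilde\varphi_{m,\Gamma}^{(2)} & \text{in } \Omega_2 \end{cases}$$


where


$$\tilde\varphi_{m,\Gamma}^{(2)} := \sum_{j=1}^{N_\Gamma^{(2)}} \xi_j \varphi_{j,\Gamma}^{(2)}$$


$\varphi_{j,\Gamma}^{(2)}$ are the Lagrange functions in $\Omega_2$ associated with the
nodes of $\mathcal{N}_\Gamma^{(2)}$, and $\xi_j$ are unknown coefficients that should
be determined through the fulfillment of the matching
equations (37). Precisely, they must satisfy


$$\int_\Gamma \Big(\sum_{j=1}^{N_\Gamma^{(2)}} \xi_j \varphi_{j,\Gamma}^{(2)} - \varphi_{m,\Gamma}^{(1)}\Big) \varphi_{l,\Gamma}^{(2)} = 0 \quad \forall l = 1, \ldots, N_\Gamma^{(2)} \qquad (44)$$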

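A basis for $V_\delta$ is therefore provided by the set of all
functions $\{\tilde\varphi_{k'}^{(1)}\}$, $k' = 1, \ldots, N_1$, $\{\tilde\varphi_{k''}^{(2)}\}$, $k'' = 1, \ldots, N_2$,
and $\{\tilde\varphi_{m,\Gamma}\}$, $m = 1, \ldots, N_\Gamma^{(1)}$.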
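
The computation of the coefficients $\xi_j$ amounts to solving a small mass-matrix system on the interface. The sketch below does this for a one-dimensional interface $\Gamma = (0, 1)$ with piecewise-linear traces on two nonmatching grids. The grids, the piecewise-linear spaces, the function names, and the decision to keep the end nodes of $\Gamma$ are all simplifying assumptions for illustration; they are not taken from the text.

```python
def hat(nodes, j, x):
    """Piecewise-linear Lagrange function of node j on the given 1D mesh."""
    xj = nodes[j]
    if j > 0 and nodes[j - 1] <= x <= xj:
        return (x - nodes[j - 1]) / (xj - nodes[j - 1])
    if j < len(nodes) - 1 and xj <= x <= nodes[j + 1]:
        return (nodes[j + 1] - x) / (nodes[j + 1] - xj)
    return 0.0


def integrate_product(f, g, breakpoints):
    """Integrate f*g by Simpson's rule on each interval of the merged mesh.

    Both factors are linear on every merged interval, so the product is
    quadratic there and Simpson's rule is exact.
    """
    total = 0.0
    for a, b in zip(breakpoints[:-1], breakpoints[1:]):
        mid = 0.5 * (a + b)
        total += (b - a) / 6.0 * (f(a) * g(a) + 4.0 * f(mid) * g(mid) + f(b) * g(b))
    return total


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def mortar_coefficients(master, slave, m):
    """Coefficients xi_j of the slave-side extension of master function m.

    Solves sum_j xi_j (phi2_j, phi2_l) = (phi1_m, phi2_l) for all l,
    i.e. the matching equations with the slave trace space as test space.
    """
    breaks = sorted(set(master) | set(slave))
    n = len(slave)
    mass = [[integrate_product(lambda x, j=j: hat(slave, j, x),
                               lambda x, l=l: hat(slave, l, x), breaks)
             for j in range(n)] for l in range(n)]
    rhs = [integrate_product(lambda x: hat(master, m, x),
                             lambda x, l=l: hat(slave, l, x), breaks)
           for l in range(n)]
    return solve(mass, rhs)
```

For example, `mortar_coefficients([0.0, 0.4, 1.0], [0.0, 0.25, 0.5, 0.75, 1.0], 1)` returns the slave-side nodal values that extend the middle master function. The right-hand side couples functions living on different grids, which is why the integration runs over the merged set of breakpoints.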


Remark. In the mortar method, the interface matching is
achieved through an $L^2$-interface projection, or, equivalently,
by equating first-order moments, thus involving computation
of interface integrals. In particular, from equations (37),
we have to evaluate two different kinds of integrals (take,
for instance, $i = 2$):


$$I_{12} := \int_\Gamma v_\delta^{(1)} \mu_\delta^{(2)}, \qquad I_{22} := \int_\Gamma v_\delta^{(2)} \mu_\delta^{(2)}$$


The computation of $I_{22}$ raises no special difficulties, because
both functions $v_\delta^{(2)}$ and $\mu_\delta^{(2)}$ live on the same mesh, the
one inherited from $\Omega_2$. On the contrary, $v_\delta^{(1)}$ and $\mu_\delta^{(2)}$ are
functions defined on different domains, and the computation
of integrals like $I_{12}$ requires proper quadrature rules.
This process needs to be done with special care, especially
for three-dimensional problems, for which subdomain
interfaces are made up of faces, edges, and vertices; otherwise
the overall accuracy of the mortar approximation could be
compromised.


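The discrete problem (35) can also be reformulated as a
saddle point problem of the following form:
find $u_\delta \in Y_\delta$, $\lambda_\delta \in \Lambda_\delta^{(2)}$ such that


$$a(u_\delta, v_\delta) + b(v_\delta, \lambda_\delta) = \sum_{i=1}^{2} \int_{\Omega_i} f v_\delta \quad \forall v_\delta \in Y_\delta$$

$$b(u_\delta, \mu_\delta) = 0 \quad \forall \mu_\delta \in \Lambda_\delta^{(2)}$$

where


$$a(w_\delta, v_\delta) := \sum_{i=1}^{2} \int_{\Omega_i} \nabla w_\delta^{(i)} \cdot \nabla v_\delta^{(i)}, \qquad b(v_\delta, \mu_\delta) := \int_\Gamma (v_\delta^{(1)} - v_\delta^{(2)})\,\mu_\delta$$


In this system, $\lambda_\delta$ plays the role of the Lagrange
multiplier associated with the 'constraint' (37).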

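Denoting by $\varphi_j$, $j = 1, \ldots, N_1 + N_2 + N_\Gamma^{(1)} + N_\Gamma^{(2)}$, a
basis of $Y_\delta$ and by $\psi_l$, $l = 1, \ldots, N_\Gamma^{(2)}$, a basis of $\Lambda_\delta^{(2)}$,
we introduce the matrices


$$A_{sj} := a(\varphi_j, \varphi_s), \qquad B_{ls} := b(\varphi_s, \psi_l)$$


Denoting by $\mathbf{u}$ and $\boldsymbol{\lambda}$ the vectors of the nodal values
of $u_\delta$ and $\lambda_\delta$, respectively, and by $\mathbf{f}$ the vector whose
components are given by $\sum_{i=1}^{2} (f, \varphi_s)_{\Omega_i}$, $s = 1, \ldots, N_1 +
N_2 + N_\Gamma^{(1)} + N_\Gamma^{(2)}$, we have the linear system


$$\begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix} \begin{bmatrix} \mathbf{u} \\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} \mathbf{f} \\ \mathbf{0} \end{bmatrix}$$

The matrix $A$ is block-diagonal (with one block per
subdomain $\Omega_i$), each block corresponding to a problem
for the Laplace operator with a Dirichlet boundary
condition on $\partial\Omega_i \cap \partial\Omega$ and a Neumann boundary condition on
$\partial\Omega_i \setminus \partial\Omega$.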

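After elimination of the degrees of freedom internal to the
subdomains, the method leads to the reduced linear system
(still of a saddle point type)


$$\begin{bmatrix} S & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} \mathbf{u}_\Gamma \\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} \mathbf{g}_\Gamma \\ \mathbf{0} \end{bmatrix}$$

where the matrix $S$ is block-diagonal, $C$ is a jump operator,
$\mathbf{u}_\Gamma$ is the set of all nodal values at subdomain interfaces,
and $\mathbf{g}_\Gamma$ is a suitable right-hand side.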
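
The elimination step can be written out as a short worked derivation. It assumes, for illustration only, that the unknowns are ordered with the internal values $\mathbf{u}_I$ first and the interface values $\mathbf{u}_\Gamma$ last, and that the constraint matrix acts only on interface values; the block names $A_{II}$, $A_{I\Gamma}$, $A_{\Gamma I}$, $A_{\Gamma\Gamma}$, $\mathbf{f}_I$, $\mathbf{f}_\Gamma$ are introduced here and do not appear in the text.

```latex
% Assumed ordering: u = (u_I, u_Gamma), B = [0  C].
\begin{aligned}
A_{II}\,\mathbf{u}_I + A_{I\Gamma}\,\mathbf{u}_\Gamma &= \mathbf{f}_I, \\
A_{\Gamma I}\,\mathbf{u}_I + A_{\Gamma\Gamma}\,\mathbf{u}_\Gamma + C^T \boldsymbol{\lambda} &= \mathbf{f}_\Gamma, \\
C\,\mathbf{u}_\Gamma &= \mathbf{0}.
\end{aligned}
% Solving the first row for u_I and substituting into the second row:
\mathbf{u}_I = A_{II}^{-1}\left(\mathbf{f}_I - A_{I\Gamma}\,\mathbf{u}_\Gamma\right)
\quad\Longrightarrow\quad
S = A_{\Gamma\Gamma} - A_{\Gamma I} A_{II}^{-1} A_{I\Gamma},
\qquad
\mathbf{g}_\Gamma = \mathbf{f}_\Gamma - A_{\Gamma I} A_{II}^{-1} \mathbf{f}_I .
```

Because $A$ is block-diagonal with one block per subdomain, the same holds for $A_{II}$ and hence for $S$, so the elimination can be carried out subdomain by subdomain.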

This system can be regarded as an extension of the Schur
complement system to nonconforming approximation (the
Lagrange multiplier $\boldsymbol{\lambda}$ indeed accounts for nonmatching
discretization at subdomain interfaces). In fact, the $i$th
block of $S$ is the analogue of $\Sigma_{i,h}$, and corresponds to a
discretized Steklov-Poincaré operator on the subdomain
$\Omega_i$.


Remark. All results cited in this note can be recovered 
from the books and general articles that are quoted in the 
References.


1 INTRODUCTION


Increasingly realistic models in computational mechanics 
and the search for more and more accurate simulations place 
continuously growing demands on computation that surpass 
the ongoing increase of computing power. Thus, paradox- 
ically, these finer models might be of limited use in the 
absence of new computational strategies. One promising, 
emerging strategy is to dynamically adapt discretizations in 
the course of the computational solution process. Adaptive 


Encyclopedia of Computational Mechanics, Edited by Erwin 
Stein, René de Borst and Thomas J.R. Hughes. Volume 1: Funda- 
mentals. © 2004 John Wiley & Sons, Ltd. ISBN: 0-470-84699-2. 


strategies of this type have been observed to reduce the 
complexity of computational problems arising in large scale 
numerical simulation. Therefore, adaptivity provides an 
enormous potential for advancing the frontiers of com- 
putability. By bringing more and more complex tasks into 
reach, it offers, in the long run, better and better access to 
physical phenomena through a powerful numerical micro- 
scope. On the other hand, to advance these techniques 
to their natural fruition requires an understanding of the 
power of adaptivity vis-a-vis traditional methods of com- 
putation. This includes clarifying the optimal performance 
that can be expected from adaptive methods and how this 
compares with the performance using nonadaptive tech- 
niques. 

This chapter describes adaptive numerical strategies in 
the context of multiscale decompositions using wavelet 
bases. In addition to formulating adaptive strategies to 
be used in a variety of settings, the chapter will pro- 
vide an a priori analysis of the computational efficiency 
of adaptive methods. This will delineate the advantages 
of adaptive strategies versus standard computational meth- 
ods. 

Adaptivity takes a variety of forms distinguished by their 
principal goals. In many applications, one is not interested 
in the complete solution of a given problem but only in 
certain local functionals of an object that may be globally 
defined like the solution of a boundary value problem. In 
this case, an adaptive discretization reflects how much has 
to be paid to the global character of the object when trying
to recover local information about it. However, this is
not the direction of the present chapter. Instead, this chapter
focuses on recovering the whole object in question. In
the context of fluid mechanics, this may mean recovering 
the vortices in the wake of an airfoil, or the interaction of 
shocks even at some distance of the airfoil, or the recov- 
ery of a full stress field, or eventually, understanding more 
about developing turbulence. The objective is to develop 
numerical techniques that are able to extract information 
within desired error tolerances at minimal cost. This means 
that searched-for quantities like pressure or velocity are to
be recovered within some accuracy tolerance, for exam- 
ple, with respect to some norm. This should be done at the 
expense of a number of degrees of freedom that remains 
proportional to the minimal number of degrees of freedom 
(in a certain discretization framework) needed to approxi- 
mate the object based on full information within the desired 
target accuracy. From the mathematical point of view, it is 
not clear beforehand at all whether this is possible solely 
based on a posteriori information acquired during a solu- 
tion process. We shall indicate in this chapter an affirmative 
answer for a wide range of problems arising in engineer- 
ing applications (see Cohen, Dahmen and DeVore, 2001, 
2002a,b,c). 

Our approach involves expansions of functions into 
wavelet bases. In such expansions, the wavelet coefficients 
encode detail information that has to be added when pro- 
gressing to higher levels of resolution of the underlying 
function. These coefficients convey local structural infor- 
mation such as the regularity of the expanded function. 
The decomposition naturally breaks the function into differ- 
ent characteristic length scales. A central question in many 
dynamical simulation tasks concerns the interaction of these 
different scales. As we shall show, wavelet analysis offers 
a promising way to describe the behavior of contributions 
from different length scales under nonlinear mappings. We 
shall see that wavelet expansions offer quantitative ways of 
estimating nonlinear effects that appear, for example, in the 
Navier-Stokes equations. Moreover, we shall point out how
such an analysis aids the adaptive solution process. This 
already indicates the marriage between the analysis and 
numerical resolution of a problem facilitated by wavelet 
concepts. Therefore, the understanding of these concepts 
and their potential requires a certain amount of functional 
analysis as will be described in this chapter. 

We do not attempt to give an exhaustive overview 
of wavelet analysis pertinent to computational mechanics 
issues. Nor will the topics presented here be treated in a 
self-contained way. Both would be far beyond the scope 
of this chapter. Rather, we shall focus on presenting some 
concepts and ideas, which in our opinion best reflect the 
potential of wavelets, thereby offering some orientation that 
could be complemented by the extensive list of references. 


The following surveys and text books are recommended as 
sources of more detailed expositions: Cohen, 2000, 2003; 
Dahmen, 1997, 2001; DeVore, 1998. 

The organization of the material is in some sense ‘two- 
dimensional’. Most simulation tasks are based on continu- 
ous mathematical models formulated in terms of integral or 
(partial) differential equations. The ‘first dimension’ is to 
group the different concepts with respect to the following 
two major problem classes. The first one concerns evolution 
problems 


$$\partial_t u = \mathcal{E}(u) \qquad (1)$$


together with initial and boundary conditions. The second 
class concerns stationary problems 


$$\mathcal{R}(u) = 0 \qquad (2)$$


which are usually given in variational form. The scope 
of problems covered by (2) will be illustrated by a list 
of examples including mixed formulations and nonlinear 
problems. Of course, there is no clear dividing line. For 
instance, an implicit time discretization of a parabolic 
evolution problem leads to a family of problems of the 
type (2). The problems grouped under (2) are typically 
elliptic (in the sense of Agmon—Douglis—Nirenberg) for 
which Hilbert space methods are appropriate. In contrast, 
we focus under (1) on nonlinear hyperbolic problems. It 
will be seen that the respective concepts are quite different 
in nature. The nature of the relevant function spaces, 
for example, $L_1$, which admits no unconditional basis,
causes an impediment to exploiting the full potential of 
wavelets. 

The ‘second dimension’ of organization concerns the way 
wavelet features are exploited. In Section 2, we review 
briefly the main features that drive wavelets as analysis 
and discretization tools. Aside from transform mechanisms, 
these are the locality (in physical and frequency domain), 
the cancellation properties, and the norm equivalences 
between function and sequence spaces. The latter facilitates 
a stable coupling of the continuous and the discrete world. 
Together with the first two features, this is also fundamental 
for fast numerical processing. 

In Section 3, these features are applied to (1). The pri- 
mary focus of adaptivity here is the sparse approximation 
of the unknown solution, mainly owing to the cancellation
properties. In this context, wavelets are not used as
stand-alone tools but are rather combined with conventional 
finite volume discretizations. The numerical approximation 
represented by arrays of cell averages is compressed in 
a manner similar to image compression. This amounts to 
a perturbation analysis in which one seeks a significant 
data compression while preserving essentially the accuracy
of the underlying reference discretization for a fixed level
of resolution. The approach and the performance of such
schemes are illustrated by some numerical examples concerning
aerodynamical applications.

The remainder of the chapter is concerned with the 
problem class (2). Section 4 deals with global operators 
represented here by the classical boundary integral equa- 
tions. Now the above-mentioned main features of wavelets 
are used mainly to obtain sparse approximations of opera- 
tors. This time the elliptic nature of the problem allows one 
to formulate stable Galerkin discretizations. When using 
finite elements or boundary elements, the resulting stiffness 
matrices are densely populated and, depending on the oper- 
ator, are increasingly ill-conditioned when the mesh size 
decreases. In wavelet formulations, the norm equivalences 
and cancellation properties are used to show that the stiff- 
ness matrices can be replaced by sparse well-conditioned 
ones without sacrificing discretization error accuracy. This 
allows one to solve such problems in linear time. Again, 
this is essentially a perturbation approach in which this time 
sparse approximations apply to the operator and not to the 
function. Adaptivity refers here primarily to the quadrature 
used to compute the compressed stiffness matrices with a 
computational effort that stays proportional to the problem 
size. As for the current state of the art, we refer to Dahmen, 
Harbrecht and Schneider (2002), Harbrecht (2001), and the 
references cited there. 

In Section 5, we introduce a new algorithmic paradigm 
that emerges from exploiting both the sparse approximation
of functions and the sparse representation of operators
together. It aims at intertwining in some sense the anal- 
ysis and resolution aspects of wavelet concepts as much 
as possible. Here are the main conceptual pillars of this 
approach: 


A transform point of view: Many studies of wavelet 
methods for the numerical solution of PDEs are very similar 
in spirit to classical finite element discretizations where the 
trial spaces are spanned by finite collections of wavelets. 
This has so far dominated the use of wavelets in the context 
of boundary integral equations and is the point of view 
taken in Section 4. However, this does not yet fully exploit 
the potential of wavelets. In fact, similar to classical Fourier 
methods, wavelets can be used to formulate transform 
methods that are best explained in the context of variational 
formulations of (linear or nonlinear) operator equations like 
boundary value problems or boundary integral equations. 
Unlike finite volume or finite element schemes, wavelet 
bases can be used to transform the original variational 
problem into an equivalent problem over $\ell_2$, the space
of square summable sequences indexed by the wavelet
basis. Moreover, when the wavelets are correctly chosen in
accordance with the underlying problem, the transformed
(still infinite dimensional) problem is now well-posed in a 
sense to be made precise later. We shall point out now the 
main principles along these lines. 


Staying with the infinite dimensional problem: In many 
cases, the underlying infinite dimensional problem, for 
example, a PDE, is fairly well understood. In mathematical 
terms, this means that when formulated as an operator 
equation, the operator is known to be boundedly invertible 
as a mapping from a certain function space into its dual, 
which is another way of saying that the problem is well 
posed in a certain topology — there exists a unique solution, 
which depends continuously on the data in the topology 
given by that function space. 

When transforming to the wavelet domain, the properties 
of the operator are inherited by the transformed opera- 
tor which now acts on sequence spaces. The main point 
we wish to stress is that the original infinite dimensional 
problem is often better understood than specific discretized 
finite dimensional versions and therefore there is an advan- 
tage in delaying the movement to finite discretizations as 
long as possible. A classical example of this is the Stokes 
problem in which a positive definite quadratic functional 
is minimized under the divergence constraint and thus has 
saddle point character. The Stokes problem is well posed 
in the above sense for the right pairs of function spaces
for the velocity and pressure component (see, e.g., Brezzi
and Fortin, 1991; Girault and Raviart, 1986). It is well
known that Galerkin discretizations, however, may very
well become unstable unless the trial spaces for velocity
and pressure satisfy a compatibility condition called
the Ladyzhenskaya-Babuška-Brezzi (LBB) condition. For
the Stokes problem, this is well understood but in other
situations, as in many physically very appropriate mixed
formulations, finding stable pairs of trial spaces is a more
delicate task. So, in some sense one may run into
self-inflicted difficulties when turning to finite discretizations
even though the original problem is well behaved. Is there
an alternative?


Stabilizing effects of adaptivity: The very fact that,
unlike conventional schemes, a suitable wavelet basis captures
the complete infinite dimensional problem and puts it
into a well-conditioned format over $\ell_2$ can be used to avoid
fixing any finite dimensional discretization. Instead, the
well-posedness offers ways of formulating an iterative scheme
for the full infinite dimensional problem that converges
(conceptually) with a fixed error reduction per step. Only
after this infinite dimensional analysis is complete do we
enter the numerical stage by applying the involved infinite
dimensional (linear and also nonlinear) operators adaptively
within suitable stage-dependent dynamically updated
error tolerances. Roughly speaking, this numerical approach
inherits the well-posedness of the original problem and
allows us to avoid imposing compatibility conditions such
as LBB.


Adaptive evaluation of operators: A central issue is 
then to actually realize concrete adaptive evaluation 
schemes for relevant operators and to analyze their 
computational complexity. We shall engage this issue for 
both linear and nonlinear examples. While at the first glance 
nonlinear operators seem to interfere in an essential way 
with wavelet concepts (as they do with regard to the Fourier 
techniques), we claim that they offer particularly promising 
perspectives in this regard. In conventional discretizations, 
the image of a current adaptive approximation under a 
nonlinear mapping is usually discretized on the same mesh. 
However, a singularity, which may cause an adaptive local 
refinement, is often severely affected by a nonlinearity so 
that this mesh might no longer be optimal. In the present 
framework, the adaptive evaluation will be seen to generate 
at each stage the right choice of degrees of freedom for the 
image of the nonlinearity; see Section 6.2. This is based on 
quantitative estimates on the interaction of different length 
scales under nonlinear mappings. 

It would be far beyond the scope of this chapter to 
address any of the above issues in complete detail. Instead, 
our presentation will be more of an overview of this subject 
which should serve to orient the reader to the essential 
concepts and points of view. The interested reader will then
find extensive references for further reading. 


2 WAVELETS 


In this section, we give a brief overview of those features 
of wavelets and multiresolution that are important for our 
presentation. There are many different ways of viewing and 
motivating wavelet expansions (see, e.g., Daubechies, 1992).
Our point of view in the present context is conveniently 
conveyed by the following example. 


2.1 The Haar basis 


The starting point is the box function $\phi(x) = \chi_{[0,1)}(x)$,
which takes the value one on $[0, 1)$ and zero outside. The
normalized dilates and translates $\phi_{j,k} = 2^{j/2} \phi(2^j \cdot - k)$,
$k = 0, \ldots, 2^j - 1$, of $\phi$ are readily seen to be orthonormal,
that is, $(\phi_{j,k}, \phi_{j,l})_{[0,1]} := \int_0^1 \phi_{j,k}(x) \phi_{j,l}(x)\,dx = \delta_{k,l}$. Hence


$$P_j(f) := \sum_{k=0}^{2^j - 1} (f, \phi_{j,k})\,\phi_{j,k} \qquad (3)$$


is for each $j \in \mathbb{N}_0$ a simple orthogonal projector from
$L_2([0, 1])$ onto the space $S_j$ of piecewise constant functions
subordinate to the dyadic mesh of size $2^{-j}$. This projection
resolves the function $f$ up to scale $j$ while finer details are
averaged out; see Figure 1.
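
A minimal sketch of this projector follows. The inner products $(f, \phi_{j,k})$ are approximated here by a composite midpoint rule, which is an assumption made for simplicity (any quadrature would do), and the function names are hypothetical.

```python
def haar_scaling_coefficients(f, j, samples_per_cell=64):
    """Approximate c_{j,k} = (f, phi_{j,k}) for k = 0, ..., 2**j - 1.

    phi_{j,k} equals 2**(j/2) on the dyadic cell [k 2**-j, (k+1) 2**-j)
    and zero elsewhere; the integral over the cell uses the midpoint rule.
    """
    cells = 2 ** j
    width = 1.0 / cells
    sub = width / samples_per_cell
    coeffs = []
    for k in range(cells):
        integral = sub * sum(f(k * width + (i + 0.5) * sub)
                             for i in range(samples_per_cell))
        coeffs.append(2.0 ** (j / 2.0) * integral)
    return coeffs


def project(coeffs, j, x):
    """Evaluate P_j f(x) = sum_k c_{j,k} phi_{j,k}(x) for x in [0, 1)."""
    k = min(int(x * 2 ** j), 2 ** j - 1)
    return coeffs[k] * 2.0 ** (j / 2.0)
```

On each dyadic cell the value of $P_j f$ is simply the cell average of $f$, which is why finer details are averaged out.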

If the resolution is found to be insufficient, one has to 
discard previous efforts and recompute with a larger j. 

In contrast,


$$f = \sum_{j=0}^{\infty} (P_j - P_{j-1}) f \qquad (P_{-1} := 0) \qquad (4)$$


is a multiscale representation of the function $f$. Each term
$(P_j - P_{j-1}) f$ represents the detail in $f$ at the scale $2^{-j}$.
Moreover, the layers of detail at each scale are mutually
orthogonal.

As can be seen from Figure 2, to encode the difference
information, one can use in place of the averaging profile
$\phi(x)$ an oscillatory profile $\psi(x)$, the Haar wavelet,
given by


$$\psi(x) := \phi(2x) - \phi(2x - 1)$$

which implies

$$\phi(2x) = \frac{\phi(x) + \psi(x)}{2}$$

and therefore

$$\phi(2x - 1) = \frac{\phi(x) - \psi(x)}{2}$$


Figure 1. Piecewise constant approximation. A color version
of this image is available at http://www.mrw.interscience.wiley.com/ecm


Hence, the fine scale averaging profiles can be recovered
from a coarse scale average and an oscillatory profile.
Thus, defining again $\phi_{j,k} := 2^{j/2} \phi(2^j \cdot - k)$,
$\psi_{j,k} := 2^{j/2} \psi(2^j \cdot - k)$, one easily verifies the two-scale relations


$$\phi_{j,k} = \frac{1}{\sqrt{2}} (\phi_{j+1,2k} + \phi_{j+1,2k+1}), \qquad \psi_{j,k} = \frac{1}{\sqrt{2}} (\phi_{j+1,2k} - \phi_{j+1,2k+1}),$$

$$\phi_{j+1,2k} = \frac{1}{\sqrt{2}} (\phi_{j,k} + \psi_{j,k}), \qquad \phi_{j+1,2k+1} = \frac{1}{\sqrt{2}} (\phi_{j,k} - \psi_{j,k})$$

which give rise to a change of basis 


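$$\sum_{k=0}^{2^{j+1}-1} c_{j+1,k}\,\phi_{j+1,k} = \sum_{k=0}^{2^j-1} c_{j,k}\,\phi_{j,k} + \sum_{k=0}^{2^j-1} d_{j,k}\,\psi_{j,k} \tag{5}$$

where

$$\begin{aligned}
c_{j,k} &= \frac{1}{\sqrt{2}} (c_{j+1,2k} + c_{j+1,2k+1}), &
d_{j,k} &= \frac{1}{\sqrt{2}} (c_{j+1,2k} - c_{j+1,2k+1}), \\
c_{j+1,2k} &= \frac{1}{\sqrt{2}} (c_{j,k} + d_{j,k}), &
c_{j+1,2k+1} &= \frac{1}{\sqrt{2}} (c_{j,k} - d_{j,k}).
\end{aligned} \tag{6}$$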


Thus the representation in terms of fine scale averages 
can be obtained from the coarse scale averages already in 
hand by simply adding the detail (lost through the coarse 
projection) encoded in terms of the oscillatory profiles. 
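The translated dilations $\psi_{j,k} := 2^{j/2}\psi(2^j \cdot - k)$ are easily
seen to be pairwise orthonormal

$$\langle \psi_{j,k}, \psi_{j',k'} \rangle_{\mathbb{R}} = \delta_{(j,k),(j',k')}, \qquad j, j', k, k' \in \mathbb{Z}. \tag{7}$$

Here and in the following, we use the notation $\langle f, g \rangle_\Omega =
\int_\Omega f g\,dx$ but suppress at times the subscript $\Omega$ when the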
reference to the domain is clear from the context. 
Obviously, the above change of basis (5) can be repeated 
which gives rise to a cascadic transform — the fast wavelet 
transform. It transforms a linear combination of fine scale 
box functions with an array of averages $c_J$ into a linear
combination of coarse scale box functions with coefficient
array $c_0$ and Haar wavelets with arrays of detail coefficients
$d_j$ for each dyadic level $j < J$. This decomposition
transform $T_J : c_J \to d^J := (c_0, d_0, d_1, \ldots, d_{J-1})$ looks
schematically as follows:

$$\begin{array}{ccccccccccc}
c_J & \to & c_{J-1} & \to & c_{J-2} & \to & \cdots & \to & c_1 & \to & c_0 \\
 & \searrow & & \searrow & & & & \searrow & & \searrow & \\
 & & d_{J-1} & & d_{J-2} & & \cdots & & d_1 & & d_0
\end{array} \tag{8}$$

In other words, from $c_J$, we determine $c_{J-1}$ and $d_{J-1}$ by
using (6), and so on. By (7) $T_J$ is represented by a unitary
matrix whose inverse is given by its transpose. Therefore
the transform $T_J^{-1} : d^J := (c_0, d_0, \ldots, d_{J-1}) \to c_J$, which
takes the detail coefficients into the single-scale average
coefficients, has a similar structure that can also be read off
from the relations (6):

$$\begin{array}{ccccccccccc}
c_0 & \to & c_1 & \to & c_2 & \to & \cdots & \to & c_{J-1} & \to & c_J \\
 & \nearrow & & \nearrow & & \nearrow & & & & \nearrow & \\
d_0 & & d_1 & & d_2 & & \cdots & & d_{J-1} & &
\end{array} \tag{9}$$

Figure 2. Different levels of resolution of f. A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm

Thus, starting with $c_0$ and the wavelet coefficients
$d_0, \ldots, d_{J-1}$, we can use (9) to find $c_J$. Due to the cascadic
structure and the fact that the relations in (6) involve only
finite filters or masks, the number of operations required by
both transforms is $O(2^J)$, i.e. stays proportional to the size
of the arrays $c_J$.
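The cascade (8) and its inverse (9) can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses NumPy, it expects an input array whose length is a power of two, and the names `haar_decompose` and `haar_reconstruct` are ours rather than notation from the text. Each pass applies the averaging and differencing rules of (6) to one level.

```python
import numpy as np

def haar_decompose(c_fine):
    """Map fine scale averages c_J to (c_0, [d_0, ..., d_{J-1}]) as in scheme (8)."""
    c = np.asarray(c_fine, dtype=float)
    if c.size == 0 or c.size & (c.size - 1):
        raise ValueError("length must be a power of two")
    details = []
    while c.size > 1:
        even, odd = c[0::2], c[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients d_{j,k}
        c = (even + odd) / np.sqrt(2.0)              # coarse averages c_{j,k}
    return c, details[::-1]  # details ordered from coarsest to finest level

def haar_reconstruct(c_coarse, details):
    """Invert haar_decompose, following scheme (9)."""
    c = np.asarray(c_coarse, dtype=float)
    for d in details:
        fine = np.empty(2 * c.size)
        fine[0::2] = (c + d) / np.sqrt(2.0)  # c_{j+1,2k}
        fine[1::2] = (c - d) / np.sqrt(2.0)  # c_{j+1,2k+1}
        c = fine
    return c
```

Every level touches each of its entries a fixed number of times and the array length halves from level to level, so the total work is proportional to the input length, in line with the operation count stated above.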

In summary, there is a convenient and fast way of switch- 
ing between different representations of a given projection 
$P_J(f) = \sum_{k=0}^{2^J-1} c_{J,k}\,\phi_{J,k}$, each having its advantages, as we
shall see in subsequent discussions. From the theoretical
point of view, since the wavelets allow us to encode the
$j$th dyadic level of detail of a function $f$ as

$$(P_{j+1} - P_j) f = \sum_{k=0}^{2^j-1} d_{j,k}(f)\,\psi_{j,k}, \qquad d_{j,k}(f) := \langle f, \psi_{j,k} \rangle,$$

the telescoping expansion (3) yields

$$f = P_0 f + \sum_{j=1}^{\infty} (P_j - P_{j-1}) f = P_0 f + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j-1} d_{j,k}(f)\,\psi_{j,k}. \tag{10}$$

The convergence of this expansion in $L_2$ follows from the
convergence of the orthogonal projections in $L_2$.
A wavelet decomposition of a function $f \in L_2$ is analogous
in spirit to the decimal representation of a real
number. The wavelet coefficients play the role of digits;
receiving more wavelet coefficients gives us progressively 
better accuracy in representing f. Of course, the classi- 
cal Fourier transform of periodic functions and also Taylor 
expansions do, in principle, the same. The particular advan- 
tages of wavelet representations rely to a large extent on 
the following facts. First of all, the orthonormality of the 


$\psi_{j,k}$ gives

$$\|f\|_{L_2} = \Big( \sum_{j=0}^{\infty} \|(P_j - P_{j-1}) f\|^2_{L_2} \Big)^{1/2} = \|\mathbf{d}(f)\|_{\ell_2}. \tag{11}$$


That is, there is a tight relation between the function and 
coefficient norm. Thus perturbing the digits, which will 
happen in every computation, in particular, discarding small 
digits, will change the function norm only by the same small 
amount. Clearly, the convergence of the series implies that 
the digits will eventually have to become arbitrarily small. 
However, which digits become how small can easily be 
inferred from local properties of $f$. In fact, since the $\psi_{j,k}$
are orthogonal to constants (they have first order vanishing
moments), one has for $S_{j,k} := \operatorname{supp} \psi_{j,k} = 2^{-j}[k, k + 1]$

$$|d_{j,k}(f)| = \inf_{c \in \mathbb{R}} |\langle f - c, \psi_{j,k} \rangle| \le \inf_{c \in \mathbb{R}} \|f - c\|_{L_2(S_{j,k})} \le 2^{-j} \|f'\|_{L_2(S_{j,k})}, \tag{12}$$

where the last estimate follows, for example, from Taylor’s
expansion. Thus, $d_{j,k}(f)$ is small when $f|_{S_{j,k}}$ is smooth.
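The decay of the Haar coefficients of a smooth function can be observed directly. The sketch below is only an illustration under our own assumptions: it picks the test function $f(x) = \sin(2\pi x)$, evaluates the coefficients exactly from an antiderivative, and uses the hypothetical helper name `haar_detail_coeffs`. Since $\|f'\|_{L_2(S_{j,k})} \le 2\pi\, 2^{-j/2}$ for this choice, the bound (12) predicts $|d_{j,k}(f)| \le 2\pi\, 2^{-3j/2}$.

```python
import numpy as np

def haar_detail_coeffs(F, j):
    """Haar coefficients <f, psi_{j,k}>, k = 0, ..., 2^j - 1, from an antiderivative F of f."""
    k = np.arange(2 ** j)
    a, b = k / 2 ** j, (k + 1) / 2 ** j
    mid = 0.5 * (a + b)
    # psi_{j,k} equals +2^{j/2} on the left half of its support and -2^{j/2} on the right half.
    return 2.0 ** (j / 2) * ((F(mid) - F(a)) - (F(b) - F(mid)))

def F(x):
    """Antiderivative of the test function f(x) = sin(2 pi x)."""
    return -np.cos(2 * np.pi * x) / (2 * np.pi)

levels = list(range(2, 10))
max_abs = [np.abs(haar_detail_coeffs(F, j)).max() for j in levels]
bounds = [2 * np.pi * 2.0 ** (-1.5 * j) for j in levels]          # right-hand side of (12)
ratios = [max_abs[i + 1] / max_abs[i] for i in range(len(levels) - 1)]
```

For this smooth function the largest coefficient on each level stays below the bound, and the ratio between successive levels approaches $2^{-3/2}$.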


2.2 Biorthogonal wavelets on R 


The Haar basis is, of course, not suitable when, for instance, 
higher regularity of the approximation system is required. 
The discovery by I. Daubechies (Daubechies, 1992), of 
a family of compactly supported orthonormal wavelets in 
$L_2(\mathbb{R})$ of arbitrarily high regularity opened the door to a
wide range of applications. Of perhaps even more practical
relevance was the subsequent construction of biorthogonal
wavelets put forward by Cohen, Daubechies and
Feauveau (1992). The biorthogonal approach sacrifices
$L_2$-orthogonality in favor of other properties such as symmetry
of the basis functions and better localization of their supports.

The construction of biorthogonal wavelets starts with a 
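dual pair of compactly supported scaling functions $\phi$, $\tilde\phi$,
that is,

$$\langle \phi, \tilde\phi(\cdot - k) \rangle = \delta_{0,k}, \qquad k \in \mathbb{Z}, \tag{13}$$

that satisfy the two scale relations

$$\phi(x) = \sum_{k \in \mathbb{Z}} a_k\, \phi(2x - k), \qquad \tilde\phi(x) = \sum_{k \in \mathbb{Z}} \tilde a_k\, \tilde\phi(2x - k) \tag{14}$$

with finitely supported masks $(a_k)_{k \in \mathbb{Z}}$, $(\tilde a_k)_{k \in \mathbb{Z}}$; see (4). Each
of the functions

$$\psi(x) = \sum_{k \in \mathbb{Z}} (-1)^k\, \tilde a_{1-k}\, \phi(2x - k), \qquad \tilde\psi(x) = \sum_{k \in \mathbb{Z}} (-1)^k\, a_{1-k}\, \tilde\phi(2x - k) \tag{15}$$

generates by means of shifts and dilates a biorthogonal
basis for $L_2(\mathbb{R})$. Each $f \in L_2(\mathbb{R})$ has the following unique
expansions:

$$f = \sum_{j=-1}^{\infty} \sum_{k \in \mathbb{Z}} \langle f, \tilde\psi_{j,k} \rangle_{\mathbb{R}}\, \psi_{j,k} = \sum_{j=-1}^{\infty} \sum_{k \in \mathbb{Z}} \langle f, \psi_{j,k} \rangle_{\mathbb{R}}\, \tilde\psi_{j,k}, \tag{16}$$

where we have used the notation $\psi_{-1,k} := \phi_{0,k}$, $\tilde\psi_{-1,k} := \tilde\phi_{0,k}$.
One thus has $\langle \psi_{j,k}, \tilde\psi_{l,m} \rangle_{\mathbb{R}} = \delta_{(j,k),(l,m)}$.
Each of these systems is a Riesz basis, which means [1]

$$\|f\|^2_{L_2(\mathbb{R})} \sim \sum_{j=-1}^{\infty} \sum_{k \in \mathbb{Z}} |\langle f, \tilde\psi_{j,k} \rangle|^2 \sim \sum_{j=-1}^{\infty} \sum_{k \in \mathbb{Z}} |\langle f, \psi_{j,k} \rangle|^2. \tag{17}$$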

The inequalities (17) ensure a tight relation between the 
function norm and the coefficient norm. 


Cohen, Daubechies and Feauveau (1992) construct a family
of biorthogonal pairs with each of $\psi$, $\tilde\psi$ of compact
support. Given any desired order $r$ of differentiability, one
can find a biorthogonal pair in this family with $\psi$ having
$r$ continuous derivatives. Moreover, one can also require
that a suitable linear combination of $(\phi(\cdot - k))_{k \in \mathbb{Z}}$ (respectively
$(\tilde\phi(\cdot - k))_{k \in \mathbb{Z}}$) will represent any given polynomial
of order $\le m$ (respectively $\tilde m$). The biorthogonality relations
then imply that the wavelets $\psi_{j,k}$, $\tilde\psi_{j,k}$ (for $j \ge 0$)
are orthogonal to all polynomials of order $\tilde m$, $m$ respectively.
An analogous argument to (12) then shows that the
coefficients $\langle f, \tilde\psi_{j,k} \rangle$, $\langle f, \psi_{j,k} \rangle$ decay like $2^{-mj}$, $2^{-\tilde m j}$ when
$f$ has bounded derivatives on the supports of $\tilde\psi_{j,k}$, $\psi_{j,k}$ of
order $m$, $\tilde m$, respectively, in $L_2$. Thus higher local smoothness
results in a stronger size reduction of corresponding
wavelet coefficients.

The setting of biorthogonal wavelets is particularly
appealing from a practical point of view since the primal
generator $\phi$ can be chosen as any B-spline, and in turn the
primal wavelet generator $\psi$ is also a spline function with
an explicit (piecewise polynomial) analytical expression.
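As a small illustration of a dual pair of masks, consider the piecewise linear B-spline (hat function) as primal generator together with a five-term dual mask from the Cohen, Daubechies and Feauveau family. The specific coefficients below are an assumption on our part (the well-known "5/3" pair, normalized so that each mask sums to 2); they are not listed in the text. Inserting the two scale relations into the duality condition for $\phi$, $\tilde\phi$ gives the necessary mask condition $\sum_k a_k \tilde a_{k-2n} = 2\,\delta_{0,n}$, which the sketch checks. It also forms a wavelet mask with the common alternating-sign convention $b_k = (-1)^k \tilde a_{1-k}$ (again an assumed convention) and confirms that its entries sum to zero.

```python
# Masks stored as {index k: coefficient}; values are the assumed "5/3" pair.
a = {-1: 0.5, 0: 1.0, 1: 0.5}                                # hat function mask
a_dual = {-2: -0.25, -1: 0.5, 0: 1.5, 1: 0.5, 2: -0.25}      # assumed dual mask

def mask_duality(primal, dual, n):
    """Return sum_k primal[k] * dual[k - 2n]; duality requires 2 for n = 0 and 0 otherwise."""
    return sum(coef * dual.get(k - 2 * n, 0.0) for k, coef in primal.items())

def wavelet_mask(dual):
    """Wavelet mask b_k = (-1)^k * dual[1 - k] (alternating-sign convention, assumed here)."""
    return {1 - l: (-1) ** abs(1 - l) * coef for l, coef in dual.items()}

duality_values = {n: mask_duality(a, a_dual, n) for n in range(-2, 3)}
b = wavelet_mask(a_dual)
```

A wavelet mask whose entries sum to zero yields a wavelet with vanishing integral, the simplest instance of the moment conditions discussed above.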


2.3 Wavelets on domains 


Biorthogonal wavelets provide a beautiful and conceptually 
simple multiscale decomposition of functions. They offer 
great versatility in the choice of the basis and dual elements 
including compact support, smoothness, and even piecewise 
polynomial structure. Meanwhile they maintain the essen- 
tial features of orthogonal wavelet decompositions such as 
norm equivalences and cancellation properties. Moreover, 
in connection with differential equations, the space $L_2$ often
plays only a secondary or artificial role. Therefore, dispensing
with $L_2$-orthogonality is, in general, not even a
quantitative drawback.

That is the good news. The bad news is that the above
wavelet constructions are inherently made on $\mathbb{R}$, $\mathbb{R}^d$, or a
torus. In numerical applications, the setting is typically on a
finite domain or manifold $\Omega$. Such domains and manifolds
do not maintain the dilation and translation structure of
the full Euclidean space or the torus. [2] Fortunately, there
are constructions of multiscale bases tailored to general
domains and manifolds, although these come at the expense
of a certain level of technicality. In order not to destroy the
main flow of this chapter, we shall only give an overview of 
some of the ideas used for these constructions. The reader 
can consult Dahmen (1997) and the references quoted 
there for a more detailed description of the construction 
of multiscale bases on (bounded) domains and manifolds. 
The starting point of these constructions is again multiresolution.
By this, we mean a hierarchy of (now finite
dimensional) subspaces $S_j$ of some function space $X$,

$$S_j \subset S_{j+1}, \qquad \overline{\bigcup_j S_j} = X,$$

that are spanned by single-scale bases $S_j = \operatorname{span} \Phi_j =:
S(\Phi_j)$, $\Phi_j = \{\phi_\gamma : \gamma \in \mathcal{I}_j\}$. The space $X$ is typically an $L_p$
or Sobolev space. It is important that the bases $\Phi_j$ are
scalewise stable with respect to some discrete norm $\|\cdot\|$ in
the sense that

$$\big\| (\|c_\gamma \phi_\gamma\|_X)_{\gamma \in \mathcal{I}_j} \big\| \sim \Big\| \sum_{\gamma \in \mathcal{I}_j} c_\gamma \phi_\gamma \Big\|_X \tag{18}$$

with constants that do not depend on the level $j$. In the case
where $X = L_p$ or $W^s_p$, the discrete norm $\|\cdot\|$ is typically
the $\ell_p$ norm. For instance, $\Phi_j$ could be a finite element
nodal basis on a $j$-fold refinement of some initial mesh for
$\Omega$. In this example, the indices $\gamma$ may represent the vertices
in the mesh. One then looks for decompositions

$$S_{j+1} = S_j \oplus W_j, \qquad W_j = \operatorname{span} \Psi_j, \qquad \Psi_j = \{\psi_\lambda : \lambda \in \mathcal{J}_j\}.$$


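The multiscale basis

$$\Psi := \bigcup_{j=-1}^{\infty} \Psi_j =: \{\psi_\lambda : \lambda \in \mathcal{J}\} \qquad (\Psi_{-1} := \Phi_0)$$

is then a candidate for a wavelet basis. At this point a
word on notation is in order. The index set has two component
subsets: $\mathcal{J} = \mathcal{J}_\phi \cup \mathcal{J}_\psi$. The index set $\mathcal{J}_\phi$ has a finite
cardinality and labels the basis functions in $S_0$ of ‘scaling
function’ type. The true wavelets correspond to the indices
in $\mathcal{J}_\psi$. These indices absorb different types of information
such as the scale $j = |\lambda|$, the spatial location $k(\lambda)$ or, when
dealing with a spatial dimension $d > 1$, the type $e(\lambda)$ of
$\psi_\lambda$. An example is $\psi_\lambda(x, y) = 2^{j}\,\psi^{1,0}(2^j (x, y) - (k, l)) =
2^{j/2}\psi(2^j x - k)\, 2^{j/2}\phi(2^j y - l)$, where $\lambda \leftrightarrow (j, (k, l),
(1, 0))$.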

Of course, there is a continuum of possible complements 
W; and the question arises as to what are ‘good comple- 
ments’. The previous section already indicates the role of 
biorthogonality in this context. So a typical strategy is to 
split the multiresolution spaces $S_j$ in such a way that there
exists a biorthogonal or dual collection $\tilde\Psi$, corresponding
to a dual multiresolution sequence $(\tilde S_j)$ that belongs to the
dual $X'$, in such a way that

$$\langle \psi_\lambda, \tilde\psi_\nu \rangle = \delta_{\lambda,\nu}, \qquad \lambda, \nu \in \mathcal{J},$$

and hence

$$f = \sum_{\lambda \in \mathcal{J}} \langle f, \tilde\psi_\lambda \rangle\, \psi_\lambda. \tag{19}$$

The classical situation is $X = X' = L_2(\Omega)$ so that
one has in this case the alternate representation $f =
\sum_{\lambda \in \mathcal{J}} \langle f, \psi_\lambda \rangle\, \tilde\psi_\lambda$ (see Carnicer, Dahmen and Peña, 1996;
Cohen, Daubechies and Feauveau, 1992; Dahmen, 1994,
1996).

Figure 3. (a) Primal wavelet (b) Dual wavelet. A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm

When no global smoothness is required, the concept of 
multiwavelets (Alpert, 1993) offers a convenient way of 
generalizing the Haar basis to higher order accuracy and 
cancellation properties; see, for example, von Petersdorff, 
Schneider and Schwab (1997) for an application to second 
kind integral equations. 

We describe now one concrete approach (see Canuto,
Tabacco and Urban, 1999, 2000; Cohen and Masson, 1997;
Dahmen and Schneider, 1999a,b) based on domain decomposition
that works for $X = L_2(\Omega)$ and realizes at least
global continuity. Suppose that

$$\overline{\Omega} = \bigcup_{i} \kappa_i(\square), \qquad \kappa_i : \square \to \Omega_i,$$

where each $\kappa_i$ is a regular mapping from a parameter
domain $\square$, such as the unit $d$-cube, into a subdomain $\Omega_i$ of $\Omega$.
A wavelet basis $\Psi^\Omega$ is then constructed along the following
lines:


• Start with biorthogonal wavelet bases $\Psi^{\mathbb{R}}$, $\tilde\Psi^{\mathbb{R}}$ on $\mathbb{R}$
and adapt them to biorthogonal wavelet bases $\Psi^I$, $\tilde\Psi^I$
on $I = [0, 1]$.

• Use tensor products to obtain bases $\Psi^\square$ on the unit
cube $\square = [0, 1]^d$.

• Use parametric liftings to derive bases $\Psi^i = \Psi^\square \circ
\kappa_i^{-1}$ on $\Omega_i = \kappa_i(\square)$, which then have to be glued
together to produce, for example, globally continuous
bases $\Psi = \Psi^\Omega$ on $\Omega$ (see e.g. Canuto, Tabacco and
Urban, 1999, 2000; Cohen and Masson, 1997; Dahmen
and Schneider, 1999a). An alternative approach leads
to wavelets of arbitrary regularity permitted by the
regularity of the domain (Dahmen and Schneider,
1999b).


The following Figure 3 has been provided as a courtesy 
by H. Harbrecht. It displays an example of a globally 
continuous primal and dual wavelet on a two dimensional 
patchwise defined manifold in which the supports cross the 
patch boundaries. 

Alternatively, hierarchies of uniform refinements of an 
arbitrary initial triangulation can be used to construct finite 
element based wavelets (see e.g. Dahmen and Stevenson, 
1999; Stevenson, 2000). 

All these constructions aim to realize (19). However, 
biorthogonality by itself is not quite sufficient in general 
to guarantee relations like (17). It is shown in Dahmen 
(1996) that if in addition the multiresolution spaces $S_j$ and
$\tilde S_j$ each satisfy a (quite general form of a) direct (sometimes
referred to as a Jackson) estimate, which quantifies their
approximation properties, and in addition satisfy an inverse
(referred to as Bernstein) estimate, which quantifies the regularity
of these spaces, then $L_2(\Omega)$-norm equivalences
of the form (17) hold. One actually obtains norm equivalences
for a whole range of smoothness spaces, (possibly
weighted) Sobolev spaces, around $L_2$, a fact that is actually
more important for the intended applications (Dahmen,
1996; Dahmen and Stevenson, 1999).

The above approach, in particular (Dahmen and Stevenson,
1999), can be viewed as a special realization of the
following general strategy. To describe this approach, it
is now convenient to view a (countable) collection $\Theta$ of
functions, such as a wavelet basis or a basis of scaling
functions, as a column vector based on some fixed but
unspecified ordering of its elements. Refinement relations
of the form (14) take then the form $\Phi_j^T = \Phi_{j+1}^T M_{j,0}$, where
the columns of the matrix $M_{j,0}$ consist of the mask coefficients
in each two-scale relation for the elements of the
scaling functions on level $j$. For instance, in the case of
the Haar basis on $[0, 1]$, (4) says that each column in $M_{j,0}$
has at two successive positions the value $2^{-1/2}$ as the only
nonzero entries. This format persists in much wider generality
and can be used to represent two-scale relations
for any hierarchy $\mathcal{S}$ of nested spaces spanned by scaling
function bases $S_j = \operatorname{span} \Phi_j$. In the same way, a basis $\Psi_j$
spanning some complement $W_j$ of $S_j$ in $S_{j+1}$ has the form
$\Psi_j^T = \Phi_{j+1}^T M_{j,1}$. It is easy to see that $M_{j,1}$ completes $M_{j,0}$
to an invertible operator $M_j := (M_{j,0}, M_{j,1})$ if and only if
$S_{j+1} = S_j \oplus W_j$ and that the complement bases are uniformly
scalewise stable in the sense of (18) if and only if
the condition numbers of the $M_j$ with respect to the corresponding
norms are uniformly bounded (Carnicer, Dahmen
and Peña, 1996). Of course, in the case of orthonormal
bases one has $G_j := M_j^{-1} = M_j^T$.

One can now define multiscale transformations that
change, for instance, the representation of an element in $S_J$
with respect to $\Phi_J$ into the representation with respect to
$\Phi_0$ and the complement bases $\Psi_j$, $j < J$, in complete analogy
to (8) and (9). In fact, the refinement relations imply
that

$$c^{j+1} = M_{j,0}\, c^j + M_{j,1}\, d^j. \tag{20}$$

The refinement matrix $M_{j,0}$ can be viewed as a prediction
operator. When the detail coefficients are zero, it provides
an exact representation of the data on the next higher level
of resolution. It is also sometimes called a subdivision
operator (see e.g. Arandiga, Donat and Harten, 1998, 1999;
Carnicer, Dahmen and Peña, 1996). It follows from (20)
that the transformation $T_J^{-1}$, taking the detail coefficients
into the single-scale coefficients $c^J$, uses the matrices $M_j$
in each cascadic step of (9).
Conversely, setting $G_j := M_j^{-1} = \begin{pmatrix} G_{j,0} \\ G_{j,1} \end{pmatrix}$, one has

$$c^j = G_{j,0}\, c^{j+1} \quad \text{and} \quad d^j = G_{j,1}\, c^{j+1}. \tag{21}$$

Hence the transformation $T_J$ that decomposes $c^J$ into
details $d^j$ and coarse scale coefficients $c^0$ has the same cascadic
structure as (8), now based on the filter matrices $G_j$.
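For the Haar basis the matrices in this formulation can be written down explicitly, which gives a concrete check of one reconstruction step and one decomposition step. The sketch below is a minimal illustration assuming NumPy and dense matrices (a practical implementation would exploit sparsity); the function names are hypothetical.

```python
import numpy as np

def haar_refinement_matrices(j):
    """Return the Haar refinement and completion matrices on level j, each of shape (2^{j+1}, 2^j)."""
    n = 2 ** j
    s = 1.0 / np.sqrt(2.0)
    k = np.arange(n)
    M0 = np.zeros((2 * n, n))
    M1 = np.zeros((2 * n, n))
    M0[2 * k, k] = s       # phi_{j,k} = (phi_{j+1,2k} + phi_{j+1,2k+1}) / sqrt(2)
    M0[2 * k + 1, k] = s
    M1[2 * k, k] = s       # psi_{j,k} = (phi_{j+1,2k} - phi_{j+1,2k+1}) / sqrt(2)
    M1[2 * k + 1, k] = -s
    return M0, M1

def reconstruct_step(c_j, d_j, M0, M1):
    """One step of the inverse transform: fine coefficients from coarse ones and details."""
    return M0 @ c_j + M1 @ d_j

def decompose_step(c_fine, M0, M1):
    """One step of the decomposition, using the inverse G of the block matrix (M0, M1)."""
    n = M0.shape[1]
    G = np.linalg.inv(np.hstack([M0, M1]))
    return G[:n] @ c_fine, G[n:] @ c_fine
```

Because the Haar basis is orthonormal, the inverse computed in `decompose_step` coincides with the transpose of the block matrix, so no linear solve is needed in that case.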

Furthermore, one can show that the transformations $T_J$
have uniformly bounded spectral condition numbers independent
of $J$ if and only if the corresponding union $\Psi$ of the
complement bases $\Psi_j$ and the coarse scale basis $\Phi_0$ forms a
Riesz basis for $L_2$ (Dahmen, 1994; Carnicer, Dahmen and
Peña, 1996).

While it is often difficult to directly construct a Riesz 
basis for the space of interest, in many cases, it is easy 
to find for each level $j$ some initial complement bases
$\check\Psi_j$. For instance, when working with a hierarchy of nodal
finite element bases, complement bases are provided by
the hierarchical basis consisting of those nodal basis functions
at the nodes of the next higher level of resolution
(see e.g. Yserentant, 1986). As a second step one
can then generate from this initial multiscale decomposition
another one that has certain desirable properties,
for instance, a higher order of vanishing moments. The
important point to be made in this regard is that all of
this can be done completely on a discrete level. To this
end, suppose that an initial completion $\check M_{j,1}$ of the refinement
matrix $M_{j,0}$ (and $\check G_j$) is known. Then all other stable
completions have the form (Carnicer, Dahmen and Peña,
1996)

$$M_{j,1} = M_{j,0} L + \check M_{j,1} K \tag{22}$$

with inverse blocks

$$G_{j,0} = \check G_{j,0} - L K^{-1} \check G_{j,1}, \qquad G_{j,1} = K^{-1} \check G_{j,1}. \tag{23}$$

In fact, this follows from the identity

$$I = \check M_j \check G_j = \check M_j \begin{pmatrix} I & L \\ 0 & K \end{pmatrix} \begin{pmatrix} I & -L K^{-1} \\ 0 & K^{-1} \end{pmatrix} \check G_j =: M_j G_j.$$

The special case $K = I$ is often referred to as the Lifting
Scheme (Sweldens, 1996, 1998). The parameters in the
matrices $L$, $K$ can be used to modify the complement
bases. Such modifications of stable completions are used
for instance in the construction of wavelets on an interval
(see e.g. Dahmen, Kunoth and Urban, 1999) and hence
in the above-mentioned domain decomposition approach
(Canuto, Tabacco and Urban, 1999, 2000; Cohen and Masson,
1997; Dahmen and Schneider, 1999a), as well as in the
construction of finite element based wavelets through coarse
grid corrections (Carnicer, Dahmen and Peña, 1996; Dahmen,
1997; Dahmen and Kunoth, 1992; Dahmen and
Stevenson, 1999; Stevenson, 2000; Vassilevski and Wang,
1997). A further important application concerns raising the
order of vanishing moments: Choose $K = I$ and $L$ such
that

$$\int_\Omega \Psi_j^T P\,dx = \int_\Omega \Phi_{j+1}^T M_{j,1} P\,dx = \int_\Omega \big( \Phi_j^T L P + \check\Psi_j^T P \big)\,dx = 0, \qquad P \in \mathcal{P}_{m^*}. \tag{24}$$


The significance of the above formulations lies in the 
versatility of handling multiscale decompositions entirely 
on a discrete level. This allows one to circumvent (at least 
to some extent) the explicit construction of complicated 
basis functions (see Harten, 1996). However, statements 
about stability are often based on explicit knowledge of the 
underlying multiscale bases. 


2.4 The key features 


The ideas put forward in the previous section allow one
to construct multiscale bases for a variety of domains and
even closed surfaces. In this section, we collect the main
properties of these constructions that are valid on a domain
$\Omega$ of spatial dimension $d$. We shall assume these properties
in our subsequent applications. These key properties can be
summarized as follows:


• Locality (L)
• Cancellation Properties (CP)
• Norm Equivalences (NE).


Locality: (L) means that the elements of $\Psi$ all have
compact support $S_\lambda := \operatorname{supp} \psi_\lambda$ that scales properly, that
is,

$$\operatorname{diam}(S_\lambda) \sim 2^{-|\lambda|}. \tag{25}$$


Locality is crucial for applications on bounded domains and 
for the efficiency of associated multiscale transforms. 


Cancellation Properties: (CP) generalizes our earlier
observation (12). It means that integrating a wavelet against
a locally smooth function acts like differencing. Assume
for example that the wavelets are normalized in $L_2$, that is
$\|\psi_\lambda\|_{L_2} \sim 1$. Cancellation will then mean that

$$|\langle v, \psi_\lambda \rangle| \lesssim 2^{-|\lambda|(\tilde m + (d/2) - (d/p))}\, |v|_{W^{\tilde m}_p(S_\lambda)}, \qquad \lambda \in \mathcal{J}_\psi, \tag{26}$$

where $|v|_{W^n_p(G)}$ is the usual $n$th order seminorm of the
corresponding Sobolev space on the domain $G$. Analogous
relations can of course be formulated for the dual basis $\tilde\Psi$.

The integer $\tilde m$ signifies the strength of the cancellation
properties because it says up to which order the local
smoothness of the function is rewarded by the smallness
of the coefficients (in this case of the dual expansion).
Obviously, when $\Omega$ is a Euclidean domain, (26) implies
that the wavelets have vanishing polynomial moments of
order $\tilde m$, that is,

$$\langle P, \psi_\lambda \rangle = 0, \qquad P \in \mathcal{P}_{\tilde m}, \quad \lambda \in \mathcal{J}_\psi. \tag{27}$$

Conversely, as in (12), the vanishing moments imply that
for $(1/p) + (1/p') = 1$

$$|\langle v, \psi_\lambda \rangle| = \inf_{P \in \mathcal{P}_{\tilde m}} |\langle v - P, \psi_\lambda \rangle| \le \inf_{P \in \mathcal{P}_{\tilde m}} \|v - P\|_{L_p(S_\lambda)} \|\psi_\lambda\|_{L_{p'}} \lesssim 2^{-|\lambda|((d/2) - (d/p))} \inf_{P \in \mathcal{P}_{\tilde m}} \|v - P\|_{L_p(S_\lambda)},$$

where we have used that

$$\|\psi_\lambda\|_{L_{p'}} \sim 2^{|\lambda|((d/2) - (d/p'))} = 2^{|\lambda|((d/p) - (d/2))} \quad \text{when} \quad \|\psi_\lambda\|_{L_2} \sim 1. \tag{28}$$

Now standard estimates on local polynomial approximation
(see e.g. DeVore and Sharpley, 1984) tell us that

$$\inf_{P \in \mathcal{P}_k} \|v - P\|_{L_p(G)} \lesssim (\operatorname{diam} G)^k\, |v|_{W^k_p(G)},$$

which yields (26). We refer to (26) as the cancellation
property of order $\tilde m$ rather than to (27), since it makes sense
for domains where ordinary polynomials are not defined.


Norm Equivalences: The cancellation properties tell us 
under what circumstances wavelet coefficients are small. 
One expects to have only relatively few significant coeffi- 
cients when the expanded function is very smooth except 
for singularities on lower dimensional manifolds. This helps 
to recover a function with possibly few coefficients only if 
small perturbations in the coefficients give rise to pertur- 
bations of the function that are also small with respect to 
the relevant norm. Recall that for function spaces $X$ with
local norms, it is usually easy to construct multiscale bases
$\Psi$ that are uniformly scalewise stable, that is,

$$\Big\| \sum_{|\lambda| = j} d_\lambda \psi_\lambda \Big\|_X \sim \big\| (\|d_\lambda \psi_\lambda\|_X)_{|\lambda| = j} \big\| \tag{29}$$

uniformly in $j$, where $\|\cdot\|$ is some appropriate discrete
norm.

In some cases, this stability property can be extended to
the whole array $\Psi$ over all scales. In the particular case
when $X = H$ is a Hilbert space, this is expressed by saying
that, with the normalization $\|\psi_\lambda\|_H \sim 1$, the family $\Psi$ is a
Riesz basis for the whole function space $H$, that is, every
element $v \in H$ possesses a unique expansion in terms of $\Psi$
and there exist finite positive constants $c_\Psi$, $C_\Psi$ such that

$$c_\Psi \|\mathbf{v}\|_{\ell_2} \le \Big\| \sum_{\lambda} v_\lambda \psi_\lambda \Big\|_H \le C_\Psi \|\mathbf{v}\|_{\ell_2}, \qquad \forall\, \mathbf{v} = (v_\lambda) \in \ell_2. \tag{30}$$


Thus, while relaxing the requirement of orthonormality, 
a Riesz basis still establishes a strong coupling between 
the continuous world, in which the mathematical model is 
often formulated, and the discrete realm which is more apt 
to computational realizations. Therefore, it should not be 
a surprise that the availability of such bases for function 
spaces may be exploited for numerical methods. 

We shall exploit norm equivalences for the problem 
class (2) where the relevant spaces are Sobolev spaces or 
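tensor products of them. Recall that for $n \in \mathbb{N}$ the space
$H^n(\Omega)$ consists of those elements of $L_2(\Omega)$ whose $n$th
order weak derivatives are also in $L_2(\Omega)$. More generally,
for $1 \le p \le \infty$, we have

$$W^n_p(\Omega) := \{ f : \partial^\alpha f \in L_p(\Omega), \ |\alpha| \le n \} \tag{31}$$

and the corresponding (semi-)norms are given (with the usual
modification for $p = \infty$) by
$|f|_{W^n_p(\Omega)} := \big( \sum_{|\alpha| = n} \|\partial^\alpha f\|^p_{L_p(\Omega)} \big)^{1/p}$ and $\|f\|^p_{W^n_p(\Omega)} :=
\sum_{m=0}^{n} |f|^p_{W^m_p(\Omega)}$. Dealing with traces of functions on boundary
manifolds, for instance, forces one to consider also noninteger
smoothness orders $t \in \mathbb{R}$. For $t > 0$, these spaces
can be defined either by interpolation between spaces
of integer order (see e.g. Bergh and Löfström, 1976) or
directly through intrinsic norms of the form

$$\|v\|_{W^t_p(\Omega)} = \Big( \|v\|^p_{W^n_p(\Omega)} + \sum_{|\alpha| = n} \int_\Omega \int_\Omega \frac{|\partial^\alpha v(x) - \partial^\alpha v(y)|^p}{|x - y|^{d + (t - n)p}}\,dx\,dy \Big)^{1/p}, \qquad n := \lfloor t \rfloor.$$

Moreover, for Lipschitz domains $\Omega$ and $\Gamma \subset \partial\Omega$, we denote
by $H^t_{0,\Gamma}(\Omega)$ the closure, with respect to the $H^t$-norm, of those
$C^\infty$ functions on $\Omega$ that vanish on $\Gamma$. We briefly write
$H^t_0(\Omega)$ when $\Gamma = \partial\Omega$. We refer to Adams (1978) for more
details on Sobolev spaces.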

In the sequel for $t \ge 0$, $H^t$ will denote some closed
subspace of $H^t(\Omega)$ either of the form $H^t_0(\Omega) \subseteq H^t \subseteq
H^t(\Omega)$ or with finite codimension in $H^t(\Omega)$. For $t < 0$,
we define $H^t$ as the dual space $H^t = (H^{-t})'$. Starting
with a suitable wavelet Riesz basis $\Psi$ for $H = L_2(\Omega)$, a
whole family of realizations of (30) can be formulated as
follows. There exist positive constants $\gamma, \tilde\gamma > 0$ (depending
on the regularity of the wavelet basis) with the following
property: For $s \in (-\tilde\gamma, \gamma)$, there exist positive constants
$c_s$, $C_s$, such that every $v \in H^s$ possesses a unique expansion
$v = \sum_{\lambda \in \mathcal{J}} v_\lambda 2^{-s|\lambda|} \psi_\lambda$ such that

$$c_s \|\mathbf{v}\|_{\ell_2} \le \Big\| \sum_{\lambda \in \mathcal{J}} v_\lambda 2^{-s|\lambda|} \psi_\lambda \Big\|_{H^s} \le C_s \|\mathbf{v}\|_{\ell_2}. \tag{32}$$

Thus properly scaled versions of the wavelet basis $\Psi$
for $L_2$ are Riesz bases for a whole range of smoothness
spaces, including of course $s = 0$ as a special case. This
range depends on the regularity of the wavelets. In many
constructions, one has $\gamma = 3/2$ corresponding to globally
continuous wavelets (Canuto, Tabacco and Urban, 1999,
2000; Cohen and Masson, 1997; Dahmen and Schneider,
1999a; Dahmen and Stevenson, 1999).

Establishing (32) works actually the other way around. 
It is usually easier to verify (32) for positive s. This can 
be derived from the validity of Bernstein and Jackson 
estimates for the primal multiresolution sequences only. 
If one can do this, however, for the primal and for the 


dual multiresolution sequences associated with a dual pair 
of multiscale bases $\Psi$, $\tilde\Psi$, (32) follows for the whole range
of regularity indices $s$ by an interpolation argument (see
e.g. Dahmen, 1996; Dahmen and Stevenson, 1999). In
particular, this says that the Riesz basis property for $L_2(\Omega)$
follows from that of scaled versions of $\Psi$, $\tilde\Psi$ for positive
Sobolev regularity (see also Cohen, 2000, 2003; Dahmen,
1997, 2003).


Remark 1. We emphasize the case (32) because it implies 
further relations that will be important later for robustness. 
To describe these, recall our convention of viewing a 
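collection $\Theta$ of basis functions sometimes as a vector whose
entries are ordered in a fixed but unspecified way. Ordering
the wavelet coefficient arrays in a natural way, we can write
$\sum_{\lambda \in \mathcal{J}} v_\lambda 2^{-s|\lambda|} \psi_\lambda =: \mathbf{v}^T D^{-s} \Psi$ where $D^s := (2^{s|\lambda|} \delta_{\lambda,\nu})_{\lambda,\nu \in \mathcal{J}}$
and $\mathbf{v} := (v_\lambda)_{\lambda \in \mathcal{J}}$. In the problem class (2), one often
encounters Hilbert (energy) spaces endowed with a norm of
the type $\|v\|^2_{H_\epsilon} := \epsilon \langle \nabla v, \nabla v \rangle + \langle v, v \rangle$. The performance of
multilevel preconditioners for such problems often depends
on $\epsilon$. It will be seen that a remedy for this can be based
on robust equivalences of the following form that can be
derived from (32) for $s = 0$ and $s = 1$ (Cohen, Dahmen
and DeVore, 2001; Dahmen, 2001): assume that $\gamma > 1$ and
define the diagonal matrix $D_\epsilon := ((1 + \sqrt{\epsilon}\, 2^{|\lambda|}) \delta_{\lambda,\nu})_{\lambda,\nu \in \mathcal{J}}$.
Then

$$\big( 2 (c_0^{-2} + c_1^{-2}) \big)^{-1/2} \|\mathbf{v}\|_{\ell_2} \le \|\mathbf{v}^T D_\epsilon^{-1} \Psi\|_{H_\epsilon} \le (C_0^2 + C_1^2)^{1/2} \|\mathbf{v}\|_{\ell_2}. \tag{33}$$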


We wish to conclude this section with the following 
remarks concerning duality; see, for example, Dahmen 
(2003) for more details. As indicated before, the known 
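constructions of a wavelet basis $\Psi$ that satisfy norm equivalences
of the form (32) involve to some extent the simultaneous
construction of a dual basis $\tilde\Psi$. Conversely, the
existence of such a dual basis is actually a consequence of
the Riesz basis property in the following sense. It is not
hard to show that the validity of (30) implies the existence
of a collection $\tilde\Psi \subset H'$ such that $\langle \psi_\lambda, \tilde\psi_\nu \rangle = \delta_{\lambda,\nu}$, where
$\langle \cdot, \cdot \rangle$ is the duality pairing that identifies the representation
of $H'$. Moreover, $\tilde\Psi$ is a Riesz basis for $H'$, that is,

$$C_\Psi^{-1} \|\mathbf{w}\|_{\ell_2} \le \|\mathbf{w}^T \tilde\Psi\|_{H'} \le c_\Psi^{-1} \|\mathbf{w}\|_{\ell_2}, \qquad \mathbf{w} \in \ell_2. \tag{34}$$

This will mainly be used in the equivalent form

$$C_\Psi^{-1} \|\langle \Psi, v \rangle\|_{\ell_2} \le \|v\|_{H'} \le c_\Psi^{-1} \|\langle \Psi, v \rangle\|_{\ell_2}, \tag{35}$$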
where we have abbreviated $\langle \Psi, v \rangle := (\langle \psi_\lambda, v \rangle : \lambda \in \mathcal{J})^T$.
In particular, if we want to construct a Riesz basis for
$L_2$, then the dual basis (with respect to the $L_2$-inner product
as a dual pairing) must also be a Riesz basis in $L_2$, in
agreement with the above remarks concerning (32). This
rules out the practically convenient so-called hierarchical
bases induced by interpolatory scaling functions since the
dual basis is essentially comprised of Dirac distributions.
An important further example is $H = H^1_0(\Omega)$ for which,
according to (32), a Riesz basis is obtained by renormalizing
the functions $\psi_\lambda$ with the weight $2^{-|\lambda|}$, which also
amounts to redefining $\Psi$ as $D^{-1} \Psi$. In this case, (35) offers
a convenient way of evaluating the $H^{-1}$-norm which is useful
for least squares formulations of second order elliptic
problems or their mixed formulations (Dahmen, Kunoth and
Schneider, 2002). Note that in this particular case, unlike
the situation in Remark 1, $\Psi$ need not be a Riesz basis
for $L_2$.


2.5 Wavelets and linear operators 


So far we have focused on wavelet representations of 
functions. For the problem class (2), in particular, it will be 
important to deal with wavelet representations of operators. 
In this section, we collect a few important facts concerning 
linear operators that follow from the above features. As 
a simple guiding example consider Poisson’s equation on 
some bounded domain $\Omega \subset \mathbb{R}^d$


$$-\Delta u = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \Gamma := \partial\Omega. \tag{36}$$


It will be crucial to interpret this equation properly. Mul- 
tiplying both sides of (36) by smooth test functions that 
vanish on $\Gamma$, and integrating by parts, shows that the solution
$u$ satisfies

$$\langle \nabla v, \nabla u \rangle = \langle v, f \rangle \quad \text{for all smooth } v, \tag{37}$$

where $\langle v, w \rangle := \int_\Omega v w\,dx$. However, this latter form makes
sense even when $u$ belongs only to the Sobolev space
$H^1_0(\Omega)$ (recall Section 2.4) and when the test functions
also just belong to $H^1_0(\Omega)$. Moreover, the right hand side
makes sense whenever $f$ is only a distribution in the dual
$H^{-1}(\Omega)$ of $H^1_0(\Omega)$. Here $\langle \cdot, \cdot \rangle$ is then understood to be the
dual form on $H^1_0(\Omega) \times H^{-1}(\Omega)$ induced by the standard
$L_2$-inner product. Thus, defining the linear operator $A$ by
$\langle \nabla v, \nabla u \rangle = \langle v, Au \rangle$ for all $v, u \in H^1_0(\Omega)$, the boundary
value problem (36) is equivalent to the operator equation

$$Au = f \tag{38}$$

where, roughly speaking, $A$ is in this case the Laplacian
(with incorporated homogeneous boundary conditions), taking
$H^1_0(\Omega)$ into its dual $H^{-1}(\Omega)$.


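The Standard Wavelet Representation: Suppose now
that as in the Laplace case described above, we have a linear
operator $A : H \to H'$ and that $\Psi$ is a Riesz basis for $H$,
that is, (30) holds. Then for any $v = \sum_\lambda v_\lambda \psi_\lambda \in H$ one has

$$Av = \sum_\lambda \langle \psi_\lambda, Av \rangle\, \tilde\psi_\lambda = \sum_\lambda \Big\langle \psi_\lambda, A \sum_\nu v_\nu \psi_\nu \Big\rangle\, \tilde\psi_\lambda = \sum_\lambda \Big( \sum_\nu \langle \psi_\lambda, A \psi_\nu \rangle\, v_\nu \Big) \tilde\psi_\lambda.$$

Thus the coefficient array $\mathbf{w}$ of $Av \in H'$ with respect to
the dual basis $\tilde\Psi$ is given by

$$\mathbf{w} = \mathbf{A}\mathbf{v} \quad \text{where} \quad \mathbf{A} := (\langle \psi_\lambda, A \psi_\nu \rangle)_{\lambda,\nu \in \mathcal{J}}, \quad \mathbf{v} = (v_\nu)_{\nu \in \mathcal{J}}. \tag{39}$$

The above example (36) is a typical application where $\Psi$
and $\tilde\Psi$ are biorthogonal Riesz bases in the Sobolev space
$H = H^1_0(\Omega)$ and its dual $H' = H^{-1}(\Omega)$ respectively.

$\mathbf{A}$ is often referred to as the standard wavelet representation
of $A$. Note that in conventional discretizations such as
finite elements and finite differences, the operators can usually
only be approximated. A basis allows one to capture,
at least conceptually, all of the full infinite dimensional
operator, a fact that will later be seen to have important
consequences.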


Well-Posedness and an Equivalent $\ell_2$-Problem: We
say that an operator equation of the form (38) is well posed
if (either $A$ maps $H$ onto $H'$ or $f \in \operatorname{range}(A)$ and) there
exist positive constants $c_A$, $C_A$ such that

$$c_A \|v\|_H \le \|Av\|_{H'} \le C_A \|v\|_H \quad \text{for all } v \in H. \tag{40}$$

Here $H'$ is the dual of $H$ endowed with the norm

$$\|w\|_{H'} := \sup_{v \in H} \frac{\langle v, w \rangle}{\|v\|_H} \tag{41}$$

and $\langle \cdot, \cdot \rangle$ is a dual form on $H \times H'$ (which is induced
as before by the standard inner product in some pivot $L_2$
space).

Clearly (40) means that for any data $f$ in the dual $H'$ (the
range of $A$) there exists a unique solution $u$, which
depends continuously on the data $f$. Thus well-posedness
refers to continuity with respect to a specific topology given
by the energy space $H$. It is not hard to show that in the
case of Poisson’s problem (36), (40) is a consequence of
$H^1$-ellipticity

$$\langle \nabla v, \nabla v \rangle \ge c\, \|v\|^2_{H^1(\Omega)}, \qquad |\langle \nabla v, \nabla w \rangle| \le C\, \|v\|_{H^1(\Omega)} \|w\|_{H^1(\Omega)}, \tag{42}$$

where $\|v\|^2_{H^1(\Omega)} := \|v\|^2_{L_2(\Omega)} + \|\nabla v\|^2_{L_2(\Omega)}$,
which in turn follows from Poincaré’s inequality. While
in this special case the right space $H = H^1_0(\Omega)$ is easily
recognized, the identification of a suitable $H$ such that (40)
holds is sometimes a nontrivial task, an issue that will be
taken up again later.

An important observation is that, once the mapping 
property (40) has been established, the Riesz basis property 
(30) for the energy space allows one to transform the 
original problem into an equivalent one which is now 
well-posed in the Euclidean metric. This is of particular 
importance in parameter dependent cases such as the Hilbert 
space H, considered in the previous section. 


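Theorem 1. Suppose that $A : H \to H'$ satisfies (40) and that $\Psi$ is a Riesz basis for $H$, that is, (30) holds. Let $\mathbf{A}$ denote the standard representation of $A$ with respect to $\Psi$. Then (38) is equivalent to $\mathbf{A}\mathbf{u} = \mathbf{f}$, where $u = \sum_{\lambda\in\mathcal{J}} u_\lambda \psi_\lambda$, $\mathbf{f} = (\langle \psi_\lambda, f\rangle)_{\lambda\in\mathcal{J}}$. Moreover, $\mathbf{A}$ is boundedly invertible on $\ell_2$, that is,

$$c_\Psi^2 c_A \|\mathbf{v}\|_{\ell_2} \le \|\mathbf{A}\mathbf{v}\|_{\ell_2} \le C_\Psi^2 C_A \|\mathbf{v}\|_{\ell_2} \quad\text{for all } \mathbf{v} \in \ell_2 \qquad (43)$$

Proof. By (30), one has for any $v = \sum_\lambda v_\lambda \psi_\lambda$

$$\|\mathbf{v}\|_{\ell_2} \le c_\Psi^{-1}\|v\|_H \le c_\Psi^{-1} c_A^{-1}\|Av\|_{H'} \le c_\Psi^{-2} c_A^{-1}\|(\langle \psi_\lambda, Av\rangle)_{\lambda\in\mathcal{J}}\|_{\ell_2} = c_\Psi^{-2} c_A^{-1}\|\mathbf{A}\mathbf{v}\|_{\ell_2}$$

where we have used (34). The reverse estimate follows analogously. □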
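
The reverse estimate can be spelled out as follows. This is only a sketch: it assumes that (30) has the form $c_\Psi\|\mathbf{v}\|_{\ell_2} \le \|\sum_\lambda v_\lambda\psi_\lambda\|_H \le C_\Psi\|\mathbf{v}\|_{\ell_2}$ and that (34) is the corresponding dual statement for $H'$; neither formula is reproduced in this section, so the constants below should be read under that assumption.

```latex
% Assumed form of (34): for all w in H',
%   C_\Psi^{-1} \|(\langle\psi_\lambda, w\rangle)_\lambda\|_{\ell_2}
%     \le \|w\|_{H'} \le
%   c_\Psi^{-1} \|(\langle\psi_\lambda, w\rangle)_\lambda\|_{\ell_2}.
\begin{align*}
\|\mathbf{A}\mathbf{v}\|_{\ell_2}
  &= \|(\langle \psi_\lambda, Av\rangle)_{\lambda\in\mathcal{J}}\|_{\ell_2}
   \le C_\Psi \|Av\|_{H'}            && \text{by (34)} \\
  &\le C_\Psi C_A \|v\|_H            && \text{by (40)} \\
  &\le C_\Psi^2 C_A \|\mathbf{v}\|_{\ell_2} && \text{by (30)}.
\end{align*}
```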


3 EVOLUTION PROBLEMS — 
COMPRESSION OF FLOW FIELDS 


We shall now address the problem class (1) of evolution 
equations. In many relevant instances, the evolution of the 
quantity u(x, t) expresses a conservation law in the sense 
that for any test volume $V$ in the domain $\Omega$ of interest,

$$\frac{d}{dt}\int_V u(x,t)\,dx + \int_{\partial V} f(u(x,t))\cdot n(x)\,ds = 0 \qquad (44)$$

where $f(u)$ is the flux function and $n$ denotes the outward normal on the boundary $\partial V$. When the solution is sufficiently smooth, (44) leads to a first order system of partial differential equations of the form

$$\partial_t u + \mathrm{div}_x f(u) = 0 \qquad (45)$$

which is said to be hyperbolic when the Jacobian matrix $Df(x)$ has for all $x$ real eigenvalues with a full basis of
eigenvectors. Hyperbolic systems of conservation laws are 
used to model phenomena as diverse as traffic, information, and fluid flows. Perhaps the most well-known example is the system of Euler equations, which model compressible fluid flow in terms of balance equations for mass, momentum, and energy. Such systems have to be complemented by suitable initial/boundary conditions. For simplicity, we shall assume pure initial data $u_0(x) = u(x, 0)$ with compact support.

Numerical methods for solving (44) are typically based on evolving cell averages $\bar u_C(t) := \big(\int_C u(x,t)\,dx\big)/|C|$, where $C$ runs over a partition $\mathcal{P}$ of the domain $\Omega$ into disjoint cells. In fact, the balance relation (44) also reads

$$\bar u_C(t+\Delta t) = \bar u_C(t) + \frac{\Delta t}{|C|}\,B_C(t), \qquad B_C(t) := -\frac{1}{\Delta t}\int_t^{t+\Delta t}\!\!\int_{\partial C} f(u)\cdot n\,ds\,dt \qquad (46)$$

A finite volume scheme will mimic this time evolution by replacing the exact flux balance $B_C(t)$ by a numerical approximation computed from the current approximation of the exact cell average. More precisely, given a time step $\Delta t$, the scheme computes approximate values $u_C^n \approx \bar u_C(n\Delta t)$ according to

$$u_C^{n+1} = u_C^n + \frac{\Delta t}{|C|}\sum_{C' \text{ adjacent to } C} F^n_{C,C'} \qquad (47)$$

where $F^n_{C,C'}$ is the numerical flux across the common boundary of $C$ and an adjacent cell $C'$ for the time interval $[n\Delta t, (n+1)\Delta t]$, oriented so that the sum approximates the flux balance $B_C(n\Delta t)$. This numerical flux typically depends on the values $u_C^n$ and $u_{C'}^n$, and possibly on other neighboring values. We shall always assume that the scheme is conservative, that is, $F^n_{C,C'} = -F^n_{C',C}$. The initialization of the scheme uses the exact (or approximately computed) averages $u_C^0 := \big(\int_C u_0(x)\,dx\big)/|C|$ of the initial data $u_0$.

Denoting the array of cell averages by $u^n = (u_C^n)_{C\in\mathcal{P}}$, the finite volume scheme is thus summarized by a one step relation

$$u^{n+1} = E u^n \qquad (48)$$

where $E$ is a nonlinear discrete evolution operator. The computationally expensive part of this numerical method is the evaluation of the numerical fluxes $F^n_{C,C'}$, which is typically based on the (approximate) solution of local Riemann problems. An a priori error analysis of these classical numerical methods is only available to a limited extent. It refers to scalar problems, not to systems, and is rigorously founded only for uniform meshes. The proven error estimates are of low approximation orders like $h^{1/2}$, where $h$ is the mesh size, that is, the maximal diameter of the cells.
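
The one step relation (48) can be illustrated by a minimal sketch in Python (NumPy assumed). It is not the reference scheme of the text: the Burgers flux $f(u) = u^2/2$, the local Lax-Friedrichs numerical flux, the uniform periodic 1D partition, and all function names are assumptions chosen for illustration. Because each face flux enters the two neighbouring cells with opposite signs, the scheme is conservative in the sense stated above.

```python
import numpy as np

def flux(u):
    """Burgers flux f(u) = u^2 / 2 (illustrative choice, not from the text)."""
    return 0.5 * u * u

def numerical_flux(ul, ur):
    """Local Lax-Friedrichs flux through the face between states ul (left) and ur (right)."""
    a = np.maximum(np.abs(ul), np.abs(ur))  # bound on the wave speed |f'(u)| = |u|
    return 0.5 * (flux(ul) + flux(ur)) - 0.5 * a * (ur - ul)

def fv_step(u, dx, dt):
    """One step u^{n+1} = E u^n of a conservative finite volume scheme on periodic cells."""
    g_right = numerical_flux(u, np.roll(u, -1))  # flux through the right face of each cell
    g_left = np.roll(g_right, 1)                 # same face fluxes, seen from the right neighbour
    return u - dt / dx * (g_right - g_left)

def solve(u0, dx, t_end, cfl=0.4):
    """Iterate fv_step with a CFL-limited time step until t_end."""
    u, t = u0.copy(), 0.0
    while t < t_end - 1e-14:
        speed = max(np.max(np.abs(u)), 1e-12)
        dt = min(cfl * dx / speed, t_end - t)
        u = fv_step(u, dx, dt)
        t += dt
    return u

n = 200
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
# smooth initial data (midpoint values used as cell averages); a shock forms before t = 0.5
u0 = 1.0 + 0.5 * np.sin(2 * np.pi * x)
u = solve(u0, dx, t_end=0.5)
```

The total mass `u.sum() * dx` is preserved up to rounding, which is the discrete counterpart of the balance relation (46).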
It is well-known that the solutions of hyperbolic conservation laws exhibit a highly nonhomogeneous structure.
Discontinuities in the solution can develop after finite time 
even for arbitrarily smooth initial data. So the solution 
exhibits regions of high regularity separated by regions of 
discontinuities (shocks). Capturing the singular effects in 
the solution by using classical discretizations based on uni- 
form (or even quasi-uniform) partitions into cells would 
require a very fine resolution near the singularities and thus 
lead to enormous problem sizes. We see that the nature 
of the solution begs for the use of adaptive methods which 
would give finer resolution in the regions of shock disconti- 
nuities and maintain coarser resolution otherwise. The usual 
numerical approach is to generate such partitions adap- 
tively. The difficulties in such an approach are to determine 
these regions and to properly perform the time evolution on these inhomogeneous discretizations.

The analytic structure of solutions to (45) also points to possible advantages in using multiscale decompositions of the solution $u$ in a numerical procedure. Because of the cancellation property (26), the coefficients of $u$ would be small in those regions where the solution is smooth and would have significant size only near shocks. Thus, a multiscale decomposition would be excellent at identifying the regions of discontinuities by examining the size of the coefficients, and at providing economical representations of the approximate solution at time $n\Delta t$ by an adapted set of wavelet coefficients $(d_\lambda^n)_{\lambda\in\Lambda^n}$. The approximate solution would therefore be given by

$$u_{\Lambda^n} = \sum_{\lambda\in\Lambda^n} d_\lambda^n \psi_\lambda \qquad (49)$$

where the set $\Lambda^n$ is allowed to vary with $n$. The main difficulty in this approach is the method of performing the evolution step strictly in terms of the wavelet coefficients. In other words, given $\Lambda^n$ and the coefficients $(d_\lambda^n)_{\lambda\in\Lambda^n}$, how would we evolve on these data to obtain a good approximation at the next time step? This has led to the introduction of dynamically adaptive schemes in Maday, Perrier and Ravel (1991) in which the derivation of $(\Lambda^{n+1}, u_{\Lambda^{n+1}})$ from $(\Lambda^n, u_{\Lambda^n})$ typically goes in three basic steps:


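(i) Refinement: a larger set $\tilde\Lambda^{n+1}$ with $\Lambda^n \subset \tilde\Lambda^{n+1}$ is derived from an a posteriori analysis of the computed coefficients $d_\lambda^n$, $\lambda \in \Lambda^n$.

(ii) Evolution: a first numerical solution $\tilde u_{\tilde\Lambda^{n+1}} = \sum_{\lambda\in\tilde\Lambda^{n+1}} \tilde d_\lambda^{n+1}\psi_\lambda$ is computed from $u_{\Lambda^n}$ and the data of the problem.

(iii) Coarsening: the smallest coefficients of $\tilde u_{\tilde\Lambda^{n+1}}$ are thresholded, resulting in the numerical solution $u_{\Lambda^{n+1}} = \sum_{\lambda\in\Lambda^{n+1}} d_\lambda^{n+1}\psi_\lambda$, supported on the smaller set $\Lambda^{n+1} \subset \tilde\Lambda^{n+1}$.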


A few words are in order concerning the initialization of the scheme: ideally, we can obtain an adaptive expansion $u_{\Lambda^0}$ of the initial value data $u_0$ into a linear combination of wavelets by a thresholding procedure on its global expansion, that is,

$$u_{\Lambda^0} = \sum_{\lambda\in\Lambda^0} d_\lambda^0 \psi_\lambda, \qquad \Lambda^0 := \{\lambda \ \text{s.t.}\ \|d_\lambda^0\psi_\lambda\|_X \ge \eta\} \qquad (50)$$

where $X$ is some prescribed norm in which we target to measure the error, $\eta$ a prescribed threshold and $d_\lambda^0 := \langle u_0, \tilde\psi_\lambda\rangle$ are the wavelet coefficients of $u_0$. In practice, we cannot compute all the values of these coefficients, and one thus needs a more reasonable access to a compressed representation. This is typically done through some a priori analysis of the initial value $u_0$. In particular, if $u_0$ is provided by an analytic expression, or if we have some information on the local size of its derivatives, estimates on the decay of wavelet coefficients, such as (26), can be used to avoid the computation of most details which are below threshold. With such a strategy, we expect to obtain $\Lambda^0$ and $(d_\lambda^0)_{\lambda\in\Lambda^0}$ with a memory and computational cost which is proportional to $\#(\Lambda^0)$.

Then, assuming that at time $n\Delta t$ the approximate solution $u_{\Lambda^n}$ has the form (49) for some set $\Lambda^n$ of coefficients, the problem is thus both to select a correct set of indices $\Lambda^{n+1}$ and to compute the new coefficients $d_\lambda^{n+1}$ for $\lambda \in \Lambda^{n+1}$. As we have already explained, this is done by (i) refining $\Lambda^n$ into an intermediate set $\tilde\Lambda^{n+1}$ which is well fitted to describing the solution at time $(n+1)\Delta t$, (ii) computing $\tilde u_{\tilde\Lambda^{n+1}}$ supported by $\tilde\Lambda^{n+1}$ and (iii) deriving $(u_{\Lambda^{n+1}}, \Lambda^{n+1})$ from $\tilde u_{\tilde\Lambda^{n+1}}$ by a thresholding process. The selection of the intermediate set $\tilde\Lambda^{n+1}$ should thus take into account the effect of the evolution operator $\mathcal{E}$ on the sparse expansion (49), integrated between $n\Delta t$ and $(n+1)\Delta t$. Once a procedure for the refinement of $\Lambda^n$ into $\tilde\Lambda^{n+1}$ has been prescribed, several strategies are available for computing $\tilde u_{\tilde\Lambda^{n+1}}$ from $u_{\Lambda^n}$, such as Petrov-Galerkin methods in Maday, Perrier and Ravel (1991) or collocation methods in Bertoluzza (1997). All these strategies are based on the computation of the inner products $\langle \mathcal{E}(u_{\Lambda^n}), \tilde\psi_\lambda\rangle$ for $\lambda \in \tilde\Lambda^{n+1}$ up to some precision. In the case where the evolution operator $\mathcal{E}$ is linear, this amounts to a matrix-vector product, and one can make use of the sparse multiplication algorithm which will be discussed in Section 6. However, in many cases of interest, the evolution operator $\mathcal{E}$ is nonlinear, making this computation more difficult and costly. Generally speaking, the discretization of nonlinear operators is a harder task in the wavelet coefficient domain than in the physical domain.

In the following, we shall present a systematic approach which allows us to solve this problem by a suitable combination of the representations of the numerical solution by its wavelet coefficients and its physical values such as cell averages. This approach was first advocated by Ami Harten (Harten, 1993, 1995). The idea of Harten is to use multiscale decompositions where they do well, namely in finding the discontinuities in the solution at a given time, and to use classical finite volume solvers, based on cell averages, for the evolution step according to (47) and (51),
since the properties of these solvers are well understood. 
To accomplish this, we need to build cell averages into 
our multiscale structure. This is easily accomplished (as is 
detailed below) by using multiscale bases that use charac- 
teristic functions of cells as the dual scaling functions. This 
means that at any given time step, one can view the numer- 
ical solution through one of two microscopes. The one is 
the decomposition as a sum of characteristic functions of 
cells (on the finest level of decomposition); the other is the 
multiscale decomposition. The first is good for the evolu- 
tion; the second is good for identifying shocks and regions 
of smoothness. As described in Section 2.3, there are fast 
methods for transforming between the two sets of coeffi- 
cients (scaling coefficients and multiscale coefficients). 
Let us first amplify on our claim that cell averages lend themselves naturally to multiresolution techniques based on multilevel bases as described in Section 2.3. In fact, given a hierarchy of nested meshes and corresponding partitions $(\mathcal{P}_j)_{j\ge 0}$ of the flow domain, the cell averages $u_C$ correspond directly to inner products $\langle u, \chi_C\rangle/|C|$, which suggests that the $L_1$-normalized functions $\chi_C/|C|$ for $C \in \mathcal{P}_j$ play the role of the dual scaling functions $\tilde\phi_{j,k}$. In complete analogy to (4), the indicator functions $\chi_C/|C|$ satisfy two-scale relations. The prediction operators then have the form (20) based on the refinement matrices $M_{j,0}$. Similarly to (4) one can construct Haar-like orthogonal bases $\psi_\lambda$. Here, as we have described in Section 2.3, one has a choice in the construction of the complement bases. We shall see that there is an advantage in having more vanishing moments in the dual basis than is provided by the classical Haar decomposition. Since Haar type bases have only first order vanishing moments, recall (12), one cannot expect a significant data compression. Therefore, one can use (22), (23) (with $K = I$) to raise the order of vanishing moments of the dual wavelets $\tilde\psi_\lambda$ as explained in (24) at the end of Section 2.3 (see Dahmen, Gottschlich-Müller and Müller, 2001; Müller, 2003). This amounts to changing the primal multiresolution spaces and in turn the primal wavelets $\psi_\lambda$, while the dual scaling functions remain defined as $\chi_C/|C|$.

Given any array $v_J = (v_C)_{C\in\mathcal{P}_J}$ of cell averages on the partition $\mathcal{P}_J$, we can transform this vector into the multiscale format $d^J := (v_0, d_0, \ldots, d_{J-1})$, where the arrays $d_j$ encode detailed information needed to update the coarse cell averages in $v_j$ to $v_{j+1}$ on the next level of resolution. Thus the $d$ vectors correspond to the multiscale coefficients.
It is important to note that in generating the multiscale coefficients, we do not need explicit information on the multiscale basis functions. The transformation $T_J$ that maps a cell average vector $v_J$ onto its multiscale decomposition $d^J$ can be executed in an entirely discrete way as in the cascade algorithm of Section 2.3 (see also Arandiga, Donat and Harten, 1998, 1999; Carnicer, Dahmen and Peña, 1996; Sweldens, 1996, 1998). To go the other way, from multiscale coefficients to the scaling coefficients, we again use the cascade structure. Recall that scaling coefficients $v_{j+1}$, for a resolution level $j+1$, are obtained from the scaling coefficients $v_j$ at the coarser level in a two step process. The first is to predict $v_{j+1}$ by some rule (the lifting rule) and the second is to correct for the deviation in the prediction from the actual values. The deviation of the true coefficients $v_{j+1}$ from the predicted coefficients is given by the detail $d_j$.
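
The predict-and-correct cascade can be sketched in a few lines of Python (NumPy assumed). The sketch uses the simplest Haar-type prediction on a uniform dyadic 1D partition, where both children are predicted by the parent average; the higher order predictions discussed above are not implemented, and the function names `decompose` and `reconstruct` are illustrative assumptions.

```python
import numpy as np

def decompose(v_fine):
    """Map cell averages on 2**J uniform cells to (v_0, [d_0, ..., d_{J-1}])."""
    v = np.asarray(v_fine, dtype=float)
    details = []
    while v.size > 1:
        coarse = 0.5 * (v[0::2] + v[1::2])  # parent average of the two children
        # prediction: both children equal the parent average;
        # detail: deviation of the left child from that prediction
        details.append(v[0::2] - coarse)
        v = coarse
    return v, details[::-1]                 # details ordered from coarse to fine

def reconstruct(v0, details):
    """Inverse cascade: predict each finer level, then correct with the details."""
    v = np.asarray(v0, dtype=float)
    for d in details:
        fine = np.empty(2 * v.size)
        fine[0::2] = v + d   # left child: prediction plus correction
        fine[1::2] = v - d   # right child: fixed by conservation of the parent average
        v = fine
    return v

J = 6
x = (np.arange(2 ** J) + 0.5) / 2 ** J
v = np.where(x < 0.3, 1.0, 0.0)             # cell averages of a step function
v0, details = decompose(v)
n_nonzero = sum(int(np.count_nonzero(d)) for d in details)
```

For this step function only one detail per level is nonzero (the one whose cell contains the jump), which is the localization effect the text exploits to detect shocks.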

Our adaptive numerical scheme will be designed as a combination of a reference finite volume scheme which operates at the finest resolution level $J$ according to

$$u_J^{n+1} = E_J u_J^n \qquad (51)$$

and of the transformations $T_J$ and $T_J^{-1}$ that relate the cell-average vector $u_J^n$ and its multiscale coefficients $(d_\lambda^n)_{|\lambda|\le J-1}$. In an adaptive context, we want to encode only a small relevant portion of this vector corresponding to the adaptive set $\Lambda^n$. Ideally, this set would correspond to the indices $\lambda$ such that

$$\|d_\lambda^n\psi_\lambda\|_X > \eta \qquad (52)$$

where $\|\cdot\|_X$ is the norm in which we plan to measure the error and $\eta$ is some prescribed threshold. In practice, we precisely want to avoid the encoding of $u_J^n$ and of the full multiscale vector, and therefore we cannot invoke a thresholding procedure applied to the reference numerical solution. Therefore, we shall develop an adaptive strategy that iteratively computes some $\eta$-significant sets $\Lambda^n$ and multiscale coefficients $(d_\lambda^n)_{\lambda\in\Lambda^n}$ which might differ from those obtained by thresholding $u_J^n$. One of our goals is to keep track of the error between $u_J^n$ and the adaptive solution $v_J^n$, which is defined as the reconstruction on the finest partition $\mathcal{P}_J$ from the details $(d_\lambda^n)_{\lambda\in\Lambda^n}$.

At this stage, a key observation is that a restricted multiscale vector $(d_\lambda)_{\lambda\in\Lambda}$ exactly encodes the cell averages on an adaptive partition $\mathcal{P}(\Lambda)$ which includes cells of different resolution levels $j = 0, \ldots, J$ as illustrated in Figure 4, provided that the set $\Lambda$ has a graded tree structure. Such a tree structure ensures that all the detail coefficients which are necessary to reconstruct the exact average on a cell $C \in \mathcal{P}(\Lambda)$ are contained in $\Lambda$. Note that a graded tree structure is not guaranteed on an adaptive set $\Lambda$ produced by a thresholding procedure, yet it can be ensured by a suitable enlargement of $\Lambda$. In addition, it can be seen that the graded tree structure induces a grading property on the partition $\mathcal{P}(\Lambda)$ which essentially means that two adjacent cells differ at most by one resolution level. The concepts of tree structure and grading will also serve in Section 6 in the context of nonlinear variational problems. Another key observation is that the cost of the transformation $T_\Lambda$ which maps the cell averages $(u_C)_{C\in\mathcal{P}(\Lambda)}$ onto the restricted multiscale vector $(d_\lambda)_{\lambda\in\Lambda}$ is of the order $\#(\Lambda)$, and similarly for the inverse transformation $T_\Lambda^{-1}$. A detailed description of such techniques and the design of appropriate data structures can be found in Müller (2003).

Figure 4. (a) Adaptive mesh (b) Tree.
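
The enlargement of an index set to a graded tree, and the adaptive partition it induces, can be sketched as follows for a dyadic 1D hierarchy. The concrete grading rule used here (each index brings in its parent and the parent's two same-level neighbours) is one possible choice made for illustration, not necessarily the rule used in the cited references; indices are hypothetical pairs `(j, k)` of level and position, and a detail `(j, k)` is taken to split cell `(j, k)` into its two children.

```python
def graded_tree(indices):
    """Smallest set containing `indices` that is closed under the rule:
    the parent and the parent's same-level neighbours belong to the set."""
    tree, stack = set(), list(indices)
    while stack:
        j, k = stack.pop()
        if (j, k) in tree:
            continue
        tree.add((j, k))
        if j > 0:
            p = k // 2
            for q in (p - 1, p, p + 1):
                if 0 <= q < 2 ** (j - 1):
                    stack.append((j - 1, q))
    return tree

def adaptive_partition(tree):
    """Cells (level, position) of the partition induced by a tree of detail indices."""
    if (0, 0) not in tree:
        return [(0, 0)]                       # no detail at all: only the root cell
    cells = []
    for (j, k) in tree:
        for child in ((j + 1, 2 * k), (j + 1, 2 * k + 1)):
            if child not in tree:             # unsplit child: a cell of the partition
                cells.append(child)
    return sorted(cells, key=lambda c: c[1] / 2 ** c[0])  # order by left endpoint

tree = graded_tree({(3, 5)})
cells = adaptive_partition(tree)
levels = [j for (j, _) in cells]
```

With this rule, a single significant detail on level 3 produces a partition whose neighbouring cells differ by at most one level, at the price of a few additional indices.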

Based on this observation, we can propose the following 
adaptive scheme which follows the same principles as the 
dynamically adaptive scheme introduced in Maday, Perrier 
and Ravel (1991): 


(1) Initial values: Apply the multiscale transform $T_J$ to the initial cell averages $u_J^0$ to obtain the array of detail or wavelet coefficients $(d_\lambda^0)_{|\lambda|\le J-1}$ (including the cell averages on the coarsest level) for the time level $n = 0$. Choose a threshold parameter $\eta > 0$ and set $\Lambda^0$ to be the smallest graded tree containing those $\lambda$ such that $\|d_\lambda^0\psi_\lambda\|_X > \eta$.

(2) Predicting the significant indices on the next time level: Given the $\eta$-significant tree $\Lambda^n$ for the time level $n$ and the details $(d_\lambda^n)_{\lambda\in\Lambda^n}$, predict a set $\tilde\Lambda^{n+1}$ that should contain the $\eta$-significant graded tree for time level $n+1$. We extend the detail vector by setting $d_\lambda^n = 0$ for $\lambda \in \tilde\Lambda^{n+1}\setminus\Lambda^n$ and we derive the cell averages $(v_C^n)_{C\in\mathcal{P}(\tilde\Lambda^{n+1})}$ by applying the adaptive multiscale transform $T^{-1}_{\tilde\Lambda^{n+1}}$.

(3) Time evolution step: Compute the evolved cell averages $(v_C^{n+1})_{C\in\mathcal{P}(\tilde\Lambda^{n+1})}$ at time $n+1$ by some discrete evolution operator to be specified later. Of course it is important that this evolution can be done at less cost than would be necessary to evolve the uncompressed data.

(4) Reverse transform and thresholding: Apply the localized transform $T_{\tilde\Lambda^{n+1}}$ to $(v_C^{n+1})_{C\in\mathcal{P}(\tilde\Lambda^{n+1})}$, which yields an array of detail coefficients $(d_\lambda^{n+1})_{\lambda\in\tilde\Lambda^{n+1}}$. Set $\Lambda^{n+1}$ to be the smallest graded tree containing those $\lambda$ such that $\|d_\lambda^{n+1}\psi_\lambda\|_X > \eta$. Replace $n$ by $n+1$ and go to (2).

Any concrete realization of this scheme has to address 
the following issues. 


(1) Choice of the norm $\|\cdot\|_X$ and of the threshold parameter $\eta$;

(2) Strategy for predicting the set $\tilde\Lambda^{n+1}$;

(3) Specification of the evolution operator.


Regarding (1), since the norm $\|\cdot\|_X$ will typically measure the deviation between the reference and adaptive solutions $u_J^n$ and $v_J^n$, a relevant choice for this norm should be such that we already have at our disposal an error estimate between the reference solution $u_J^n$ and the exact solution at time $n\Delta t$ in the same norm. As we shall see further, it will also be important that the reference scheme is stable with respect to such a norm. In the context of conservation laws, this limits us to the choice $X = L_1$. For a discrete vector $u_J$ indexed by the finest partition $\mathcal{P}_J$, we define $\|u_J\|_{L_1}$ as the $L_1$ norm of the corresponding piecewise constant function on $\mathcal{P}_J$, that is,

$$\|u_J\|_{L_1} := \sum_{C\in\mathcal{P}_J} |C|\,|u_C| \qquad (53)$$

Assuming that the $\tilde\psi_\lambda$ are normalized in $L_1$, it follows from the triangle inequality that the error $e_\eta$ produced by discarding those multiscale coefficients of $u_J$ satisfying $\|d_\lambda\psi_\lambda\|_{L_1} \le \eta$ is bounded by

$$e_\eta \le \sum_{\|d_\lambda\psi_\lambda\|_{L_1}\le\eta} \|d_\lambda\psi_\lambda\|_{L_1} \le \eta\,\#\{\lambda : \|d_\lambda\psi_\lambda\|_{L_1} \le \eta\} \qquad (54)$$

Since the above sum is limited to $|\lambda| \le J-1$, we can derive the estimate

$$e_\eta \lesssim \#(\mathcal{P}_J)\,\eta \lesssim 2^{dJ}\eta \qquad (55)$$

where $d$ is the spatial dimension of the problem. It follows that a prescribed thresholding error $\delta$ can be obtained by using a threshold of the order

$$\eta \sim 2^{-dJ}\delta \qquad (56)$$

Since the dual scaling functions and wavelets are normalized in $L_1$, the primal scaling functions and wavelets are normalized in $L_\infty$ so that $\|\psi_\lambda\|_{L_1} \sim 2^{-d|\lambda|}$. Therefore, the above strategy corresponds to applying to the coefficients $d_\lambda$ a level dependent threshold $\eta_{|\lambda|}$ with

$$\eta_j \sim 2^{d(j-J)}\delta \qquad (57)$$

Note however that the estimate (55) is somewhat pessimistic since some thresholded coefficients $d_\lambda$ could actually be much smaller than $\eta_{|\lambda|}$.
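
The level dependent threshold (57) and the resulting $L_1$ bound can be checked numerically with a small Python sketch (NumPy assumed). It is restricted to spatial dimension $d = 1$ and to the plain Haar-type transform, for which each primal wavelet takes the values $\pm 1$ on its two children; the test data and all names are assumptions for illustration.

```python
import numpy as np

def decompose(v_fine):
    """Haar-type cell-average transform: returns (v_0, [d_0, ..., d_{J-1}])."""
    v = np.asarray(v_fine, dtype=float)
    details = []
    while v.size > 1:
        coarse = 0.5 * (v[0::2] + v[1::2])
        details.append(v[0::2] - coarse)
        v = coarse
    return v, details[::-1]

def reconstruct(v0, details):
    """Inverse transform back to cell averages on the finest level."""
    v = np.asarray(v0, dtype=float)
    for d in details:
        fine = np.empty(2 * v.size)
        fine[0::2], fine[1::2] = v + d, v - d
        v = fine
    return v

def threshold(details, delta):
    """Discard details with |d| <= eta_j, eta_j = 2**(j - J) * delta (case d = 1)."""
    J = len(details)
    kept = []
    for j, d in enumerate(details):           # details[j] lives on level j
        eta_j = 2.0 ** (j - J) * delta
        kept.append(np.where(np.abs(d) > eta_j, d, 0.0))
    return kept

J = 10
x = (np.arange(2 ** J) + 0.5) / 2 ** J
v = np.where(x < 0.37, np.sin(2 * np.pi * x), 0.2 * x)   # smooth pieces with one jump
v0, details = decompose(v)
delta = 1e-3
kept = threshold(details, delta)
v_adapt = reconstruct(v0, kept)
l1_error = np.sum(np.abs(v - v_adapt)) / 2 ** J          # norm (53) with |C| = 2**(-J)
n_kept = sum(int(np.count_nonzero(d)) for d in kept)
```

Each discarded coefficient on level $j$ contributes at most $\eta_j 2^{-j} = 2^{-J}\delta$ to the $L_1$ error, and there are fewer than $2^J$ coefficients, so `l1_error` stays below `delta`. The modest compression obtained here reflects the single vanishing moment of the Haar-type basis.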

Concerning (2), an ideal prediction should take into account the action of the reference scheme $E_J$ on the adaptive solution in the sense that the detail coefficients of $E_J v_J^n$ which are not contained in $\tilde\Lambda^{n+1}$ are guaranteed to be below the threshold. A strategy for constructing $\tilde\Lambda^{n+1}$ was proposed by Harten, based on a heuristic argument concerning the finite propagation speed of information in hyperbolic problems. Basically, $\tilde\Lambda^{n+1}$ is formed as the union of certain fixed neighborhoods (on the same or at most one higher scale) of the elements in $\Lambda^n$. Recently, at least for scalar problems, a rigorous analysis has been presented in Cohen et al. (2002) which gives rise to sets $\tilde\Lambda^{n+1}$ that are guaranteed to fulfill the above prescription. In this case, the neighborhoods are allowed to depend in a more precise way on the size of the elements in $\Lambda^n$. In practical experiments, however, Harten's simpler choice seems to have worked well enough so far.
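
A prediction rule of this heuristic type can be sketched as follows for a dyadic 1D hierarchy. This is a simplified illustration, not Harten's exact rule: the neighbourhood width `halo`, the factor `growth` deciding when children are added, and the data layout (indices as pairs `(j, k)`, details and thresholds in dictionaries) are all assumptions.

```python
def predict(significant, details, eta, J, halo=1, growth=2.0):
    """Predicted index set for the next time level (simplified Harten-type rule).

    significant: set of indices (j, k); details: dict (j, k) -> coefficient;
    eta: dict j -> level threshold; J: number of levels (details live on 0..J-1).
    """
    predicted = set(significant)
    for (j, k) in significant:
        # same-level neighbours: information travels a bounded distance per time step
        for q in range(k - halo, k + halo + 1):
            if 0 <= q < 2 ** j:
                predicted.add((j, q))
        # a large detail may signal steepening: admit the children on the next finer level
        if abs(details[(j, k)]) > growth * eta[j] and j + 1 < J:
            predicted.add((j + 1, 2 * k))
            predicted.add((j + 1, 2 * k + 1))
    return predicted

current = {(2, 1)}
new_set = predict(current, {(2, 1): 1.0}, {j: 0.1 for j in range(4)}, J=4)
```

In a full scheme the predicted set would afterwards be enlarged to a graded tree before the cell averages on the corresponding partition are computed.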

Turning to (3), several strategies are available for evolving the cell averages $(v_C^n)_{C\in\mathcal{P}(\tilde\Lambda^{n+1})}$ into $(v_C^{n+1})_{C\in\mathcal{P}(\tilde\Lambda^{n+1})}$. The first one consists in computing the effect on these averages of the exact application of the reference scheme $E_J$ to the adaptive solution $v_J^n$ reconstructed on the fine grid. A key observation is that since we are only interested in the averages of $E_J v_J^n$ on the adaptive partition $\mathcal{P}(\tilde\Lambda^{n+1})$, the numerical fluxes which need to be computed are only those between the adjacent fine cells such that their interface lies on the edge of a cell of the adaptive partition. In the original concept proposed by Harten, this idea was exploited in order to obtain CPU savings on the number of flux evaluations, with the solution encoded in its nonadaptive form $v_J^n$. In Cohen et al. (2002), it was shown that the computation of the needed fluxes can be performed from the adaptive data $(v_C^n)_{C\in\mathcal{P}(\tilde\Lambda^{n+1})}$ without the need of performing the reconstruction of the entire $v_J^n$. This information can indeed be acquired by local reconstruction. However, in several space dimensions, the resulting computational complexity, although still lower than that for the fully refined partitions, is suboptimal. A second, more economical strategy is to employ the finite volume stencil of the uniform partition but for the currently local level of resolution. This makes use of the local quasi-uniformity of the mesh, which can be made locally uniform by subdividing neighboring cells of lower generation. The gradedness of the partitions ensures that the subdivisions need only have depth one. In numerical experiments, this strategy turns out to work well when using higher order finite volume schemes in connection with corresponding higher order multiscale decompositions, here corresponding to the higher order vanishing moments (see e.g. Müller, 2003).

One of the good features of the adaptive approach that we have described is the possibility to monitor the error between the reference and adaptive numerical solution by a proper tuning of the threshold parameter. Here, we consider the evolution strategy that amounts to computing exactly the averages of $E_J v_J^n$ on the adaptive partition $\mathcal{P}(\tilde\Lambda^{n+1})$. It follows that we can write

$$\|u_J^{n+1} - v_J^{n+1}\|_{L_1} \le \|E_J u_J^n - E_J v_J^n\|_{L_1} + p_n + t_n \qquad (58)$$

where

$$p_n := \sum_{\lambda\notin\tilde\Lambda^{n+1}} \|d_\lambda(E_J v_J^n)\,\psi_\lambda\|_{L_1} \qquad (59)$$

and

$$t_n := \sum_{\lambda\in\tilde\Lambda^{n+1}\setminus\Lambda^{n+1}} \|d_\lambda(E_J v_J^n)\,\psi_\lambda\|_{L_1} \qquad (60)$$

respectively denote the errors resulting from the restriction to the predicted set $\tilde\Lambda^{n+1}$ and to the set $\Lambda^{n+1}$ obtained by thresholding. According to our previous remarks, these errors can be controlled by some prescribed $\delta$ provided that we use the level dependent threshold $\eta_j \sim 2^{d(j-J)}\delta$. Assuming in addition that the reference scheme is $L_1$-stable in the sense that for all $u_J$ and $v_J$,

$$\|E_J u_J - E_J v_J\|_{L_1} \le (1 + C\Delta t)\,\|u_J - v_J\|_{L_1} \qquad (61)$$

we thus obtain

$$\|u_J^{n+1} - v_J^{n+1}\|_{L_1} \le \|E_J u_J^n - E_J v_J^n\|_{L_1} + 2\delta \le (1 + C\Delta t)\,\|u_J^n - v_J^n\|_{L_1} + 2\delta \qquad (62)$$

which yields the estimate at time $T = n\Delta t$

$$\|u_J^n - v_J^n\|_{L_1} \le C(T)\,n\,\delta \qquad (63)$$
Therefore, if the reference numerical scheme is known to provide accuracy $\epsilon = \epsilon_J$ on level $J$, it is natural to choose $\delta$ such that $n\delta \sim \epsilon$. In many practical instances, however, this estimate turns out to be too pessimistic in the sense that thresholding and refinement errors do not really accumulate with time, so that $\delta$ and the threshold $\eta$ can be chosen larger than the value prescribed by this crude analysis. A sharper analysis of the error between the adaptive and reference solution is still not available.
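
For completeness, the step from the one step bound (62) to the accumulated estimate (63) can be written out as a short induction. The sketch below assumes that the initial thresholding error satisfies $e_0 \le \delta$ and that $n \ge 1$; the resulting constant $C(T) = 3e^{CT}$ is one admissible choice under these assumptions, not a value stated in the source.

```latex
% Notation: e_n := \|u_J^n - v_J^n\|_{L_1}, \quad T = n\,\Delta t .
\begin{align*}
e_{n} &\le (1 + C\Delta t)\, e_{n-1} + 2\delta
        && \text{by (62)} \\
      &\le (1 + C\Delta t)^{n} e_0
         + 2\delta \sum_{k=0}^{n-1} (1 + C\Delta t)^{k}
        && \text{by induction} \\
      &\le e^{C n \Delta t}\,\bigl(e_0 + 2 n \delta\bigr)
        && \text{since } (1 + C\Delta t)^{k} \le e^{C n \Delta t} \text{ for } k \le n \\
      &\le 3\, e^{C T}\, n\, \delta
        && \text{using } e_0 \le \delta \le n\delta .
\end{align*}
```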

An adaptive solver based on the above concepts has been developed and implemented by S. Müller. It also incorporates implicit time discretizations. A detailed account can be found in Müller (2003). The recently developed flow solver QUADFLOW for hyperbolic conservation laws and for the Navier-Stokes equations for compressible flows is based on these adaptive multiresolution techniques, on a finite volume discretization that can cope with hanging nodes, and on a mesh generator based on block partitions. Each block corresponds to a B-spline based parametric mapping that allows a flexible mesh refinement through point evaluations of the B-spline representation. An outline of the scheme and extensive numerical tests can be found in Bramkamp et al. (2001) and Ballmann, Bramkamp and Müller (2000). The numerical examples provided by S. Müller and F. Bramkamp should give an impression of the performance of such techniques. The first example in Figure 5 shows the results for an Euler computation concerning a flow at Mach 0.95 at an angle of attack $\alpha = 0°$ around a benchmark NACA0012 profile. Here the main objective is to test the resolution of shock interactions even at a large distance from the airfoil. The mesh has approximately $5 \times 10^4$ cells as opposed to an estimated number of $7 \times 10^7$ cells needed by a uniformly refined mesh for a comparable target accuracy. A close up of Figure 5(b) is displayed in Figure 6.

Figure 7 shows a series of adaptive refinements, again for an Euler computation, for a flow around a BAC 3-11/RES/30/21 profile at $M = 0.85$ and an angle of attack $\alpha = 0°$. This test illustrates the reliable detection even of small shocks, here in the lower region of the nose. Further detailed numerical studies for instationary problems such as moving wings, shock-bubble interactions, analogous studies for viscous flows and boundary layer resolution can be found in Bramkamp et al. (2001), Ballmann, Bramkamp and Müller (2000), and Müller (2003).


3.1 Concluding remarks 
The above framework is an example where multiscale bases for realistic geometries are conveniently realized. In spite of the promising numerical results, it should be stressed, though, that many principal questions remain open, due to a number of obstructions. First, the current understanding of error analysis for hyperbolic problems is much less developed than for elliptic problems, partly due to the nature of the relevant function spaces. On one hand, there are only poor a priori estimates that could serve as a benchmark. The central point of view is a perturbation analysis. The overall attainable accuracy is fixed by the a priori choice of a highest level $J$ of spatial resolution. All subsequent attempts aim at preserving the accuracy offered by a uniformly refined discretization with that resolution at the lowest possible cost. Thus whatever information is missed by the reference scheme cannot be recovered by the above adaptive solver.

Figure 6. Adaptive mesh: close up.

Figure 7. Adaptive mesh refinement. Panels: 1024 cells, Lmax = 1; 7405 cells, Lmax = 4; 28288 cells, Lmax = 9.

On the other hand, the multiscale techniques did not 
unfold their full potential; one makes use of the cancellation 
properties of wavelet bases but not of the norm equivalences 
between wavelet coefficients and functions. Thus the pri- 
mary focus here is on the compression of the conserved 
variables, that is, on the sparse approximation of functions 
based on cancellation properties. So far, this does not provide any estimates that relate the achieved accuracy $\epsilon$ to the size of the $\eta$-significant trees. This question will be
addressed later, again in a different context, where stronger 
basis properties allow one to exploit not only the sparse 
approximation of functions but also that of the involved 
operators. 


4 BOUNDARY INTEGRAL 
EQUATIONS — MATRIX 
COMPRESSION 


A variety of problems in elasticity, fluid flow or electromagnetism lead to a formulation in terms of boundary integral equations falling into the category (2). In principle, this is a feasible approach when the Green's function of the underlying (linear) partial differential equation is known explicitly. This is a particularly tempting option when the original formulation via a PDE refers to an exterior and hence unbounded domain, since the corresponding boundary integral formulation lives on a compact manifold of lower spatial dimension. Wavelet concepts have had a significant impact on this problem area (see Dahmen, Harbrecht and Schneider, 2002; Dahmen, Prößdorf and Schneider, 1994; Harbrecht, 2001; Lage and Schwab, 1998; Lage, 1996; von Petersdorff and Schwab, 1996; von Petersdorff and Schwab, 1997; Schneider, 1998), primarily in finding sparse and efficient approximations of potential operators. We shall describe this in the following simple setting.


4.1 Classical boundary integral equations 


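Let $\Omega^-$ be again a bounded domain in $\mathbb{R}^d$ ($d \in \{2, 3\}$) and consider Laplace's equation

$$-\Delta w = 0 \quad\text{in } \Omega \quad (\Omega = \Omega^- \text{ or } \Omega^+ := \mathbb{R}^d \setminus \Omega^-) \qquad (64)$$

subject to the boundary conditions

$$w = f \quad\text{on } \Gamma := \partial\Omega^- \quad (w(x) \to 0,\ |x| \to \infty \text{ when } \Omega = \Omega^+) \qquad (65)$$

Of course, the unbounded domain $\Omega^+$ poses an additional difficulty in the case of such an exterior boundary value problem. A well-known strategy is to transform (64), (65) into a boundary integral equation that lives only on the manifold $\Gamma = \partial\Omega$. There are several ways to do that. They all involve the fundamental solution $E(x, y) = 1/(4\pi|x - y|)$ of the Laplace operator (written here for $d = 3$), which gives rise to the single layer potential operator

$$(Au)(x) = (\mathcal{V}u)(x) := \int_\Gamma E(x, y)\,u(y)\,d\Gamma_y, \quad x \in \Gamma \qquad (66)$$

One can then show that the solution $u$ of the first kind integral equation

$$\mathcal{V}u = f \quad\text{on } \Gamma \qquad (67)$$

provides the solution $w$ of (64) through the representation formula

$$w(x) = \int_\Gamma E(x, y)\,u(y)\,d\Gamma_y, \quad x \in \Omega \qquad (68)$$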
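An alternative way uses the double layer potential

$$(\mathcal{K}v)(x) := \int_\Gamma \frac{\partial}{\partial n_y}E(x, y)\,v(y)\,d\Gamma_y = \frac{1}{4\pi}\int_\Gamma \frac{n_y^T(x - y)}{|x - y|^3}\,v(y)\,d\Gamma_y, \quad x \in \Gamma \qquad (69)$$

where $n_y$ is the outward normal to $\Gamma$ at $y \in \Gamma$. Now the solution of the second kind integral equation

$$Au := (\tfrac{1}{2} \pm \mathcal{K})u = f \quad (\Omega = \Omega^\pm) \qquad (70)$$

gives the solution to (64) through

$$w(x) = \int_\Gamma K(x, y)\,u(y)\,d\Gamma_y \qquad (71)$$

with the kernel $K(x, y) := \partial E(x, y)/\partial n_y$ from (69).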


One way of reformulating (64) with Neumann boundary conditions $\partial w/\partial n = g$ on $\Gamma$, where $\int_\Gamma g(x)\,d\Gamma_x = 0$, is offered by the so-called hypersingular operator $(\mathcal{W}v)(x) := -(\partial/\partial n_x)\int_\Gamma (\partial/\partial n_y)E(x, y)\,v(y)\,d\Gamma_y$. Now the solution of

$$Au = \mathcal{W}u = g \quad\text{on } \Gamma \qquad (72)$$

leads to the solution of the Neumann problem for Laplace's equation through the representation formula (71), where the constraint $\int_\Gamma u(x)\,d\Gamma_x = 0$ is imposed in the case of an interior problem to ensure uniqueness.


4.2 Quasi-sparsity of wavelet representations 


We have encountered so far two classes of operators.
The first class, such as (36), concerns differential operators
that are local in the sense that $\langle \psi_\lambda, A\psi_\nu \rangle = 0$ whenever
$S_\lambda \cap S_\nu = \emptyset$. Note that the wavelet representation A is not
sparse in the classical sense since basis functions from
different levels may overlap. In fact, it is easy to see that
a principal section of A of size
N contains O(N log N) nonzero entries. However, we shall see that
many of these entries are so small in modulus that they
can be neglected in a matrix-vector multiplication without
perturbing the result too much. The point of focus in this 
section is that this even holds true for the second class of 
global operators, which are roughly speaking inverses of 
differential operators such as the above boundary integral 
operators. They all share the property that they (or at least 
their global part) are of the form 


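$$ (Av)(x) = \int_\Gamma K(x, y)\, v(y)\, d\Gamma_y \tag{73} $$

where for a given domain or manifold $\Gamma$ the kernel $K(x, y)$
is smooth except on the diagonal $x = y$ and satisfies the
decay conditions

$$ |\partial_x^\alpha \partial_y^\beta K(x, y)| \lesssim \operatorname{dist}(x, y)^{-(d + 2t + |\alpha| + |\beta|)} \tag{74} $$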


By (39), the entries of A are in this case given by 


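$$ A_{\lambda,\nu} = \langle K, \psi_\lambda \otimes \psi_\nu \rangle_{\Gamma \times \Gamma} = \int_\Gamma \int_\Gamma K(x, y)\, \psi_\lambda(x)\, \psi_\nu(y)\, d\Gamma_x\, d\Gamma_y \tag{75} $$

Although none of the entries $A_{\lambda,\nu}$ will generally be zero,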
many of them are very small in modulus as specified by the 
following classical estimate (see e.g. Dahmen, Prößdorf and 
Schneider, 1994; von Petersdorff and Schwab, 1996, 1997). 


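Theorem 2. Suppose that the kernel $K$ is of the above form
and that $D^{-s}\Psi$ is a Riesz basis for $H^s$ for $-\tilde\gamma < s < \gamma$ (see
(32)) and has cancellation properties (see (26)) of order $\tilde m$.
Moreover, assume that $A$ given by (73) has order $2t$ and
satisfies for some $r > 0$

$$ \|Av\|_{H^{-t+a}} \lesssim \|v\|_{H^{t+a}}, \quad v \in H^{t+a},\ 0 \le |a| \le r \tag{76} $$

Then, for any $\sigma$ such that $0 < \sigma \le \min\{r, d/2 + \tilde m + t\}$,
$t + \sigma < \gamma$, and $t - \sigma > -\tilde\gamma$, one has

$$ 2^{-(|\lambda| + |\nu|)t}\, |\langle \psi_\lambda, A\psi_\nu \rangle| \lesssim \frac{2^{-\sigma ||\lambda| - |\nu||}}{\big(1 + 2^{\min(|\lambda|,|\nu|)} \operatorname{dist}(S_\lambda, S_\nu)\big)^{d + 2\tilde m + 2t}} \tag{77} $$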


Thus the entries of the wavelet representation of operators 
of the above type exhibit a polynomial spatial decay, 
depending on the order of cancellation properties, and an 
exponential scalewise decay, depending on the regularity 
of the wavelets. 

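For a proof of such estimates, one distinguishes two
cases. When $\operatorname{dist}(S_\lambda, S_\nu) \lesssim 2^{-\min(|\lambda|,|\nu|)}$ one can use the
continuity properties (76) in combination with the norm
equivalences (32) to show that

$$ |\langle \psi_\lambda, A\psi_\nu \rangle| \lesssim 2^{t(|\lambda| + |\nu|)}\, 2^{-\sigma ||\lambda| - |\nu||} \tag{78} $$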


(see Dahlke et al. (1997) for more details).

On the other hand, when $\operatorname{dist}(S_\lambda, S_\nu) \gtrsim 2^{-\min(|\lambda|,|\nu|)}$, the
wavelets are integrated against smooth parts of the kernel
$K$. One can then exploit the cancellation properties for both
wavelets to obtain the bound

$$ |\langle \psi_\lambda, A\psi_\nu \rangle| \lesssim \frac{2^{-(|\lambda| + |\nu|)(d/2 + \tilde m)}}{\operatorname{dist}(S_\lambda, S_\nu)^{d + 2t + 2\tilde m}} \tag{79} $$

(see e.g. Dahmen, Prößdorf and Schneider (1994), Dahmen
and Stevenson (1999), von Petersdorff and Schwab (1996),
and von Petersdorff and Schwab (1997) for more details).
The decay estimate (77) follows then from (78) and (79).

However, the above argument for the case of overlapping
supports is rather crude. Instead one can use the so-called
second compression due to Schneider (1998).
In fact, when $|\lambda| \gg |\nu|$ and when $S_\lambda$ does not intersect the
singular support of $\psi_\nu$, then $A\psi_\nu$ is smooth on the support
of $\psi_\lambda$ and one can again use the cancellation property of
$\psi_\lambda$. Denoting by $S'_\nu$ the singular support of (the lower level
wavelet) $\psi_\nu$, this leads to


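$$ |\langle \psi_\lambda, A\psi_\nu \rangle| \lesssim \frac{2^{|\nu| d/2}\, 2^{-|\lambda|(\tilde m + d/2)}}{\operatorname{dist}(S_\lambda, S'_\nu)^{2t + \tilde m}} \tag{80} $$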


Estimates of the type (77), (78), and (80) provide the
basis of matrix compression strategies that aim at replacing
the wavelet representation of an operator by a sparsified
perturbation that can be used to expedite the numerical
solution of corresponding linear systems.
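The quasi-sparsity described in this section can be observed in a small experiment. The sketch below is only an illustration under our own assumptions: it uses the kernel $K(x, y) = -\log|x - y|$ on the interval $[0, 1]$, piecewise constants with a crude one-point rule for the off-diagonal entries, and Haar wavelets (one vanishing moment) rather than the higher order biorthogonal wavelets considered in the text. The relative threshold `1e-4` and all function names are hypothetical choices.

```python
import math

def haar(v):
    """Orthonormal Haar transform; len(v) must be a power of two."""
    out = list(v)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2.0) for i in range(half)]
        dif = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2.0) for i in range(half)]
        out[:n] = avg + dif
        n = half
    return out

def haar_2d(a):
    """Return W a W^T: transform all rows, then all columns."""
    rows = [haar(r) for r in a]
    cols = [haar(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def log_kernel_matrix(n):
    """Galerkin matrix of K(x, y) = -log|x - y| on [0, 1] for L2-normalized
    piecewise constants. Diagonal entries are exact; off-diagonal entries
    use a one-point rule (accurate only away from the diagonal)."""
    h = 1.0 / n
    return [[h * (1.5 - math.log(h)) if i == j else -h * math.log(abs(i - j) * h)
             for j in range(n)] for i in range(n)]

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

n = 128
a_wav = haar_2d(log_kernel_matrix(n))          # wavelet representation
a_max = max(abs(v) for row in a_wav for v in row)
cut = 1e-4 * a_max                             # hypothetical threshold
a_cmp = [[v if abs(v) > cut else 0.0 for v in row] for row in a_wav]

kept = sum(1 for row in a_cmp for v in row if v != 0.0)
x = [1.0 / (1 + k) for k in range(n)]          # arbitrary test vector
exact = matvec(a_wav, x)
approx = matvec(a_cmp, x)
err = math.sqrt(sum((p - q) ** 2 for p, q in zip(exact, approx)))
rel_err = err / math.sqrt(sum(p * p for p in exact))
print(f"kept {kept} of {n * n} entries, relative matvec error {rel_err:.2e}")
```

Note that this is a plain a posteriori thresholding of computed entries. The level-dependent a priori strategies built on (79) and (80) decide which entries to discard before they are computed.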


4.3 Weak formulations and Galerkin schemes 


As in the case of Poisson’s equation (36) we are dealing 
again with an operator equation 


$$ Au = f \tag{81} $$


this time of the type (66), (70), or (72). A classical approach
to solving such an equation numerically is to return again to
a proper weak formulation on which to base a Galerkin
discretization. As before, the key is to identify first a suitable
(Hilbert) space $H$ such that the variational formulation

$$ a(v, u) := \langle v, Au \rangle = \langle v, f \rangle \quad \text{for all } v \in H \tag{82} $$


is well-posed in the sense of (40). In terms of the operator 
A, this can be rephrased by saying that A is boundedly 
invertible as a mapping from H onto H’, which will often 
be referred to as mapping property. 

All the above examples can be shown to fall into this
framework (see e.g. Kress, 1989). The single layer potential
is symmetric positive definite on the Sobolev space $H :=
H^{-1/2}(\Gamma)$ whose dual is $H' = H^{1/2}(\Gamma)$, that is,

$$ a(v, v) = \langle v, Vv \rangle \gtrsim \|v\|^2_{H^{-1/2}(\Gamma)} \quad \text{for all } v \in H^{-1/2}(\Gamma) \tag{83} $$

which is easily seen to imply (40).

The double layer potential is known to be compact
when $\Gamma$ is a $C^2$ manifold, in which case the kernel is
weakly singular. In general, the appropriate energy space
is $H = L_2(\Gamma) = H'$. Despite the lack of symmetry, one
can show that the bilinear form $a(\cdot, \cdot)$ is coercive and that
(40) holds with $H = L_2(\Gamma)$.

The hypersingular operator $W$, in turn, is strongly singular
with energy space $H = H^{1/2}(\Gamma)$ (i.e. $H' = H^{-1/2}(\Gamma)$),
respectively $H = H^{1/2}(\Gamma)/\mathbb{R}$ in the case of an interior problem.
Again, it is then symmetric positive definite, and
(40) follows.

According to the shifts caused by these operators in
the Sobolev scale, $A: H^t(\Gamma) \to H^{-t}(\Gamma)$, the single layer
potential, double layer potential and hypersingular operator
have order $2t = -1, 0, 1$, respectively.

The (conforming) Galerkin method (for any operator
equation (64)) consists now in choosing a finite dimensional
space $S \subset H$ and determining $u_S \in S$ such that

$$ a(v, u_S) = \langle v, f \rangle \quad \text{for all } v \in S \tag{84} $$

Such a scheme is called ($H$-)stable (for a family $\mathcal{S}$ of
increasing spaces $S \in \mathcal{S}$) if (40) holds on the discrete level,
uniformly in $S \in \mathcal{S}$. In other words, denoting by $P_S$ any
$H$-bounded projector onto $S$, we need to ensure that (40) holds
with $A$ replaced by $P_S' A P_S$ uniformly in $S \in \mathcal{S}$, where $P_S'$
is the adjoint of $P_S$. This is trivially the case for any subspace
$S \subset H$ when $A$ is symmetric positive definite. In the
coercive case, one can show that Galerkin discretizations
are stable for families $\mathcal{S}$ of trial spaces that satisfy certain
approximation and regularity properties formulated in
terms of direct and inverse estimates, whenever the level of
resolution is fine enough (see e.g. Dahmen, Prößdorf and
Schneider, 1994).

Once this homework has been done, what remains is
choosing a basis for $S$ by which (84) is turned into a linear
system of equations. The unknowns are the coefficients of
$u_S$ with respect to the chosen basis.

The obvious advantage of the boundary integral approach 
is the reduction of the spatial dimension and that one has 
to discretize in all cases only on bounded domains. On the 
other hand, one faces several obstructions: 


(i)  Whenever the order of the operator is different from
     zero (e.g. for $A = V$), the problem of growing condition
     numbers arises because the operator treats high
     frequency components differently from slowly varying
     ones. In general, if an operator has order $2t$, the
     spectral condition numbers of the stiffness matrices
     grow like $h^{-2|t|}$, where $h$ reflects the spatial resolution
     (e.g. the mesh size) of the underlying discretization.

(ii) Discretizations lead in general to densely populated
     matrices. This severely limits the number of degrees
     of freedom when using direct solvers. But iterative
     techniques are also problematic, due to the fact that
     the cost of each matrix/vector multiplication increases
     with the square of the problem size.
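To get a feeling for the size of both obstructions, the following back-of-the-envelope sketch evaluates the growth factor of a condition number that scales like $h^{-2|t|}$ under repeated halving of $h$, and the memory needed to store a dense matrix. The function names and the assumption of 8 bytes per entry (double precision) are ours.

```python
def condition_growth(order_2t, refinements):
    """Growth factor of a condition number ~ h^(-|2t|) when the mesh size h
    is halved `refinements` times (order_2t is the operator order 2t)."""
    return 2.0 ** (abs(order_2t) * refinements)

def dense_storage_gib(n, bytes_per_entry=8):
    """Memory in GiB for a dense n x n matrix (8 bytes per entry assumed)."""
    return n * n * bytes_per_entry / 2.0 ** 30

# Single layer potential (order 2t = -1): each halving of h doubles the
# condition number of the stiffness matrix.
print(condition_growth(-1, 5))
# A dense matrix with 2^15 unknowns already needs 8 GiB.
print(dense_storage_gib(2 ** 15))
```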


One possible strategy to overcome these obstructions will 
be outlined next. 


4.4 Wavelet-Galerkin methods


We adhere to the above setting and consider the operator
equation (81), where $A$ has the form (73), (74) and satisfies
(40) for $H = H^t(\Gamma)$. Suppose now that we have a wavelet
basis $\Psi$ which is a Riesz basis for $H^t(\Gamma)$, constructed
along the lines from Section 2.3, with the corresponding
multiresolution sequence of spaces $S_j$ spanned by all
wavelets $\psi_\lambda$, $|\lambda| < j$, of level less than $j$. We point out
next how the use of the spaces $S_j$ as trial spaces in
Galerkin discretizations can help to cope with the above
obstructions. To this end, let $A_j := (\langle \psi_\lambda, A\psi_\nu \rangle)_{|\lambda|,|\nu| < j}$
denote the stiffness matrix of $A$ with respect to the (finite)
wavelet basis of the trial space $S_j$. Thus (84) takes the form

$$ A_j u_j = f_j, \qquad f_j := (\langle \psi_\lambda, f \rangle)_{|\lambda| < j} \tag{85} $$


The first observation concerns obstruction (i) above (see 
e.g. Dahmen and Kunoth, 1992; Dahmen, Prößdorf and 
Schneider, 1994). 


Remark 2. If the Galerkin discretizations are $H^t$-stable
for $\mathcal{S} = \{S_j\}_{j \in \mathbb{N}}$, then the spectral condition numbers of $A_j$
are uniformly bounded.


In fact, when the bilinear form $a(\cdot, \cdot)$, defined in (82), is
symmetric and $H^t$-elliptic, so that $A$ is symmetric positive
definite, the spectrum of $A_j$ is contained in the convex
hull of the spectrum of $A$, so that the assertion follows
immediately from Theorem 1. Since under this assumption
Galerkin discretizations are always stable for any choice
of subspaces, this is a special case of the above claim.
In the general case, the argument is similar to that in
the proof of Theorem 1. In fact, Galerkin stability means
that $\|A_j^{-1}\|_{\ell_2 \to \ell_2} \lesssim 1$, which ensures the existence of a
constant $\tilde c$ such that $\tilde c \|\mathbf{v}_j\|_{\ell_2} \le \|A_j \mathbf{v}_j\|_{\ell_2}$ for any $v_j \in S_j$
with coefficient vector $\mathbf{v}_j$. Moreover, by (43),

$$ \|A_j \mathbf{v}_j\|_{\ell_2} = \|(\langle \psi_\lambda, A v_j \rangle)_{|\lambda| < j}\|_{\ell_2} \le \|A \mathbf{v}_j\|_{\ell_2} \le C_\Psi^2 C_A \|\mathbf{v}_j\|_{\ell_2} $$

which confirms the claim.

This observation applies to all our above examples of
boundary integral operators. In fact, $V$ and $W$ are elliptic
and the coercivity in the case (70) of the double layer
potential ensures that (for $j > j_0$ large enough) the Galerkin
discretizations are also stable in this case (see e.g. Dahmen,
Prößdorf and Schneider, 1994).

Thus, a proper choice of wavelet bases for the respective 
energy space deals with obstruction (i) not only for the 
second kind integral equations with zero order operators 
but also essentially for all classical potential operators. This 
is, for instance, important in the context of transmission 
problems. 

Of course, the preconditioning can be exploited in the 
context of iterative solvers for (85) only if the cost of a 
matrix/vector multiplication can be significantly reduced 
below the square of the problem size. First, note that, due 
to the presence of discretization errors, it is not necessary to 
compute a matrix/vector multiplication exactly but it would 
suffice to approximate it within an accuracy tolerance that 
depends on the current discretization error provided by $S_j$.
Thus, one faces the following central

Task: Replace as many entries of $A_j$ as possible by zero
so as to obtain a perturbed matrix $\tilde A_j$ with the following
properties for all the above operator types:

(i)   The $\tilde A_j$ still have uniformly bounded condition numbers
      when the level $j$ of resolution grows.

(ii)  The solutions $\tilde u_j$ of the perturbed systems $\tilde A_j \tilde u_j = f_j$
      still have the same order of accuracy as the solutions
      $u_j$ of the unperturbed systems (85), uniformly in $j$.

(iii) Find efficient ways of computing the nonzero entries
      of $\tilde A_j$.


These issues have been addressed in a number of
investigations (see e.g. Dahmen, Prößdorf and Schneider,
1994; Dahmen, Harbrecht and Schneider, 2002; von Petersdorff
and Schwab, 1996, 1997; von Petersdorff, Schneider
and Schwab, 1997; Harbrecht, 2001; Schneider, 1998). We
shall briefly outline the current state of the art as reflected
by Harbrecht (2001) and Dahmen, Harbrecht and Schneider
(2002). The key is a suitable, level-dependent thresholding
strategy based on the a priori estimates (79) and (80). It
requires a sufficiently high order of cancellation properties,
namely $\tilde m > m - 2t$ where $m$ is the approximation order
provided by the multiresolution spaces $S_j$. Thus, whenever
$A$ has nonpositive order (such as the single and double
layer potential operator), one must have $\tilde m > m$, ruling out
orthonormal wavelets (in $L_2$). Given that $\Psi$ meets this
requirement, and considering a fixed highest level $J$ of resolution,
fix parameters $a, a' > 1$ and $m' \in (m, \tilde m + 2t)$ and
define the cut-off parameters (see Dahmen, Harbrecht and
Schneider, 2002; Harbrecht, 2001; Schneider, 1998)


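$$ c_{l,l'} := a \max\Big\{ 2^{-\min(l,l')},\ 2^{\frac{2J(m' - t) - (l + l')(m' + \tilde m)}{2(\tilde m + t)}} \Big\}, $$
$$ c'_{l,l'} := a' \max\Big\{ 2^{-\max(l,l')},\ 2^{\frac{2J(m' - t) - (l + l')m' - \max(l,l')\tilde m}{\tilde m + 2t}} \Big\} \tag{86} $$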


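Then the a priori compression of $A_J$ is given by

$$ (A_J^\varepsilon)_{\lambda,\nu} := \begin{cases}
0, & \operatorname{dist}(S_\lambda, S_\nu) > c_{|\lambda|,|\nu|} \ \text{and } |\lambda|, |\nu| \ge j_0; \\
0, & \operatorname{dist}(S_\lambda, S_\nu) \lesssim 2^{-\min(|\lambda|,|\nu|)} \ \text{and} \\
   & \quad \operatorname{dist}(S'_\lambda, S_\nu) > c'_{|\lambda|,|\nu|} \ \text{if } |\nu| > |\lambda| \ge j_0 - 1, \\
   & \quad \operatorname{dist}(S_\lambda, S'_\nu) > c'_{|\lambda|,|\nu|} \ \text{if } |\lambda| > |\nu| \ge j_0 - 1; \\
(A_J)_{\lambda,\nu}, & \text{otherwise}
\end{cases} \tag{87} $$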
The first line is the classical 'first compression' based on
(79) when the wavelets have disjoint supports with a distance
at least the diameter of the larger support. The number
of nonzero entries that remain after this compression is of
the order $N_J \log N_J$, where $N_J := \dim S_J \sim 2^{(d-1)J}$ when
$d - 1$ is the dimension of $\Gamma$. The second line reflects the
'second compression' due to Schneider (1998), which discards
entries for wavelets with overlapping support.
More importantly, this also affects those entries
involving the scaling functions on the coarsest level $j_0$. It
has a significant effect when, due to a complicated geometry,
the coarsest level already involves a relatively large
number of basis functions. Asymptotically, it removes the
log factor in the count of the nonzero entries of $A_J^\varepsilon$.
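The case distinction in (87) translates directly into a predicate that decides, before any quadrature is carried out, whether an entry is kept. The following minimal sketch assumes that the relevant distances and the cut-off values from (86) have already been computed and are passed in. The function name `keep_entry`, its argument names, and the constant `near_const` (standing in for the unspecified constant in the relation $\operatorname{dist} \lesssim 2^{-\min(|\lambda|,|\nu|)}$) are hypothetical.

```python
def keep_entry(lev_l, lev_n, dist_supp, dist_sing, c, c_sing, j0, near_const=1.0):
    """Decide whether entry (lambda, nu) survives the a priori compression (87).

    lev_l, lev_n : levels |lambda| and |nu|
    dist_supp    : distance between the supports of the two wavelets
    dist_sing    : distance between the singular support of the coarser
                   wavelet and the support of the finer one
    c, c_sing    : cut-off parameters c and c' from (86) for this level pair
    j0           : coarsest level
    near_const   : assumed constant in "dist <~ 2^(-min level)"
    """
    # First compression: supports are well separated.
    if dist_supp > c and lev_l >= j0 and lev_n >= j0:
        return False
    # Second compression: supports are close, levels differ, and the finer
    # wavelet stays away from the singular support of the coarser one.
    near = dist_supp <= near_const * 2.0 ** (-min(lev_l, lev_n))
    if near and lev_l != lev_n and min(lev_l, lev_n) >= j0 - 1:
        if dist_sing > c_sing:
            return False
    return True

# Far apart on the same level: dropped by the first compression.
print(keep_entry(3, 3, 0.5, 0.5, c=0.25, c_sing=0.1, j0=1))
# Overlapping supports, different levels, away from the singular support:
# dropped by the second compression.
print(keep_entry(2, 5, 0.0, 0.2, c=0.25, c_sing=0.1, j0=1))
# Same configuration, but close to the singular support: kept.
print(keep_entry(2, 5, 0.0, 0.05, c=0.25, c_sing=0.1, j0=1))
```

In an actual implementation, the distances would be obtained from the geometry of the supports, typically through a hierarchical search, so that the predicate is never evaluated for the vast majority of discarded index pairs.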

A sophisticated perturbation analysis, whose main ingredients
are the a priori estimates (79), (80) based on the cancellation
properties of $\Psi$, the norm equivalences (30), and
suitable versions of the Schur lemma, yields the following
result (Dahmen, Harbrecht and Schneider, 2002; Harbrecht,
2001).


Theorem 3. The compressed matrices $A_J^\varepsilon$, given by (87),
have uniformly bounded condition numbers. The number of
nonzero entries in $A_J^\varepsilon$ is of the order $N_J$, uniformly in $J$.
Moreover, the solution $\tilde u_J$ exhibits optimal discretization
error estimates in the energy norm

$$ \|u - \tilde u_J\|_{H^t} \lesssim 2^{J(t - m)} \|u\|_{H^m}, \quad J \to \infty \tag{88} $$


This result says that the above compression strategy is 
asymptotically optimal. In comparison with earlier ver- 
sions, the removal of log-factors even offers a strictly 
linear complexity. Note that for operators of negative order,
the relatively high computational efforts for Galerkin discretizations,
due to the double integrals, pay off through the
high order, $m + |t|$.

The remaining crucial question concerns the complexity
of computing the compressed matrices $A_J^\varepsilon$. A detailed
analysis of this issue can be found in Harbrecht (2001); see
also Dahmen, Harbrecht and Schneider (2002). The main
facts can be summarized as follows.

Of course, the entries of $A_J^\varepsilon$ cannot be computed exactly
but one has to resort to quadrature. The following nice
observation from Harbrecht (2001) tells us how much
computational effort can be spent on the entry $(A_J^\varepsilon)_{\lambda,\nu}$
so as to keep the overall complexity of computing an
approximation to $A_J^\varepsilon$ proportional to the system size $N_J$.


Theorem 4. The complexity of approximately computing
the nonzero entries of $A_J^\varepsilon$ is $O(N_J)$, provided that for some
$\alpha > 0$ at most $O((J - (|\lambda| + |\nu|)/2)^\alpha)$ operations are spent
on the computation of a nonzero coefficient $(A_J^\varepsilon)_{\lambda,\nu}$.


Next, in analogy to the compression estimates, one can
ask which further perturbation is allowed for the approximate
calculation of the entries of $A_J^\varepsilon$ so as to retain the
above optimal convergence rates (Harbrecht, 2001; Dahmen,
Harbrecht and Schneider, 2002).


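Theorem 5. If the quadrature error for $(A_J^\varepsilon)_{\lambda,\nu}$ is bounded by

$$ \delta \min\Big\{ 2^{-\frac{(d-1)\,||\lambda| - |\nu||}{2}},\ 2^{-(d-1)\left(J - \frac{|\lambda| + |\nu|}{2}\right)\frac{m' - t}{\tilde m + t}} \Big\}\, 2^{2Jt}\, 2^{-2m'\left(J - \frac{|\lambda| + |\nu|}{2}\right)} $$

for some fixed $\delta < 1$, then the (perturbed) Galerkin scheme
is stable and converges with optimal order (88).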


The main result now is that the accuracy bounds in 
Theorem 5 can be met by a sophisticated adapted quadra- 
ture strategy whose computational complexity remains also 
in the operations budget given by Theorem 4. Thus, 
in summary, one obtains a fully discrete scheme that 
exhibits asymptotically optimal computational complexity 
that remains proportional to the problem size. This is illus- 
trated below by some numerical examples. 


Remark 3. Another approach to matrix compression, first
pointed out in Beylkin, Coifman and Rokhlin (1991), uses
the so-called nonstandard form of the operator $A_S :=
P_S' A P_S$, which involves a telescoping expansion for $A_S$
but is not a representation of $A_S$ in the strict sense. It
consists of blocks whose entries involve only basis functions
(wavelets and scaling functions) of the same level,
which may simplify their computation in comparison to
the standard form. On the other hand, the nonstandard
form does not support preconditioning when dealing with
operators of order different from zero and is therefore
restricted to problems of the type (70). Due to the presence
of scaling function coefficients, it also does not allow
us to combine matrix compression with function
compression. We shall point out later that this is indeed
supported by the standard form.


4.5 Numerical tests 


The following example has been provided by Harbrecht. 
An interior Dirichlet problem for the Laplacian is solved 
by the indirect approach. We use both the Fredholm integral
equation of the first kind based on the single layer
operator and the Fredholm integral equation of the second 
kind based on the double layer potential operator. Both 
approaches yield a density u, from which one derives the 
solution in the domain via potential evaluation, that is, by 
applying the single layer operator and double layer operator, 
respectively, to the density; see (68), (71). 

The domain $\Omega$ under consideration is the gearwheel
shown in Figure 8. It has 15 teeth and is represented using
180 patches. As Dirichlet data, we choose the restriction of
the harmonic function

$$ U(x) = \frac{(a, x)}{\|x\|^3}, \qquad a = (1, 2, 4) $$

to $\Gamma$. Then, $U$ is the unique solution of the Dirichlet problem.
We discretize the given boundary integral equation by
piecewise constant wavelets with three vanishing moments.
In order to measure the error produced by the method,
we calculate the approximate solution $U_J = A u_J$ at several
points $x_i$ inside the domain, plotted in Figure 8. The last
column in the tables below reflects the effect of an a
posteriori compression applied to the computed entries of
the stiffness matrix. The discrete potentials are denoted by

$$ \mathbf{U} := [U(x_i)], \qquad \mathbf{U}_J := [(A u_J)(x_i)] $$

where $A$ stands for the single or double layer operator.
We list in Tables 1 and 2 the results produced by the
wavelet Galerkin scheme. For the double-layer approach,
the optimal order of convergence of the discrete potential
is quadratic with respect to the $\ell^\infty$-norm over all points $x_i$.

Table 1. Numerical results with respect to the double layer operator.

J   N_J      ||U - U_J||_inf   cpu-time   a priori (%)   a posteriori (%)
1   720      4.8e-1            1          27             7.9
2   2880     2.7e-1 (1.8)      10         8.7            2.3
3   11520    7.6e-2 (3.6)      107        3.4            0.6
4   46080    2.4e-2 (3.1)      839        1.0            0.2
5   184320   6.0e-3 (4.1)      4490       0.2            0.0


Table 2. Numerical results with respect to the single layer operator.

J   N_J      ||U - U_J||_inf   cpu-time   a priori (%)   a posteriori (%)
1   720      4.9e-1            1          28             21
2   2880     5.7e-2 (8.6)      12         10             7.4
3   11520    1.2e-2 (4.5)      116        4.2            2.0
4   46080    2.8e-3 (4.5)      1067       1.3            0.5
5   184320   1.0e-3 (2.9)      6414       0.4            0.1


For the single-layer approach, this order is cubic. But one 
should mention that one cannot expect the full orders of 
convergence, due to the reentrant edges resulting from the 
teeth of the gearwheel. 
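The numbers in parentheses in the error columns can be reproduced, up to rounding of the printed values, as ratios of successive errors. The short sketch below does this for Table 1 and converts each ratio into an observed order in the mesh size, assuming that $h$ is halved from one level to the next (which matches $N_J$ growing by a factor of 4 on a two-dimensional surface). It also estimates the average number of retained entries per row, under our assumption that the a priori percentages refer to the full $N_J \times N_J$ matrix.

```python
import math

# Values transcribed from Table 1 (double layer operator).
unknowns = [720, 2880, 11520, 46080, 184320]
errors = [4.8e-1, 2.7e-1, 7.6e-2, 2.4e-2, 6.0e-3]
a_priori_percent = [27.0, 8.7, 3.4, 1.0, 0.2]

# Error reduction per refinement level and the implied order in h,
# assuming h is halved from one level to the next.
ratios = [errors[i - 1] / errors[i] for i in range(1, len(errors))]
orders = [math.log2(r) for r in ratios]

# Average number of retained entries per row, assuming the percentage
# column is relative to the full N_J x N_J matrix.
kept_per_row = [p / 100.0 * n for p, n in zip(a_priori_percent, unknowns)]

for r, o in zip(ratios, orders):
    print(f"ratio {r:.2f}  ->  observed order {o:.2f}")
print([round(k, 1) for k in kept_per_row])
```

Under these assumptions, the retained entries per row stay at a few hundred on all levels, which is in line with the $O(N_J)$ count of Theorem 3. Since the percentages are rounded to one digit, these figures are rough estimates only.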


4.6 Concluding remarks 


The above results show how to realize, for any (a priori 
fixed) level J of resolution, a numerical scheme that solves 
a boundary integral equation with discretization error accu- 
racy in linear time. As for the quantitative performance, the 
above examples indicate that accuracy is not degraded at 
all by the compression and quadrature errors. Moreover, the 
robustness with respect to the underlying geometry is sur- 
prisingly high. The experiences gained in Harbrecht (2001) 
show that the number of basis functions on the coarsest 
level may go up to the square root of the overall problem 
size without spoiling the complexity significantly. Due to 
the built-in preconditioning, the actual iterative solution of
the linear systems is still by far dominated by the efforts
for computing $A_J^\varepsilon$.

The concept is strictly based on a perturbation of the 
operator but makes no use of adaptivity with respect to the 
discretization. First tests in this direction have been made in 
Harbrecht (2002). However, this is somewhat incompatible
with the basic structure where all computations are tuned
to an a priori fixed highest level $J$ of spatial resolution.
Finite subsets of the wavelet basis serve to formulate a 
Galerkin discretization in the same way as classical set- 
tings so that no direct use is made of the full wavelet 
transformed representation of the boundary integral equa- 
tion. 

Thus, incorporating adaptivity may require an alternative
to the entrywise computation of $A_J^\varepsilon$, which we shall
comment on later.

There are other ways of accelerating the calculation 
of matrix/vector products in the above context such as 
panel clustering (Hackbusch and Nowak, 1989), multipole 
expansions (Greengard and Rokhlin, 1987), or hierarchical 
matrices (Hackbusch, 1999). These concepts offer an even 
better robustness with respect to the geometry since they 
exploit the smoothness of the integral kernel in $\mathbb{R}^d$ and not
of its trace on the manifold. However, these approaches 
do not allow one to build in preconditioning in such a 
straightforward way as above and adaptivity is even harder 
to incorporate. A combination of the different concepts has 
recently been proposed in Schmidlin, Lage and Schwab 
(2002), combining the advantages of clustering and wavelet 
techniques. 


5 A NEW ADAPTIVE PARADIGM 


So far, we have sketched essentially two different directions
where the key features of wavelets listed in Section
2.4 played an essential role. In the context of boundary
integral equations, it was established that corresponding 
operators possess well-conditioned sparse wavelet represen- 
tations. When dealing with hyperbolic conservation laws, 
the typical piecewise smooth nature of solutions permits 
the compression of the flow field based on suitable thresh- 
olding strategies applied to multiscale representations of 
approximate solutions. In both cases, an arbitrary but fixed 
level of resolution was considered and wavelet concepts 
were used to precondition or accelerate the numerical 
processing of the resulting fixed finite dimensional prob- 
lem. 

We shall now turn to recent developments that deviate 
from this line and aim at combining in a certain sense both 
effects, namely the sparse representation of functions and 
the sparse representation of (linear and nonlinear) operators. 
The subsequent developments are based on the results in
Cohen, Dahmen and DeVore (2001, 2002a,b,c), and Dahlke, 
Dahmen and Urban (2002). 


5.1 Road map 


Recall that the classical approach is to utilize a variational 
formulation of a differential or integral equation mainly as 
a starting point for the formulation of a (Petrov-)Galerkin
scheme, which gives rise to a finite dimensional system 
of linear or nonlinear equations. The finite dimensional 
problem has then to be solved in an efficient way. As we 
have seen before, one then faces several obstructions such 
as ill-conditioning or the instability of the discretizations, 
for instance, due to the wrong choice of trial spaces. 
For instance, in the case of the double layer potential 
operator, stability of the Galerkin scheme is only guaranteed 
for sufficiently good spatial resolution, that is, sufficient 
closeness to the infinite dimensional problem. As we shall 
see below, more severe stability problems are encountered 
when dealing with noncoercive problems such as saddle 
point problems. In this case, the trial spaces for the different 
solution components have to satisfy certain compatibility 
conditions known as the Ladyzhenskaya-Babuška-Brezzi
(LBB) condition. In brief, although the underlying infinite
dimensional problem may be well-posed in the sense of 
(40), the corresponding finite dimensional problem may not 
always share this property. 

In contrast, we propose here a somewhat different 
paradigm that tries to exploit the well-posedness of the 
underlying continuous problem to the best possible extent 
along the following line: 


(I)   Establish well-posedness of the underlying variational
      problem;

(II)  transform this problem into an equivalent infinite
      dimensional one which is now well-posed in $\ell_2$;

(III) devise a convergent iteration for the infinite dimensional
      $\ell_2$-problem;

(IV)  only at that stage, realize this iteration approximately
      with the aid of an adaptive application of
      the involved (linear or nonlinear) operators.
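To make step (III) concrete, the following sketch shows one simple candidate for such an iteration, a damped Richardson iteration $u^{n+1} = u^n + \omega (f - A u^n)$, applied to a small symmetric positive definite matrix. This is our own illustrative choice: the road map above only asks for some convergent iteration, and the $2 \times 2$ matrix merely stands in for a well-conditioned $\ell_2$-problem. The names `richardson` and `apply_a` and the damping parameter `omega = 0.4` are assumptions.

```python
def richardson(apply_a, f, omega, steps):
    """Damped Richardson iteration u <- u + omega * (f - A u), started at 0.

    For a symmetric positive definite A this converges whenever
    0 < omega < 2 / lambda_max(A).
    """
    u = [0.0] * len(f)
    for _ in range(steps):
        au = apply_a(u)
        u = [ui + omega * (fi - aui) for ui, fi, aui in zip(u, f, au)]
    return u

# Small well-conditioned model problem (eigenvalues about 1.38 and 3.62).
A = [[2.0, 1.0], [1.0, 3.0]]

def apply_a(u):
    """Exact application of A; step (IV) would replace this by an
    approximate, adaptive application."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

u = richardson(apply_a, [1.0, 2.0], omega=0.4, steps=60)
print(u)  # the exact solution of A u = f is (0.2, 0.6)
```

The error reduction per step depends only on the spectrum of $A$, which is why well-posedness in $\ell_2$ (uniformly bounded condition numbers) is established before any discretization is chosen.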


5.2 The scope of problems - (I)


We describe first the scope of problems we have in mind. 
We begin with a general format which will then be exem- 
plified by several examples. 

For a given (possibly nonlinear) operator F, the equation 


$$ F(u) = f \tag{89} $$

is always understood in a weak sense, namely to find $u$ in
some normed space $H$ such that for given data $f$

$$ \langle v, F(u) \rangle = \langle v, f \rangle, \quad \forall v \in H \tag{90} $$


This makes sense for any f € H’, the dual of H (recall 
(41)) and when F takes H onto its dual H’. In principle, the 
conservation laws fit into this framework as well. They will 
be, however, excluded from major parts of the following 
discussion, since we will assume from now on that H is a 
Hilbert space. 

The operator F is often given in a strong form so that 
the first task is to identify the right space H for which 
(90) is well-posed. We have already seen what this means 
when $F$ is linear; see (40). The classical Dirichlet problem
with $H = H_0^1(\Omega)$ and the single layer potential equation
(67) with $H = H^{-1/2}(\Gamma)$ are examples. When dealing with
nonlinear problems, one may have to be content with
locally unique solutions and it is natural to require that
corresponding local linearizations are used to define well-posedness.
Thus we assume that the Fréchet derivative
$DF(v)$ of $F$ at $v$, defined by

$$ \langle z, DF(v)w \rangle = \lim_{t \to 0} \frac{1}{t} \langle z, F(v + tw) - F(v) \rangle, \quad \forall z \in H \tag{91} $$

exists for every $v$ in a neighborhood $U$ of a solution $u$ of
(90) as a mapping from $H$ onto $H'$. In analogy to (40),
well-posedness now means that there exist for every $v \in U$
positive finite constants $c_{F,v}$, $C_{F,v}$ such that

$$ c_{F,v} \|w\|_H \le \|DF(v)w\|_{H'} \le C_{F,v} \|w\|_H, \quad \forall w \in H \tag{92} $$


We have already seen several examples in Sections 
2.5 and 4.1 which, however, are all coercive. We shall, 
therefore, briefly review several further examples as models 
for different problem types. They also indicate that, as 
an additional practical obstruction, the topology for which 
the problem is well-posed involves norms that are usually 
difficult to deal with computationally. Most of the examples 
are actually linear but they may as well play the role of a 
(local) linearization. 


5.2.1 Transmission problem 


The following example is interesting because it involves 
both local and global operators (see Costabel and Stephan, 
1990) 


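$$ -\nabla \cdot (a \nabla u) = f \ \text{in } \Omega_0, \qquad -\Delta u = 0 \ \text{in } \Omega_1, \qquad u|_{\Gamma_D} = 0, $$

$$ H := H^1_{0,\Gamma_D}(\Omega_0) \times H^{-1/2}(\Gamma_1) $$

Both boundary value problems are coupled by the interface
conditions:

$$ u^- = u^+, \qquad (\partial_n) u^- = (\partial_n) u^+ $$

A well-posed weak formulation of this problem with respect
to the above $H$ is

$$ \langle a \nabla u, \nabla v \rangle_{\Omega_0} + \langle Wu - (\tfrac{1}{2} I - K')\sigma, v \rangle_{\Gamma_1} = \langle f, v \rangle_{\Omega_0}, \quad v \in H^1_{0,\Gamma_D}(\Omega_0), $$
$$ \langle (\tfrac{1}{2} I - K)u, \delta \rangle_{\Gamma_1} + \langle V\sigma, \delta \rangle_{\Gamma_1} = 0, \quad \delta \in H^{-1/2}(\Gamma_1) $$

where $K$, $V$, $W$ are the double layer potential, single layer
potential, and hypersingular operator (see Dahmen, Kunoth and
Schneider, 2002).

Note that, as an additional obstruction, the occurrence
and evaluation of difficult norms like $\|\cdot\|_{H^{1/2}(\Gamma)}$,
$\|\cdot\|_{H^{-1/2}(\Gamma)}$, $\|\cdot\|_{H^{-1}(\Omega)}$ arises (see Chapter 12,
Chapter 13 of this Volume).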


5.2.2 Saddle point problems 


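All the above examples involve coercive bilinear forms. An
important class of problems which are no longer coercive
are saddle point problems. A classical example is

The Stokes System  The simplest model for viscous
incompressible fluid flow is the Stokes system

$$ -\nu \Delta u + \nabla p = f \ \text{in } \Omega, \qquad \operatorname{div} u = 0 \ \text{in } \Omega, \qquad u|_\Gamma = 0 \tag{93} $$

where $u$ and $p$ are velocity and pressure respectively (see
Brezzi and Fortin, 1991; Girault and Raviart, 1986). The
relevant function spaces are

$$ X := (H_0^1(\Omega))^d, \qquad M = L_{2,0}(\Omega) := \Big\{ q \in L_2(\Omega) : \int_\Omega q = 0 \Big\} \tag{94} $$

In fact, one can show that the range of the divergence
operator is $L_{2,0}(\Omega)$. The weak formulation of (93) is

$$ \nu \langle \nabla v, \nabla u \rangle_{L_2(\Omega)} + \langle \operatorname{div} v, p \rangle_{L_2(\Omega)} = \langle f, v \rangle, \quad v \in (H_0^1(\Omega))^d, $$
$$ \langle \operatorname{div} u, q \rangle_{L_2(\Omega)} = 0, \quad q \in L_{2,0}(\Omega) \tag{95} $$

that is, one seeks a solution $(u, p)$ in the energy space
$H = X \times M = (H_0^1(\Omega))^d \times L_{2,0}(\Omega)$, for which the mapping
property (92) can be shown to hold.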


First Order Systems  One is often more interested in
derivatives of the solution of an elliptic boundary value
problem, which leads to mixed formulations. Introducing the
fluxes $\theta := -a \nabla u$, (36) can be written as a system of first
order equations whose weak formulation reads

$$ \langle \theta, \eta \rangle + \langle \eta, a \nabla u \rangle = 0, \quad \forall \eta \in (L_2(\Omega))^d, $$
$$ -\langle \theta, \nabla v \rangle = \langle f, v \rangle, \quad \forall v \in H^1_{0,\Gamma_D}(\Omega) \tag{96} $$

One now looks for a solution $(\theta, u) \in H := (L_2(\Omega))^d \times
H^1_{0,\Gamma_D}(\Omega)$. For a detailed discussion in the finite element
context, see, for example, Bramble, Lazarov and Pasciak
(1997). It turns out that in this case the Galerkin
discretization inherits the stability from the original second
order problem.


The General Format  The above examples are special
cases of the following general problem class. A detailed
treatment can be found in Brezzi and Fortin (1991) and
Girault and Raviart (1986). Suppose $X$, $M$ are Hilbert
spaces and that $a(\cdot, \cdot)$, $b(\cdot, \cdot)$ are bilinear forms on $X \times X$,
$X \times M$ respectively, which are continuous

$$ |a(v, w)| \lesssim \|v\|_X \|w\|_X, \qquad |b(v, q)| \lesssim \|v\|_X \|q\|_M \tag{97} $$

Given $f_1 \in X'$, $f_2 \in M'$, find $(u, p) \in X \times M =: H$ such
that one has for all $(v, q) \in H$

$$ \langle (v, q), F(u, p) \rangle := \begin{cases} a(u, v) + b(v, p) = \langle v, f_1 \rangle \\ b(u, q) = \langle q, f_2 \rangle \end{cases} \tag{98} $$

Note that when $a(\cdot, \cdot)$ is positive definite symmetric, the
solution component $u$ minimizes the quadratic functional
$J(w) := (1/2) a(w, w) - \langle f_1, w \rangle$ subject to the constraint
$b(u, q) = \langle q, f_2 \rangle$, for all $q \in M$, which corresponds to

$$ \inf_{v \in X} \sup_{q \in M} \Big( \tfrac{1}{2} a(v, v) + b(v, q) - \langle f_1, v \rangle - \langle q, f_2 \rangle \Big) $$

This accounts for the term saddle point problem (even under
more general assumptions on $a(\cdot, \cdot)$).

In order to write (98) as an operator equation, define the
operators $A$, $B$ by

$$ a(v, w) =: \langle v, Aw \rangle, \quad v, w \in X, \qquad b(v, q) =: \langle Bv, q \rangle, \quad v \in X,\ q \in M $$

so that (98) becomes

$$ F(u, p) = \begin{pmatrix} A & B' \\ B & 0 \end{pmatrix} \begin{pmatrix} u \\ p \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix} \tag{99} $$

As for the mapping property (92), a simple (sufficient)
condition reads as follows (see Brezzi and Fortin, 1991;
Girault and Raviart, 1986). If $a(\cdot, \cdot)$ is elliptic on

$$ \ker B := \{ v \in X : b(v, q) = 0,\ \forall q \in M \} $$

that is,

$$ a(v, v) \sim \|v\|_X^2, \quad v \in \ker B \tag{100} $$

and if $b(\cdot, \cdot)$ satisfies the inf-sup condition

$$ \inf_{q \in M} \sup_{v \in X} \frac{b(v, q)}{\|v\|_X \|q\|_M} > \beta \tag{101} $$

for some positive $\beta$, then (98) is well posed in the sense of
(92). Condition (101) means that $B$ is surjective (and thus
has closed range). Condition (100) is actually too strong. It
can be replaced by requiring bijectivity of $A$ on $\ker B$ (see
Brezzi and Fortin, 1991).
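A finite dimensional toy instance may help to see the block structure of the saddle point operator. In the sketch below we take $X = \mathbb{R}^2$, $M = \mathbb{R}$, $a(v, w) = v^T w$ and $b(v, q) = q\,(v_1 + v_2)$, assemble the block matrix with $A$, $B$ and the transpose of $B$, and solve the resulting indefinite system by Gaussian elimination. All data and the helper `solve` are our own illustrative choices. Here $B = (1\ 1)$ is surjective, so the finite dimensional analogue of the inf-sup condition holds.

```python
def solve(mat, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            fac = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= fac * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

# X = R^2, M = R^1, a(v, w) = v^T A w, b(v, q) = q * (B v).
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 1.0]]
f1, f2 = [1.0, 0.0], [2.0]

# Block matrix [[A, B^T], [B, 0]]; it is symmetric but indefinite.
K = [A[0] + [B[0][0]],
     A[1] + [B[0][1]],
     B[0] + [0.0]]
u1, u2, p = solve(K, f1 + f2)
print(u1, u2, p)
```

The computed $u = (1.5, 0.5)$ is the minimizer of $\frac{1}{2}|w|^2 - f_1^T w$ subject to $w_1 + w_2 = 2$, and $p = -0.5$ plays the role of the Lagrange multiplier.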

Aside from leading to large ill-conditioned systems, the
additional obstructions are the indefiniteness of this type of
problem and that the well-posedness of the infinite dimensional
problem is not automatically inherited by Galerkin
discretizations, say. In fact, the trial spaces in $X$ and $M$
have to be compatible in the sense that they satisfy the
inf-sup condition (101) uniformly with respect to the resolution
of the chosen discretizations. This is called the
Ladyzhenskaya-Babuška-Brezzi (LBB) condition, and satisfying it may,
depending on the problem, be a delicate task (see Chapter 9,
this Volume).


5.2.3 A nonlinear model problem 


A wide range of phenomena involve the interaction of a
(linear) diffusion with a nonlinear reaction or advection
part. We therefore close the list of examples with the simple
class of semilinear elliptic boundary value problems. On
one hand, it permits a rather complete analysis. On the other
hand, it still exhibits essential features that are relevant for
a wider scope of nonlinear problems. In this section, we
follow Cohen, Dahmen and DeVore (2002b) and suppose
that $a(\cdot,\cdot)$ is a continuous bilinear form on a Hilbert space
$H$ endowed with the norm $\|\cdot\|_H$, which is $H$-elliptic, that
is, there exist positive constants $c$, $C$ such that

\[ c\|v\|_H^2 \le a(v,v), \qquad a(v,w) \le C\|v\|_H\|w\|_H, \qquad \forall v, w \in H \tag{102} \]

The simplest example is

\[ a(v,u) := \langle \nabla v, \nabla u\rangle + \kappa\langle v, u\rangle, \quad \kappa \ge 0, \qquad \langle v, w\rangle = \int_\Omega v w \, dx \tag{103} \]

and $H = H_0^1(\Omega)$ endowed with the norm
$\|v\|_H^2 := \|\nabla v\|_{L_2(\Omega)}^2 + \kappa\|v\|_{L_2(\Omega)}^2$.

Suppose that $G : \mathbb{R} \to \mathbb{R}$ is a function with the following
property:

(P1) the mapping $v \mapsto G(v)$ takes $H$ into its dual $H'$ and is
stable in the sense that

\[ \|G(u) - G(v)\|_{H'} \le C\big(\max\{\|u\|_H, \|v\|_H\}\big)\, \|u - v\|_H, \quad u, v \in H \tag{104} \]

where $s \mapsto C(s)$ is a nondecreasing function of $s$.

The problem: Given $f \in H'$ find $u \in H$ such that

\[ \langle v, F(u)\rangle := a(v,u) + \langle v, G(u)\rangle = \langle v, f\rangle, \quad \forall v \in H \tag{105} \]

is of the form (90) with $F(u) = Au + G(u)$. Note that with
the bilinear form from (103), (105) may also arise through
an implicit time discretization of a nonlinear parabolic
equation.

The unique solvability of (105) is easily ensured when
$G$ is in addition assumed to be monotone, that is,
$(x - y)(G(x) - G(y)) \ge 0$ for $x, y \in \mathbb{R}$. In this case, $F(u) = f$
is the Euler equation of a convex minimization problem
(Cohen, Dahmen and DeVore, 2002b).

A simple example is

\[ \langle v, F(u)\rangle = \int_\Omega \nabla v^T \nabla u + v u^3 \, dx, \qquad H = H_0^1(\Omega), \quad H' = H^{-1}(\Omega) \tag{106} \]

That (at least for $d \le 3$) $H = H_0^1(\Omega)$ is indeed the right
choice can be seen as follows. The fact that $H^1(\Omega)$ is
continuously embedded in $L_4(\Omega)$ for $d = 1, 2, 3$ readily
implies that $G(v) \in H^{-1}(\Omega)$ for $v \in H_0^1(\Omega)$. Moreover,

\[ \langle z, F(v + tw) - F(v)\rangle = t\langle \nabla z, \nabla w\rangle + \langle z,\ 3t\,v^2 w + 3t^2\, v w^2 + t^3 w^3\rangle \]

so that $\langle z, DF(v)w\rangle = \langle \nabla z, \nabla w\rangle + 3\langle z, v^2 w\rangle$ and
hence

\[ DF(v)w = -\Delta w + 3v^2 w \tag{107} \]


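Therefore, again by the embedding $H^s \hookrightarrow L_p$ if $(1/2) <
(s/d) + (1/p)$, that is, $\|v\|_{L_4} \lesssim \|v\|_H$ for $d < 4$ with $H =
H_0^1(\Omega)$, we see that

\[ \|DF(v)w\|_{H'} = \sup_{z\in H} \frac{\langle \nabla z, \nabla w\rangle + 3\langle z, v^2 w\rangle}{\|z\|_H} \lesssim \|w\|_H + \|v\|_H^2 \|w\|_H \]

On the other hand,

\[ \|DF(v)w\|_{H'} \ge \frac{\langle \nabla w, \nabla w\rangle + 3\langle w, v^2 w\rangle}{\|w\|_H} \ge \frac{\|\nabla w\|_{L_2}^2}{\|w\|_H} \ge \big(1 + c(\Omega)^2\big)^{-1} \|w\|_H \]

where we have used Poincaré's inequality in the last step.
Hence we obtain

\[ \big(1 + c(\Omega)^2\big)^{-1} \|w\|_H \le \|DF(v)w\|_{H'} \lesssim \big(1 + \|v\|_H^2\big) \|w\|_H, \quad w \in H \tag{108} \]

which is (92).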

The scope of problems is actually not limited at all
to the variational formulation of integral or partial
differential equations but covers, for instance, also optimal
control problems with PDEs as constraints (see Dahmen
and Kunoth, 2002; Kunoth, 2001).


5.3 Transformation into a well-posed
$\ell_2$-problem — (II)


Suppose that (90) is well-posed with respect to the energy
space $H$. In all previous examples, $H$ was a (closed subspace
of a) Sobolev space or a product of such spaces. As
indicated in Section 2, it is known how to construct wavelet
bases for such spaces. In the following, we will therefore
assume that $\Psi$ is a Riesz basis for $H$.

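The transformation of (90) into wavelet coordinates is
analogous to the linear case (see Theorem 1 in Section
2.5). In fact, testing in (90) with $v = \psi_\lambda$ for all $\lambda \in \mathcal{J}$
defines through $\mathbf{F}(\mathbf{v}) := (\langle \psi_\lambda, F(v)\rangle)_{\lambda\in\mathcal{J}}$ a sequence valued
nonlinear mapping $\mathbf{F}$, which depends on the array $\mathbf{v} \in
\ell_2$ via the wavelet expansion $v = \sum_{\lambda\in\mathcal{J}} v_\lambda \psi_\lambda$. Similarly,
the Jacobian of $\mathbf{F}$ at $\mathbf{v}$ acting on $\mathbf{w} \in \ell_2$ is defined by
$D\mathbf{F}(\mathbf{v})\mathbf{w} = (\langle \psi_\lambda, DF(v)w\rangle)_{\lambda\in\mathcal{J}}$. Finally, setting as before
$\mathbf{f} := (\langle \psi_\lambda, f\rangle)_{\lambda\in\mathcal{J}}$, the following fact can be derived by the
same arguments as in Theorem 1.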


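Theorem 6. Assume that (92) and (30) hold. Then the
variational problem (90) is equivalent to $\mathbf{F}(\mathbf{u}) = \mathbf{f}$ where
$u = \sum_{\lambda\in\mathcal{J}} u_\lambda \psi_\lambda$. Moreover, the latter problem is well
posed in $\ell_2$, that is, for $v = \sum_{\lambda\in\mathcal{J}} v_\lambda \psi_\lambda$ in some neighborhood
$U$ of the locally unique solution $u$ of (90)

\[ c_\Psi^2 c_F \|\mathbf{w}\|_{\ell_2} \le \|D\mathbf{F}(\mathbf{v})\mathbf{w}\|_{\ell_2} \le C_\Psi^2 C_F \|\mathbf{w}\|_{\ell_2}, \quad \mathbf{w} \in \ell_2 \tag{109} \]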

As for the special case (105), note that the monotonicity
of $G$ implies the monotonicity of $\mathbf{G}(\cdot)$, defined by $\mathbf{G}(\mathbf{v}) :=
(\langle \psi_\lambda, G(v)\rangle)_{\lambda\in\mathcal{J}}$, and hence the positive semidefiniteness
of $D\mathbf{G}(\mathbf{v})$; see Cohen, Dahmen and DeVore (2002b) for
details.


5.4 An iteration for the infinite dimensional 
problem — (III) 


Once the problem attains a well-conditioned form in $\ell_2$,
it makes sense to devise an iterative scheme for the full
infinite dimensional problem $\mathbf{F}(\mathbf{u}) = \mathbf{f}$ that converges with
a guaranteed error-reduction rate. These iterations will take
the form

\[ \mathbf{u}^{n+1} = \mathbf{u}^n - \mathbf{C}_n(\mathbf{F}(\mathbf{u}^n) - \mathbf{f}), \quad n = 0, 1, 2, \ldots \tag{110} \]

where the (infinite) matrix $\mathbf{C}_n$ is possibly stage dependent. It
can be viewed as a fixed point iteration based on the trivial
identity $\mathbf{u} = \mathbf{u} - \mathbf{C}(\mathbf{F}(\mathbf{u}) - \mathbf{f})$. We shall indicate several
ways of choosing $\mathbf{C}_n$, depending on the nature of the
underlying variational problem (90).


Gradient Iterations: In the above case of elliptic semilinear
problems, the transformed problem still identifies the
unique minimum of a convex functional. Thus, it makes
sense to consider gradient iterations, that is, $\mathbf{C}_n = \alpha\mathbf{I}$,

\[ \mathbf{u}^{n+1} = \mathbf{u}^n - \alpha(\mathbf{F}(\mathbf{u}^n) - \mathbf{f}), \quad n = 0, 1, 2, \ldots \tag{111} \]

In fact, one can estimate a suitable positive damping parameter
$\alpha > 0$ from the constants in (30), (92), and (104) (see
Cohen, Dahmen and DeVore, 2002b for details), so that
(111) converges for $\mathbf{u}^0$, say, with a fixed reduction rate $\rho < 1$,

\[ \|\mathbf{u}^{n+1} - \mathbf{u}\|_{\ell_2} \le \rho\, \|\mathbf{u}^n - \mathbf{u}\|_{\ell_2}, \quad n \in \mathbb{N}_0 \tag{112} \]

For instance, in the linear case $G \equiv 0$, one can take any
$\alpha < 2/(C_\Psi^2 C_A)$ (see (43)) and verify that
$\rho = \max\{1 - \alpha c_\Psi^2 c_A,\ |1 - \alpha C_\Psi^2 C_A|\}$ works.
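The linear case of (111) can be tried out on a small finite matrix. The following is a minimal sketch, not the infinite dimensional scheme itself: it assumes a symmetric positive definite stand-in `A` whose spectrum lies in a known interval $[c, C]$ (playing the role of $[c_\Psi^2 c_A, C_\Psi^2 C_A]$), and the names `richardson`, `c`, `C` are illustrative only.

```python
import numpy as np

def richardson(A, f, alpha, n_steps):
    """Iterates of u^{n+1} = u^n - alpha * (A u^n - f), started at u^0 = 0."""
    u = np.zeros_like(f, dtype=float)
    iterates = [u.copy()]
    for _ in range(n_steps):
        u = u - alpha * (A @ u - f)
        iterates.append(u.copy())
    return iterates

# Hypothetical stand-in for a well-conditioned wavelet representation:
# symmetric positive definite with eigenvalues in [c, C] = [0.5, 2.0].
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
eigs = np.linspace(0.5, 2.0, 50)
A = Q @ np.diag(eigs) @ Q.T
u_exact = rng.standard_normal(50)
f = A @ u_exact

c, C = eigs.min(), eigs.max()
alpha = 2.0 / (c + C)                              # admissible: 0 < alpha < 2 / C
rho = max(abs(1 - alpha * c), abs(1 - alpha * C))  # predicted reduction rate

errors = [np.linalg.norm(u - u_exact) for u in richardson(A, f, alpha, 20)]
ratios = [errors[k + 1] / errors[k] for k in range(20)]
print(f"predicted rate {rho:.3f}, worst observed rate {max(ratios):.3f}")
```

In this symmetric setting the observed error ratios never exceed the predicted rate, which is the finite dimensional counterpart of (112).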
Least Squares Iteration: Of course, when dealing
with indefinite Jacobians, the above scheme will in general
no longer work. However, the well-posedness in $\ell_2$
offers, in principle, always a remedy which we explain first
in the linear case $\mathbf{A}\mathbf{u} = \mathbf{f}$, where this may stand for any
of the well-posed problems discussed in Section 5.2. Since
then (109) reduces to (43), one can again find a positive $\alpha$
so that $\mathbf{C}_n = \alpha\mathbf{A}^T$ leads to an iteration

\[ \mathbf{u}^{n+1} = \mathbf{u}^n - \alpha\mathbf{A}^T(\mathbf{A}\mathbf{u}^n - \mathbf{f}), \quad n = 0, 1, 2, \ldots \tag{113} \]

that satisfies (112) with some fixed $\rho < 1$. Clearly, this is
simply a gradient iteration for the least squares formulation
$\mathbf{A}^T\mathbf{A}\mathbf{u} = \mathbf{A}^T\mathbf{f}$ in the wavelet coordinate domain; see also
Dahmen, Kunoth and Schneider (2002) for connections
with least squares finite element methods.
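A small experiment illustrates why (113) still contracts when the matrix is indefinite. This is a sketch under stated assumptions: `A` is a hypothetical symmetric matrix with eigenvalues of both signs, and the damping parameter is computed from its extreme singular values, which is only feasible for a toy problem.

```python
import numpy as np

def least_squares_iteration(A, f, alpha, n_steps):
    """Iterates of u^{n+1} = u^n - alpha * A^T (A u^n - f), started at u^0 = 0."""
    u = np.zeros(A.shape[1])
    iterates = [u.copy()]
    for _ in range(n_steps):
        u = u - alpha * (A.T @ (A @ u - f))
        iterates.append(u.copy())
    return iterates

# Hypothetical indefinite but boundedly invertible matrix.
rng = np.random.default_rng(1)
n = 40
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate([np.linspace(-2.0, -0.5, n // 2),
                       np.linspace(0.5, 2.0, n // 2)])
A = Q @ np.diag(eigs) @ Q.T
u_exact = rng.standard_normal(n)
f = A @ u_exact

# Singular values c <= sigma <= C give the spectrum [c^2, C^2] of A^T A.
sigma = np.linalg.svd(A, compute_uv=False)
c, C = sigma.min(), sigma.max()
alpha = 2.0 / (c**2 + C**2)
rho = (C**2 - c**2) / (C**2 + c**2)   # rate depends on the squared condition number

errors = [np.linalg.norm(u - u_exact)
          for u in least_squares_iteration(A, f, alpha, 60)]
ratios = [errors[k + 1] / errors[k] for k in range(60)]
print(f"predicted rate {rho:.3f}, worst observed rate {max(ratios):.3f}")
```

The predicted rate is governed by $(C/c)^2$ rather than $C/c$, so it is noticeably closer to one than for the plain gradient iteration on a definite problem with the same condition number.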

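This is interesting because it suggests an analog also in
the general nonlinear case even when $D\mathbf{F}(\mathbf{v})$ is indefinite
but (92) (or equivalently (109)) holds. In fact, the role of
$\mathbf{A}^T$ is played by $D\mathbf{F}(\mathbf{u}^n)^T$. Setting

\[ \mathbf{R}(\mathbf{v}) := \mathbf{F}(\mathbf{v}) - \mathbf{f} \tag{114} \]

and noting that

\[ \mathbf{R}(\mathbf{v}) = \mathbf{F}(\mathbf{v}) - \mathbf{F}(\mathbf{u}) = \Big( \int_0^1 D\mathbf{F}(\mathbf{u} + s(\mathbf{v} - \mathbf{u})) \, ds \Big) (\mathbf{v} - \mathbf{u}) \tag{115} \]

one can derive from (109) that the iteration

\[ \mathbf{u}^{n+1} = \mathbf{u}^n - \alpha\, D\mathbf{F}(\mathbf{u}^n)^T \mathbf{R}(\mathbf{u}^n) \tag{116} \]

can be made to satisfy (112) for a suitable positive damping
factor $\alpha$, depending on the constants in (109), and a
sufficiently good initial guess $\mathbf{u}^0$ (which is always needed in
the case of only locally unique solutions). Moreover, when
being sufficiently close to the solution, $\mathbf{C}_n = \alpha D\mathbf{F}(\mathbf{u}^n)^T$
can be frozen to $\mathbf{C} = \alpha D\mathbf{F}(\mathbf{u}^0)^T$ so as to still realize a
strict error reduction (112); see Cohen, Dahmen and DeVore
(2002b) for details.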


Uzawa Iteration: Of course, in the above least squares
iterations (113) and (116) the damping factor $\alpha$ may have to
be chosen rather small, which entails a poor error-reduction
rate $\rho$. Whenever $\mathbf{A}$ or the linearization $D\mathbf{F}(\mathbf{v})$ corresponds
to a saddle point operator (98) or (99), the squaring of the
condition number caused by the least squares formulation
can be avoided with the aid of an Uzawa iteration (see
Dahlke, Dahmen and Urban, 2002; Dahmen, Urban and
Vorloeper, 2002). We indicate this only for the linear case.
Instead of working directly with the wavelet representation
$\mathbf{F}(\mathbf{u}, \mathbf{p}) = \mathbf{f}$, one can first eliminate the solution component
$\mathbf{u}$, the velocity in the case of the Stokes problem, on the
infinite dimensional operator level. Recall from (99) that $\mathbf{F}$
takes the form

\[ \mathbf{F}(\mathbf{u}, \mathbf{p}) = \begin{pmatrix} \mathbf{A} & \mathbf{B}^T \\ \mathbf{B} & \mathbf{0} \end{pmatrix} \begin{pmatrix} \mathbf{u} \\ \mathbf{p} \end{pmatrix} = \begin{pmatrix} \mathbf{f}_1 \\ \mathbf{f}_2 \end{pmatrix}, \qquad \mathbf{f}_1 := (\langle \psi_{X,\lambda}, f_1\rangle)_{\lambda\in\mathcal{J}_X}, \quad \mathbf{f}_2 := (\langle \psi_{M,\lambda}, f_2\rangle)_{\lambda\in\mathcal{J}_M} \tag{117} \]


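where for given Riesz bases $\Psi_X$, $\Psi_M$ of the component
spaces of $H = X \times M$, $\mathbf{A} := a(\Psi_X, \Psi_X)$, $\mathbf{B} := b(\Psi_X, \Psi_M)$
are the wavelet representations of the operators $A$, $B$ in
(99), respectively. Recall also that $\mathbf{A}$ need only be invertible
on the kernel of $\mathbf{B}$. To eliminate $\mathbf{u}$, one may first have to
modify the system as explained in Dahlke, Dahmen and
Urban (2002) so as to make the modification of $\mathbf{A}$ an
automorphism on all of $\ell_2(\mathcal{J}_X)$. Without loss of generality
we may therefore assume that this is already the case and
that $\mathbf{A}$ satisfies (43). Then from the first system $\mathbf{A}\mathbf{u} +
\mathbf{B}^T\mathbf{p} = \mathbf{f}_1$, we conclude that $\mathbf{u} = \mathbf{A}^{-1}\mathbf{f}_1 - \mathbf{A}^{-1}\mathbf{B}^T\mathbf{p}$ which,
by the second equation in (117), gives $\mathbf{B}\mathbf{u} = \mathbf{B}\mathbf{A}^{-1}\mathbf{f}_1 -
\mathbf{B}\mathbf{A}^{-1}\mathbf{B}^T\mathbf{p} = \mathbf{f}_2$. Hence (117) is equivalent to the Schur
complement problem

\[ \mathbf{S}\mathbf{p} = \mathbf{B}\mathbf{A}^{-1}\mathbf{f}_1 - \mathbf{f}_2, \qquad \mathbf{S} := \mathbf{B}\mathbf{A}^{-1}\mathbf{B}^T \tag{118} \]

Once (118) has been solved, the eliminated component can
be recovered from (the elliptic problem)

\[ \mathbf{A}\mathbf{u} = \mathbf{f}_1 - \mathbf{B}^T\mathbf{p} \tag{119} \]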


From the well-posedness of (117) on $\ell_2$ one easily derives
that $\mathbf{S}$ is also boundedly invertible on $\ell_2(\mathcal{J}_M)$. So one can
find a positive $\alpha$ such that the gradient iteration $\mathbf{p}^{n+1} =
\mathbf{p}^n - \alpha(\mathbf{S}\mathbf{p}^n - (\mathbf{B}\mathbf{A}^{-1}\mathbf{f}_1 - \mathbf{f}_2))$ satisfies (112) for some $\rho <
1$. (Actually, the residual may have to be modified again
when, as in the case of the Stokes problem, the Lagrange
multiplier space $M$ is a subspace of finite codimension in a
larger space for which a wavelet Riesz basis is given. For
simplicity, we suppress this issue here and refer to Dahlke,
Dahmen and Urban (2002) for details.) The problem is that
the Schur complement is generally not easily accessible,
due to the presence of the factor $\mathbf{A}^{-1}$. However, note
that $\mathbf{S}\mathbf{p}^n - (\mathbf{B}\mathbf{A}^{-1}\mathbf{f}_1 - \mathbf{f}_2) = \mathbf{f}_2 - \mathbf{B}\mathbf{u}^n$ whenever $\mathbf{u}^n$ is the
solution of the elliptic problem $\mathbf{A}\mathbf{u}^n = \mathbf{f}_1 - \mathbf{B}^T\mathbf{p}^n$. The
gradient iteration for the Schur complement problem then
takes the form

\[ \mathbf{p}^{n+1} = \mathbf{p}^n - \alpha(\mathbf{f}_2 - \mathbf{B}\mathbf{u}^n), \quad \text{where } \mathbf{A}\mathbf{u}^n = \mathbf{f}_1 - \mathbf{B}^T\mathbf{p}^n, \quad n = 0, 1, 2, \ldots \tag{120} \]

Thus, the iteration is again of the form we want, but each
step requires as an input the solution of a subproblem.
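The Uzawa iteration (120) can be illustrated on a small saddle point system. The sketch below makes several assumptions that are not part of the text: `A` is a hypothetical symmetric positive definite matrix, `B` has orthonormal rows, the inner elliptic problem is solved exactly with a direct solver, and the damping parameter is read off from the explicitly formed Schur complement (which one would avoid in practice).

```python
import numpy as np

def uzawa(A, B, f1, f2, alpha, n_steps):
    """Gradient iteration (120) on the Schur complement with exact inner solves."""
    p = np.zeros(B.shape[0])
    for _ in range(n_steps):
        u = np.linalg.solve(A, f1 - B.T @ p)   # elliptic subproblem A u^n = f1 - B^T p^n
        p = p - alpha * (f2 - B @ u)           # p^{n+1} = p^n - alpha (f2 - B u^n)
    u = np.linalg.solve(A, f1 - B.T @ p)       # recover u from (119)
    return u, p

rng = np.random.default_rng(2)
n, m = 30, 10
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1.0, 2.0, n)) @ Q.T   # hypothetical SPD block
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
B = Q2[:m, :]                                     # full row rank constraint block
f1 = rng.standard_normal(n)
f2 = rng.standard_normal(m)

# Toy-only step: form S = B A^{-1} B^T to pick the damping parameter.
S = B @ np.linalg.solve(A, B.T)
lam = np.linalg.eigvalsh((S + S.T) / 2)
alpha = 2.0 / (lam.min() + lam.max())

u, p = uzawa(A, B, f1, f2, alpha, 80)

# Reference: direct solve of the full block system (117).
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
ref = np.linalg.solve(K, np.concatenate([f1, f2]))
print(np.linalg.norm(np.concatenate([u, p]) - ref))
```

Only products with `B`, `B.T` and solves with `A` enter the loop, which mirrors the observation that the Schur complement itself never has to be assembled.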


Newton Iteration: Finally, the choice $\mathbf{C}_n :=
(D\mathbf{F}(\mathbf{u}^n))^{-1}$ in (110) gives rise to the Newton scheme

\[ \mathbf{u}^{n+1} = \mathbf{u}^n + \mathbf{w}^n, \qquad D\mathbf{F}(\mathbf{u}^n)\mathbf{w}^n = \mathbf{f} - \mathbf{F}(\mathbf{u}^n) = -\mathbf{R}(\mathbf{u}^n) \tag{121} \]

where each step requires the solution of a linear subproblem.
While all previous examples have convergence order
one, the Newton scheme, in principle, offers even better
convergence behavior. We shall address this issue later in
more detail.
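As an illustration of (121), the following sketch applies Newton's method to a finite difference analog of the semilinear example (106), that is, $F(u) = Au + u^3$ with Jacobian $A + 3\,\mathrm{diag}(u^2)$, cf. (107). The discretization, the grid size and the function names are assumptions made for this example; it does not work in wavelet coordinates.

```python
import numpy as np

def newton(F, DF, f, u0, tol=1e-9, max_steps=20):
    """Newton scheme: solve DF(u^n) w^n = -(F(u^n) - f), then u^{n+1} = u^n + w^n."""
    u = np.array(u0, dtype=float)
    history = []
    for _ in range(max_steps):
        R = F(u) - f
        history.append(np.linalg.norm(R))
        if history[-1] <= tol:
            break
        u = u + np.linalg.solve(DF(u), -R)
    return u, history

# Hypothetical 1D finite difference stand-in for -u'' + u^3 = f on (0, 1).
n = 50
h = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def F(u):
    return A @ u + u**3

def DF(u):
    return A + 3 * np.diag(u**2)

x = np.linspace(h, 1 - h, n)
u_exact = np.sin(np.pi * x)
f = F(u_exact)                     # manufactured right hand side

u, history = newton(F, DF, f, np.zeros(n))
print("residual norms:", ["%.1e" % r for r in history])
```

The printed residual norms typically drop much faster than by a fixed factor per step, in contrast to the first order schemes above.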


5.5 Perturbed iteration schemes — (IV) 


We have indicated several ways of forming an (idealized)
iteration scheme on the full infinite dimensional transformed
system $\mathbf{F}(\mathbf{u}) = \mathbf{f}$. We have made essential use of
the fact that the transformed system is well posed in $\ell_2$ in
the sense of (109). Recall that this hinges on the mapping
property (92) induced by the original continuous problem
and the availability of a Riesz basis for the corresponding
energy space $H$, (30). The final step is to realize
the idealized iteration numerically. We shall do so not by
choosing some fixed finite dimensional space on which the
problem is projected, as was done in previous sections,
but rather by approximating at each step the true residual
$\mathbf{r}^n := \mathbf{C}_n(\mathbf{F}(\mathbf{u}^n) - \mathbf{f}) = \mathbf{C}_n\mathbf{R}(\mathbf{u}^n)$ within a suitable dynamically
updated accuracy tolerance. Again we wish to avoid
choosing for this approximation any a priori, fixed, finite
dimensional space but try to realize the required accuracy
at the expense of as few degrees of freedom as possible. The
whole task can then be split into two steps:


(a) Assuming that, in each case at hand, a computational
scheme is available that allows us to approximate the
residuals within each desired target accuracy, determine
first for which dynamic tolerances the iteration
will converge in the sense that for any given
target accuracy $\varepsilon$, the perturbed iteration outputs a
finitely supported vector $\mathbf{u}(\varepsilon)$ such that $\|\mathbf{u} - \mathbf{u}(\varepsilon)\|_{\ell_2}
\le \varepsilon$. Thus on the numerical side, all approximations
will take place in the Euclidean metric. Note however
that because of (30), this implies that

\[ \Big\| u - \sum_{\lambda\in\operatorname{supp}\mathbf{u}(\varepsilon)} u(\varepsilon)_\lambda \psi_\lambda \Big\|_H \le C_\Psi\, \varepsilon \tag{122} \]

(b) Once this has been clarified, one has to come up with
concrete realizations of the residual approximations
appearing in the idealized iteration. It is clear from the
above examples that the crucial task is to approximate
with possibly few terms the sequence $\mathbf{F}(\mathbf{u}^n)$ in $\ell_2$.
Moreover, we will have to make precise what we mean
by 'possibly few terms', that is, we have to analyze the
computational complexity of the numerical schemes.


We shall address first (a) under the assumption that we
are given a numerical scheme

RES $[\eta, \mathbf{C}, \mathbf{F}, \mathbf{f}, \mathbf{v}] \to \mathbf{r}_\eta$: which for any positive tolerance
$\eta$ and any finitely supported input $\mathbf{v}$ outputs a finitely
supported vector $\mathbf{r}_\eta$ satisfying

\[ \|\mathbf{C}\mathbf{R}(\mathbf{v}) - \mathbf{r}_\eta\|_{\ell_2} \le \eta \tag{123} \]

The second ingredient is a routine

COARSE $[\eta, \mathbf{w}] \to \mathbf{w}_\eta$: which for any positive tolerance
$\eta$ and any finitely supported input vector $\mathbf{w}$ produces a
finitely supported output $\mathbf{w}_\eta$ with possibly few entries
(subject to constraints that will be specified later) such that

\[ \|\mathbf{w} - \mathbf{w}_\eta\|_{\ell_2} \le \eta \tag{124} \]

This routine will be essential later to control the complexity
of the overall perturbed iteration scheme.
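The requirement (124) can be met in its simplest form by discarding the smallest entries as long as their accumulated squares stay below $\eta^2$. The following sketch assumes that finitely supported vectors are stored as dictionaries mapping an index to a coefficient, uses exact sorting instead of any faster quasi-sorting, and ignores the additional constraints mentioned above; the function name `coarse` is illustrative.

```python
import math

def coarse(eta, w):
    """Return the sparsest thresholded copy w_eta of w with ||w - w_eta||_2 <= eta.

    w is a dict {index: coefficient}; entries are discarded in order of
    increasing modulus while the discarded squares sum to at most eta^2.
    """
    items = sorted(w.items(), key=lambda kv: abs(kv[1]))
    discarded_sq = 0.0
    keep_from = 0
    for pos, (_, val) in enumerate(items):
        if discarded_sq + val * val > eta * eta:
            break
        discarded_sq += val * val
        keep_from = pos + 1
    return dict(items[keep_from:])

w = {0: 3.0, 1: -0.1, 2: 0.2, 3: 4.0, 4: 0.05}
w_eta = coarse(0.25, w)
err = math.sqrt(sum(val * val for idx, val in w.items() if idx not in w_eta))
print(sorted(w_eta), err)
```

Here the three smallest entries are dropped, since their joint Euclidean norm is about 0.23, whereas dropping any further entry would violate the tolerance.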

Next we need some initialization, that is, an initial guess
$\mathbf{u}^0$ and an error bound

\[ \|\mathbf{u} - \mathbf{u}^0\|_{\ell_2} \le \varepsilon_0 \tag{125} \]

In general, such an estimate depends on the problem at
hand. In the case of semilinear elliptic problems, such
a bound is, for instance, $\varepsilon_0 = c_\Psi^{-2} c_A^{-1}(\|\mathbf{G}(\mathbf{0})\|_{\ell_2} + \|\mathbf{f}\|_{\ell_2})$
for $\mathbf{u}^0 = \mathbf{0}$; see Cohen, Dahmen and DeVore (2002b) for
details.

As a final prerequisite, one can find, again as a consequence
of (109), for each of the above iteration schemes
(110) a constant $\beta$ such that

\[ \|\mathbf{u} - \mathbf{v}\|_{\ell_2} \le \beta\, \|\mathbf{C}\mathbf{R}(\mathbf{v})\|_{\ell_2} \tag{126} \]

holds in a neighborhood of the solution $\mathbf{u}$. The perturbed
iteration scheme may now be formulated as follows.


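SOLVE $[\varepsilon, \mathbf{C}, \mathbf{R}, \mathbf{u}^0] \to \bar{\mathbf{u}}(\varepsilon)$

(i) Choose some $\bar\rho \in (0, 1)$. Set $\bar{\mathbf{u}}^0 = \mathbf{u}^0$, the corresponding
initial bound $\varepsilon_0$ according to the above initialization,
and $j = 0$.

(ii) If $\varepsilon_j \le \varepsilon$, stop and output $\bar{\mathbf{u}}(\varepsilon) := \bar{\mathbf{u}}^j$; else set $\mathbf{v}^0 :=
\bar{\mathbf{u}}^j$ and $k = 0$.

(ii.1) Set $\eta_k := \bar\rho^k \varepsilon_j$ and compute

\[ \mathbf{r}^k = \text{RES}[\eta_k, \mathbf{C}, \mathbf{F}, \mathbf{f}, \mathbf{v}^k], \qquad \mathbf{v}^{k+1} = \mathbf{v}^k - \mathbf{r}^k \]

(ii.2) If

\[ \beta\big(\eta_k + \|\mathbf{r}^k\|_{\ell_2}\big) \le \frac{\varepsilon_j}{2(1 + 2C^*)} \tag{127} \]

set $\bar{\mathbf{v}} := \mathbf{v}^k$ and go to (iii). Else set $k + 1 \to k$
and go to (ii.1).

(iii) COARSE $[(2C^*\varepsilon_j)/(2(1 + 2C^*)), \bar{\mathbf{v}}] \to \bar{\mathbf{u}}^{j+1}$, $\varepsilon_{j+1} =
\varepsilon_j/2$, $j + 1 \to j$, go to (ii).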


The constant $C^*$ depends on the particular realization of
the routine COARSE and will be specified later.

Note that step (ii) is just an approximation of the updates
in (110). This applies until the stopping criterion (127) is
met. This is a posteriori information based on the numerical
residual. The fact that there is actually a uniform
bound $K$ for the number of updates in step (ii), independent
of the data and the target accuracy, until (127) is
satisfied and a coarsening step (iii) is carried out, relies
on (126) and the underlying well-posedness (109). The
parameter $\bar\rho$ is actually allowed here to be smaller than
the true error-reduction rate $\rho$ in (112), for which only
a poor or a possibly too pessimistic estimate may be
available.

One can show by fairly straightforward perturbation
arguments that the choice of accuracy tolerances in SOLVE
implies convergence (Cohen, Dahmen and DeVore, 2002b).


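Proposition 1. The iterates $\bar{\mathbf{u}}^j$ produced by the scheme
SOLVE satisfy

\[ \|\mathbf{u} - \bar{\mathbf{u}}^j\|_{\ell_2} \le \varepsilon_j \tag{128} \]

so that in particular $\|\mathbf{u} - \bar{\mathbf{u}}(\varepsilon)\|_{\ell_2} \le \varepsilon$. By (30), this means

\[ \Big\| u - \sum_{\lambda\in\Lambda(\varepsilon)} \bar{u}(\varepsilon)_\lambda \psi_\lambda \Big\|_H \le C_\Psi\, \varepsilon \tag{129} \]

where $C_\Psi$ is the constant from (30) and $\Lambda(\varepsilon) := \operatorname{supp} \bar{\mathbf{u}}(\varepsilon)$.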
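The control flow of SOLVE and the guarantee (128) can be checked on a finite dimensional toy problem. The sketch below rests on several assumptions that go beyond the text: the problem is linear with a hypothetical symmetric positive definite matrix `A` with spectrum in $[c, C]$, the choice is $\mathbf{C} = \alpha\mathbf{I}$, the routine RES is emulated by computing the exact residual and thresholding it, COARSE is plain thresholding with $C^* = 1$, and $\beta = 1/(\alpha c)$ and $\varepsilon_0 = \|\mathbf{f}\|/c$ are the bounds valid in this special setting.

```python
import numpy as np

def coarse(eta, w):
    """Zero out the smallest entries of w while the discarded part has norm <= eta."""
    order = np.argsort(np.abs(w))
    tail = np.cumsum(w[order] ** 2)
    out = w.copy()
    out[order[tail <= eta ** 2]] = 0.0
    return out

def solve(eps, A, f, alpha, beta, eps0, rho_bar=0.5, C_star=1.0):
    """Perturbed gradient iteration following steps (i)-(iii) of SOLVE."""
    u_bar, eps_j = np.zeros_like(f), eps0              # step (i) with u^0 = 0
    while eps_j > eps:                                 # step (ii)
        v, k = u_bar.copy(), 0
        while True:
            eta_k = rho_bar ** k * eps_j               # step (ii.1)
            r = coarse(eta_k, alpha * (A @ v - f))     # stand-in for RES
            if beta * (eta_k + np.linalg.norm(r)) <= eps_j / (2 * (1 + 2 * C_star)):
                break                                  # criterion (127) met
            v, k = v - r, k + 1
        u_bar = coarse(2 * C_star * eps_j / (2 * (1 + 2 * C_star)), v)  # step (iii)
        eps_j /= 2
    return u_bar

rng = np.random.default_rng(4)
n = 60
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
c, C = 0.5, 2.0
A = Q @ np.diag(np.linspace(c, C, n)) @ Q.T
u_exact = rng.permutation(np.arange(1, n + 1, dtype=float) ** -2.0)
f = A @ u_exact

alpha = 2.0 / (c + C)
beta = 1.0 / (alpha * c)            # ||u - v|| <= ||R(v)|| / c = beta * ||alpha R(v)||
eps0 = np.linalg.norm(f) / c        # ||u - 0|| <= ||f|| / c
eps = 1e-6
u_eps = solve(eps, A, f, alpha, beta, eps0)
print(np.linalg.norm(u_exact - u_eps), np.count_nonzero(u_eps))
```

The output error stays below the target accuracy while the coarsening step keeps the number of nonzero entries of the iterates under control, which is the behavior asserted by Proposition 1 in this simplified setting.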


6 CONSTRUCTION OF RESIDUAL 
APPROXIMATIONS AND 
COMPLEXITY ANALYSIS 


It remains to construct concrete realizations of the routines 
Res and Coarse. It turns out that the development of 
such routines is closely intertwined with their complexity 
analysis. Since the conceptual tools are probably unfamiliar 
in the context of numerical simulation, we highlight some 
of them in the next section. 


6.1 Best N-term approximation


A lower bound for the computational complexity of SOLVE
is, of course, the growth of the supports of outputs in
step (ii), which determines how the overall number of
degrees of freedom grows until the target accuracy is
reached. Therefore, a lower bound for the computational
complexity of SOLVE is given by the number of terms needed
to recover the true solution $\mathbf{u}$ in $\ell_2$ within accuracy $\varepsilon$. This is
the issue of best N-term approximation in $\ell_2$. Thus the
question arises whether, or under which circumstances, SOLVE
can actually attain this lower bound, at least asymptotically.
Since best N-term approximation limits what can be
achieved at best, we briefly review some relevant features
concerning best N-term approximation.

There are two intimately connected ways of looking at an
error analysis for N-term approximation. In the first, we can
specify the target accuracy $\varepsilon$ and ask what is the smallest
number $N(\varepsilon)$ of terms needed to recover a given object.
The second view is to assume we are given a budget $N$ of
terms and ask what accuracy $\varepsilon(N)$ can be achieved with the
best selection of $N$ terms. The process of selecting such
$N$ terms is obviously nonlinear. This can be formalized by
defining the following error of approximation:

\[ \sigma_{N,\ell_2}(\mathbf{v}) := \min_{\#\operatorname{supp}\mathbf{w} \le N} \|\mathbf{v} - \mathbf{w}\|_{\ell_2} \tag{130} \]

Obviously, $\sigma_{N,\ell_2}(\mathbf{v})$ is attained by $\mathbf{w} = \mathbf{v}_N$, comprised of
the $N$ largest terms of $\mathbf{v}$ in modulus. Note that this is
not necessarily unique since several terms may have equal
modulus. Analogously one can define $\sigma_{N,H}(v)$ by looking
for the best approximation of $v$ in $H$ by a linear combination
of at most $N$ wavelets. One easily infers from (30) that

\[ c_\Psi\, \sigma_{N,\ell_2}(\mathbf{v}) \le \sigma_{N,H}(v) \le C_\Psi\, \sigma_{N,\ell_2}(\mathbf{v}) \tag{131} \]

Best N-term approximations in the Euclidean metric yield
therefore near-best N-term approximations in $H$. Hence,
an element in $H$ can be approximated well with relatively
few terms if and only if this is true for its coefficient array
in $\ell_2$.

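We shall proceed with identifying classes of sequences
in $\ell_2$ for which $\varepsilon(N)$ decays like $N^{-s}$, since these are the
rates that can be expected from approximations based on
spatial refinements (h-methods). To this end, consider the
classes

\[ \mathcal{A}^s(H) := \{ v \in H : \sigma_{N,H}(v) \lesssim N^{-s} \}, \qquad \mathcal{A}^s(\ell_2) := \{ \mathbf{v} \in \ell_2 : \sigma_{N,\ell_2}(\mathbf{v}) \lesssim N^{-s} \} \tag{132} \]

These are normed linear spaces endowed with the norms

\[ \|v\|_{\mathcal{A}^s(H)} := \sup_{N\in\mathbb{N}} N^s \sigma_{N,H}(v), \qquad \|\mathbf{v}\|_{\mathcal{A}^s(\ell_2)} := \sup_{N\in\mathbb{N}} N^s \sigma_{N,\ell_2}(\mathbf{v}) \]

Thus to achieve a target accuracy $\varepsilon$, the order of $N(\varepsilon) \sim
\varepsilon^{-1/s}$ terms are needed for $v \in \mathcal{A}^s(H)$ or $\mathbf{v} \in \mathcal{A}^s(\ell_2)$. Hence
the larger $s > 0$, the fewer terms suffice.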
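For a finitely supported array, the error (130) and the weighted quantities entering the norm of $\mathcal{A}^s(\ell_2)$ are easy to compute. The sketch below uses a hypothetical test sequence whose sorted entries behave like $k^{-(s+1/2)}$ with $s = 1$, for which $\sigma_{N,\ell_2}(\mathbf{v}) \le N^{-s}/\sqrt{2}$ follows from an integral comparison; the function name `best_n_term` is illustrative.

```python
import numpy as np

def best_n_term(v, N):
    """Keep the N largest entries of v in modulus; return (v_N, sigma_N)."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(-np.abs(v), kind="stable")
    v_N = np.zeros_like(v)
    v_N[order[:N]] = v[order[:N]]
    return v_N, np.linalg.norm(v - v_N)

# Hypothetical coefficient array: a random rearrangement of k^{-(s + 1/2)}.
rng = np.random.default_rng(3)
s = 1.0
v = rng.permutation(np.arange(1, 1001, dtype=float) ** -(s + 0.5))

budgets = [2 ** i for i in range(10)]
weighted = [N ** s * best_n_term(v, N)[1] for N in budgets]
for N, val in zip(budgets, weighted):
    print(f"N = {N:4d}   N^s * sigma_N = {val:.4f}")
```

The weighted errors stay bounded as $N$ grows, so this array belongs to $\mathcal{A}^s(\ell_2)$ with $s = 1$ in the sense of (132), even though its entries are not arranged in any particular order.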

Which property makes a function $v$ or its coefficient
sequence $\mathbf{v}$ sparse in the above sense is best explained when
$H$ is a Sobolev space $H = H^\alpha$ over some domain in $\mathbb{R}^d$.
One can then show that for any positive $\delta$ (cf. DeVore,
1998; Bergh and Löfström, 1976; Cohen, 2000)

\[ \ell_q \subset \mathcal{A}^s(\ell_2) \subset \ell_{q+\delta}, \qquad B_q^{\alpha+sd}(L_q) \subset \mathcal{A}^s(H^\alpha) \subset B_q^{\alpha+sd-\delta}(L_q), \qquad \frac{1}{q} = s + \frac{1}{2} \tag{133} \]

Here, $\ell_q$ consists of the $q$-summable sequences, while
$B_q^\alpha(L_q)$ denotes a Besov space consisting roughly of functions
with smoothness $\alpha$ measured in $L_q$ (see DeVore,
1998; DeVore and Lorentz, 1993; Bergh and Löfström,
1976; Cohen, 2003; Barinka, Dahlke and Dahmen, 2003;
Barinka et al., 2001; Berger and Oliger, 1984 for precise
definitions). For a certain smoothness range, depending
on the regularity of the wavelet basis, these spaces can
also be characterized through wavelet bases. In fact, for
$0 < q < \infty$ one has

\[ \|v\|_{B_q^\alpha(L_q)}^q \sim \|v\|_{L_q}^q + \sum_{\lambda\in\mathcal{J}} 2^{\alpha q|\lambda|}\, \|\langle v, \tilde\psi_\lambda\rangle \psi_\lambda\|_{L_q}^q \tag{134} \]

which, due to the equivalence $H^s = B_2^s(L_2)$, covers (32) as
a special case (see e.g. Cohen, 2000, 2003; DeVore, 1998).
It is important to note here that the smaller the $q$, the weaker
is the smoothness measure. By the Sobolev embedding
theorem, the value of $q$ given by (133) gives the weakest
possible measure so that smoothness of order $sd + \alpha$ in
$L_q$ guarantees Sobolev regularity of order $\alpha$ corresponding
to the anchor space $H = H^\alpha$ (a Sobolev space of order
$\alpha$ or a closed subspace defined, for example, by homogeneous
boundary conditions). This is illustrated in Figure
9 below. Each point in the $(1/q, s)$-plane corresponds to
a smoothness space (actually to a class of smoothness
spaces) describing smoothness $s$ measured in $L_q$. In our
case, we have $X = H^\alpha$ and $p = 2$. The spaces located left
of the line with slope $d$ emanating from $X$ are embedded
in $X$. The spaces of smoothness $\alpha + sd$ on the vertical
line above $X$ are essentially those whose elements can
be approximated with accuracy $O(N^{-s})$ by approximants
from quasi-uniform meshes, that is, with equidistributed
degrees of freedom. In the present terms, this means just
keeping all wavelets up to some scale $J$, say (or equivalently
working with uniform meshes), so that $N \sim 2^{Jd}$
would require the function $v$ to belong to $B_\infty^{\alpha+sd}(L_2)$, which
is very close to the Sobolev space $H^{\alpha+sd}$. The spaces on
the critical embedding line, however, are characterized by
nonlinear approximation like best N-term approximation.


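[Figure 9 (diagram not reproduced). Recoverable labels: horizontal axis $1/q$ ($L_p$ spaces) with marks $1/p$ and $1/q = 1/p + t/d$; vertical axis smoothness with marks $s$ and $s + t$; anchor space $X$ (measurement of the error, $s$ derivatives in $L_p$); embedding line of slope $d$; region "no embedding in $X$"; rate $O(n^{-t})$ at smoothness level $s + t$, attained by nonlinear approximation on the embedding line.]

Figure 9. Topography of smoothness spaces.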


Thus, while the spaces obtained when moving to the right,
away from the vertical line on the same smoothness level,
grow and admit increasingly stronger singularities, this loss
of regularity can be compensated by judiciously placing the
degrees of freedom so as to retain the same convergence
rates in terms of degrees of freedom $N$. Since $H^{\alpha+sd}$ is a
much smaller space than $B_q^{\alpha+sd}(L_q)$, this indicates the possible
gain offered by nonlinear approximation schemes like
best N-term approximation over simpler schemes based on
a priori fixed discretizations.

Of course, it remains to see whether this potential can be
exploited by adaptive schemes.


Tree Structures: The above notion of best N-term
approximation puts no constraints on the distribution of
significant coefficients. In the context of conservation laws, it
was important that the significant coefficients were arranged
in tree-like structures which correspond to local mesh
refinements. Thus interrelating the selection of wavelets
with locally refined meshes is one reason for imposing some
sort of constraint on the distribution of wavelet coefficients.
Another reason arises when approximating the quantities
$\mathbf{F}(\mathbf{v})$ when $\mathbf{v}$ is some finitely supported vector. Intuitively,
one might expect that the nonlinearity in $F$ makes the effect
of a term $v_\lambda$ with large $|\lambda|$ cascade down to lower levels
in a neighborhood of the support $S_\lambda$, which also gives rise
to tree-like structures.

Let us first explain what we mean by a tree structure
associated to the set of wavelet indices. In the simplest case
of a one dimensional basis $\psi_\lambda = \psi_{j,k} = 2^{j/2}\psi(2^j \cdot - k)$,
this structure is obvious: each index $(j, k)$ has two children
$(j + 1, 2k)$ and $(j + 1, 2k + 1)$. A similar tree structure can
be associated to all available constructions of wavelet bases
on a multidimensional domain: to each index $\lambda$, one can
assign $m(\lambda) \ge 2$ children $\mu$ such that $|\mu| = |\lambda| + 1$, where
$m(\lambda)$ might vary from one index to another but is uniformly
bounded by some fixed $K$. We shall use the notation $\mu \prec \lambda$
($\mu \preceq \lambda$) in order to express that $\mu$ is a descendant of $\lambda$ (or
equals $\lambda$) in the tree. We also have the property

\[ \mu \prec \lambda \ \Longrightarrow\ S_\mu \subset S_\lambda \tag{135} \]

where we recall that $S_\lambda := \operatorname{supp} \psi_\lambda$. A set $\mathcal{T} \subset \mathcal{J}$ is called
a tree if $\lambda \in \mathcal{T}$ implies $\mu \in \mathcal{T}$ whenever $\lambda \prec \mu$.

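If the tree $\mathcal{T} \subset \mathcal{J}$ is finite, we define the set $\mathcal{L} = \mathcal{L}(\mathcal{T})$
of outer leaves as the set of those indices outside the tree
whose parent belongs to the tree

\[ \mathcal{L} := \{ \lambda \in \mathcal{J} : \lambda \notin \mathcal{T},\ \lambda \prec \mu \Longrightarrow \mu \in \mathcal{T} \} \tag{136} \]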
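For the one dimensional dyadic index set the tree property and the outer leaves are easy to make concrete. The following sketch assumes indices $(j, k)$ with a single root $(0, 0)$ and the children $(j+1, 2k)$, $(j+1, 2k+1)$ described above; the helper names are hypothetical and not taken from any wavelet library.

```python
def parent(idx):
    """Parent of the dyadic index (j, k); the root (0, 0) has none."""
    j, k = idx
    return None if j == 0 else (j - 1, k // 2)

def children(idx):
    j, k = idx
    return [(j + 1, 2 * k), (j + 1, 2 * k + 1)]

def is_tree(T):
    """A set is a tree if it contains the parent (hence all ancestors) of each element."""
    return all(parent(idx) is None or parent(idx) in T for idx in T)

def tree_completion(indices):
    """Smallest tree containing the given indices: add all ancestors."""
    T = set()
    for idx in indices:
        while idx is not None and idx not in T:
            T.add(idx)
            idx = parent(idx)
    return T

def outer_leaves(T):
    """Indices outside a nonempty tree T whose parent lies in T."""
    return {c for idx in T for c in children(idx) if c not in T}

T = tree_completion({(3, 5)})
print(sorted(T))                 # the chain from the root down to (3, 5)
print(sorted(outer_leaves(T)))   # the induced locally refined "mesh"
```

For a binary tree with a single root the number of outer leaves exceeds the number of tree nodes by one, which is one way to see that $\#\mathcal{L}(\mathcal{T})$ and $\#\mathcal{T}$ are of the same order.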


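The set $\mathcal{L}(\mathcal{T})$ plays the role of a (locally refined) mesh. In
fact, one readily confirms that

\[ \|\mathbf{v} - \mathbf{v}|_{\mathcal{T}}\|_{\ell_2}^2 = \sum_{\lambda\notin\mathcal{T}} |v_\lambda|^2 = \sum_{\lambda\in\mathcal{L}(\mathcal{T})}\ \sum_{\nu\preceq\lambda} |v_\nu|^2 \tag{137} \]

which suggests considering the quantities

\[ \tilde v_\lambda := \Big( \sum_{\nu:\, S_\nu\subset S_\lambda} |v_\nu|^2 \Big)^{1/2} \tag{138} \]

These quantities measure in some sense a local error associated
with the spatial location of $\lambda$. To see this, suppose
that the wavelets have the form $\psi_\lambda = \omega_\lambda \theta_\lambda$, where $\omega_\lambda$ are
some positive weights (see (33)) and $\Theta$ is a Riesz basis for
$L_2$ (which is the case for all constructions considered in
Section 2). Then, by (30)

\[ \Big\| \sum_{\lambda\notin\mathcal{T}} v_\lambda \psi_\lambda \Big\|_H \sim \|\mathbf{v} - \mathbf{v}|_{\mathcal{T}}\|_{\ell_2} \sim \Big\| \sum_{\lambda\notin\mathcal{T}} v_\lambda \theta_\lambda \Big\|_{L_2} \]

so that

\[ \Big\| v - \sum_{\lambda\in\mathcal{T}} v_\lambda \psi_\lambda \Big\|_H \sim \Big\| \sum_{\lambda\notin\mathcal{T}} v_\lambda \theta_\lambda \Big\|_{L_2} \tag{139} \]

Note that the right hand side can be localized. In fact, for
$\mu \in \mathcal{L}(\mathcal{T})$

\[ \Big\| \sum_{\lambda\notin\mathcal{T}} v_\lambda \theta_\lambda \Big\|_{L_2(S_\mu)}^2 = \Big\| \sum_{\lambda\notin\mathcal{T},\ S_\lambda\cap S_\mu\neq\emptyset} v_\lambda \theta_\lambda \Big\|_{L_2(S_\mu)}^2 \le c \sum_{|\nu|\ge|\mu|:\ S_\nu\cap S_\mu\neq\emptyset} |v_\nu|^2 \lesssim \sum_{\lambda\in\mathcal{L}(\mathcal{T}),\ S_\lambda\cap S_\mu\neq\emptyset} \tilde v_\lambda^2 \tag{140} \]

It has been shown in Cohen, Dahmen and DeVore (2002c)
that any tree $\mathcal{T}$ can be expanded to a tree $\tilde{\mathcal{T}}$ such that
$\#(\tilde{\mathcal{T}}) \lesssim \#(\mathcal{T})$ but for any $\mu \in \mathcal{L}(\tilde{\mathcal{T}})$ only for a uniformly
bounded finite number of $\lambda \in \mathcal{L}(\tilde{\mathcal{T}})$ one has $S_\lambda \cap S_\mu \neq \emptyset$.
Hence a finite number of the terms $\tilde v_\lambda$ bound the local error
on $S_\mu$.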

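A natural idea for constructing 'good' meshes (or equivalently
'good trees' identifying spans of wavelets) is to
equilibrate these local errors. However, it turns out that this
will not necessarily minimize the error $\|\mathbf{v} - \mathbf{v}|_{\mathcal{T}}\|_{\ell_2}$ over all
trees of a fixed cardinality $N = \#(\mathcal{T})$ (see Cohen, Dahmen
and DeVore, 2002c). To formalize this, we define an error
for N-term tree approximation that is the exact tree analog
of the best (unconstrained) N-term approximation defined
in (130).

\[ \sigma^{\mathrm{tree}}_{N,\ell_2}(\mathbf{v}) := \min\{ \|\mathbf{v} - \mathbf{w}\|_{\ell_2} :\ \mathcal{T} := \operatorname{supp}\mathbf{w} \text{ is a tree and } \#\mathcal{T} \le N \} \tag{141} \]

Any minimizing tree will be denoted by $\mathcal{T}_N(\mathbf{v})$. We define
now in analogy to (131) the sequence space

\[ \mathcal{A}^s_{\mathrm{tree}}(\ell_2) := \{ \mathbf{v} \in \ell_2 : \sigma^{\mathrm{tree}}_{N,\ell_2}(\mathbf{v}) \lesssim N^{-s} \} \tag{142} \]

endowed with the quasi-norm

\[ \|\mathbf{v}\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)} := \sup_{N\in\mathbb{N}} N^s \sigma^{\mathrm{tree}}_{N,\ell_2}(\mathbf{v}) \tag{143} \]

Analogously, we can define the counterpart $\mathcal{A}^s_{\mathrm{tree}}(H)$
in $H$. As in (131) the error of tree approximation of
$\mathbf{v} \in \mathcal{A}^s_{\mathrm{tree}}(\ell_2)$ decays like the error of the corresponding tree
approximations to $v$ in $H$.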

In spite of the conceptual similarity, there is an important
difference between best tree and best unconstrained
N-term approximation. At least for any finitely supported
$\mathbf{v}$, the latter one is easily determined by (quasi-)sorting
by size thresholding. Determining the best tree, however,
is much harder. Since one obtains only a near-best
approximation in the energy norm anyway, we can be
content with near-best tree approximation in $\ell_2$ as well.
More precisely, given a fixed constant $C^* \ge 1$, a tree
$\mathcal{T}(\eta, \mathbf{v})$ is called an $(\eta, C^*)$-near best tree for $\mathbf{v}$ if
$\|\mathbf{v} - \mathbf{v}|_{\mathcal{T}(\eta,\mathbf{v})}\|_{\ell_2} \le \eta$ and whenever any other tree $\mathcal{T}$ satisfies
$\|\mathbf{v} - \mathbf{v}|_{\mathcal{T}}\|_{\ell_2} \le \eta/C^*$ one has $\#(\mathcal{T}(\eta, \mathbf{v})) \le C^*\#(\mathcal{T})$. It is
remarkable that, according to Binev and DeVore (2002),
such near-best trees can be constructed in linear time. This
can be achieved with the aid of modified thresholding
strategies working in the present setting with the quantities
$\tilde v_\lambda$ (138) as local error functionals. We shall invoke this
method to construct near-best trees.

Since the selections of terms are constrained by the
imposed tree structure, one always has

\[ \sigma_{\#\mathcal{T}(\eta,\mathbf{v}),\ell_2}(\mathbf{v}) \le \|\mathbf{v} - \mathbf{v}|_{\mathcal{T}(\eta,\mathbf{v})}\|_{\ell_2} \tag{144} \]


However, for a wide class of functions in $H$, one actually
does not lose too much with regard to an optimal
work/accuracy rate. To explain this, we consider again the
above scenario $H = H^t$. The following fact has been shown
in Cohen, Dahmen and DeVore (2002c).

Remark 4. For $H = H^t$ one has $B_q^{t+sd}(L_q) \hookrightarrow \mathcal{A}^s_{\mathrm{tree}}(H)$
whenever $q^{-1} < s + 1/2$.

Thus, as soon as the smoothness space is strictly left of the
Sobolev embedding line, its elements have errors of tree
approximations that decay like $(\#\mathcal{T}_N(\mathbf{v}))^{-s}$; see Figure 9.
Moreover, this rate is known to be sharp, that is,

\[ \inf_{N}\ \sup_{\|v\|_{B_q^{t+sd}(L_q)} = 1} N^s \sigma^{\mathrm{tree}}_{N,H}(v) \gtrsim 1 \tag{145} \]

which means that on the class $B_q^{t+sd}(L_q)$, under the above
restriction of $q$, tree approximations give the same asymptotic
error decay as best N-term approximations. The
smaller the discrepancy $\delta := s + (1/2) - (1/q) > 0$, the
larger the space $B_q^{t+sd}(L_q)$ admitting stronger singularities.
In fact, when $\sup\{s : u \in H^{t+sd}\}$ is strictly smaller
than $\sup\{s : u \in B_q^{t+sd}(L_q)\}$ the asymptotic work/accuracy
rate achieved by meshes corresponding to the trees $\mathcal{T}_N(\cdot)$ is
strictly better than that for uniform mesh refinements. This
is known to be the case, for instance, for solutions $u$ of elliptic
boundary value problems on Lipschitz domains when $\delta$
is sufficiently small (see Dahlke and DeVore, 1997; Dahlke,
1999).

Thus, the question that guides the subsequent discussion
can be formulated as follows: Can one devise the routines
RES and COARSE in such a way that the computational
work and storage needed to produce the output $\bar{\mathbf{u}}(\varepsilon)$ of
SOLVE stays proportional to $\varepsilon^{-1/s}$, uniformly in $\varepsilon$, whenever
the unknown solution $u$ belongs to $\mathcal{A}^s_{\mathrm{tree}}(H)$, or even to
$\mathcal{A}^s(H)$?


6.2 Realization of residual approximations 


We shall always assume that we have full access to the
given data $\mathbf{f}$. Depending on some target accuracy, one
should therefore think of $\mathbf{f}$ as a finite array that approximates
some 'ideal' data accurately enough. Moreover, these data
are (quasi-)ordered by their modulus. Such a quasi-ordering,
based on binary binning, can be realized in linear time
(see e.g. Barinka, 2003). In particular, this allows us to
obtain coarser approximations $\mathbf{f}_\eta$ satisfying $\|\mathbf{f} - \mathbf{f}_\eta\|_{\ell_2} \le \eta$
with the aid of the simplest version of the routine COARSE,
realized by adding $|f_\lambda|^2$ in the direction of increasing
size until the sum exceeds $\eta^2$; see Cohen, Dahmen and
DeVore (2001) for details. As a central task, one further
has to approximate the sequence $\mathbf{F}(\mathbf{v})$ for any given finitely
supported input $\mathbf{v}$, which we shall now describe.


Linear Operators: It will be instructive to consider
first the linear case $\mathbf{F}(\mathbf{v}) = \mathbf{A}\mathbf{v}$ when $\mathbf{A}$ is the wavelet
representation of the underlying operator. We shall describe
an algorithm for the fast computation of $\mathbf{A}\mathbf{v}$. So far, the only
property of $\mathbf{A}$ that we have used is the norm equivalence
(30). Now the cancellation properties (26) come into play.
We have seen in Section 4.2 that they imply the
quasi-sparsity of a wide class of linear operators. The relevant
notion can be formulated as follows (Cohen, Dahmen and
DeVore, 2001). A matrix $\mathbf{C}$ is said to be $s^*$-compressible,
written $\mathbf{C} \in \mathcal{C}_{s^*}$, if for any $0 < s < s^*$ and every $j \in \mathbb{N}$ there
exists a matrix $\mathbf{C}_j$ with the following properties: For some
summable sequence $(\alpha_j)_{j\in\mathbb{N}}$ ($\sum_j \alpha_j < \infty$), $\mathbf{C}_j$ is obtained
by replacing all but the order of $\alpha_j 2^j$ entries per row and
column in $\mathbf{C}$ by zero and satisfies

\[ \|\mathbf{C} - \mathbf{C}_j\| \le C\, \alpha_j 2^{-js}, \quad j \in \mathbb{N} \tag{146} \]

Specifically, wavelet representations of differential and also
the singular integral operators from Sections 4.2 and 4.1
fall into this category for values of $s^*$ that depend, in
particular, on the regularity of the wavelets (see Cohen,
Dahmen and DeVore, 2001; Dahlke, Dahmen and Urban,
2002; Stevenson, 2003).

In order to describe the essence of an approximate appli- 
cation scheme for compressible matrices, we abbreviate for 
any finitely supported v the best 2/-term approximations by 
Yy = Yy -y = 0) and define 


Wy = Aj Yo + Apa W — Yop H + Aoi — Yy- 


(147) 
as an approximation to Av. Obviously this scheme is 
adaptive in that it exploits directly information on v. In 
fact, if A €C,., then the triangle inequality together with 
the above compression estimates yield for any fixed s < s* 


láv- willa 


i 
< e| lv = Vell, +90 %2" Ivy- — Yy--nlie 
—— 20 D oa k ge = a 
Oj 2, ) Sm j-1-1,. 


(148) 
One can now exploit the a posteriori information offered 
by the quantities 03;-1-1 g, (VY) to choose the smallest j for 
which the right hand side of (148) is smaller than a given 
target accuracy 1 and set w, ‘= Wj. Since the sum is finite 
for each finitely supported input v such a j does indeed 
exist. This leads to a concrete multiplication scheme (see 
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Cohen, Dahmen and DeVore, 2001; Barinka et al., 2001 for 
a detailed description, analysis and implementation), which 
we summarize as follows: 


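APPLY $[\eta, \mathbf{A}, \mathbf{v}] \to \mathbf{w}_\eta$: DETERMINES FOR ANY FINITELY SUPPORTED
INPUT $\mathbf{v}$ A FINITELY SUPPORTED OUTPUT $\mathbf{w}_\eta$ SUCH THAT

$$\|\mathbf{A}\mathbf{v} - \mathbf{w}_\eta\|_{\ell_2} \le \eta \qquad (149)$$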
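
The structure of (147) can be illustrated with a short sketch. The following Python code is a minimal illustration under simplifying assumptions, not the implementation analyzed in the cited references: the vector is stored as a dictionary of index/value pairs, the hypothetical helper `compress_rows` mimics the compressed matrices $\mathbf{A}_j$ by keeping the $2^j$ largest entries per row (the sequence $\alpha_j$ is ignored), and the level $j$ is passed in by hand instead of being selected from the a posteriori bound (148).

```python
def best_n_term(v, n):
    """Best n-term approximation: keep the n largest entries of the dict v."""
    keep = sorted(v, key=lambda k: (-abs(v[k]), k))[:n]
    return {k: v[k] for k in keep}


def compress_rows(A, n):
    """Hypothetical compression: keep the n largest entries in each row of A."""
    rows = []
    for row in A:
        keep = sorted(range(len(row)), key=lambda c: (-abs(row[c]), c))[:n]
        rows.append({c: row[c] for c in keep if row[c] != 0.0})
    return rows


def matvec(rows, x):
    """Sparse rows (list of dicts) times sparse vector (dict)."""
    return [sum(a * x.get(c, 0.0) for c, a in row.items()) for row in rows]


def apply_sketch(A, v, j):
    """w_j = A_j v_[0] + A_{j-1}(v_[1] - v_[0]) + ... + A_0(v_[j] - v_[j-1])."""
    w = [0.0] * len(A)
    prev = {}
    for l in range(j + 1):
        cur = best_n_term(v, 2 ** l)
        # v_[l] - v_[l-1]: the entries that are new in the 2^l-term approximation
        layer = {k: val for k, val in cur.items() if k not in prev}
        if layer:
            # the largest entries of v meet the most accurate compression
            A_c = compress_rows(A, 2 ** (j - l))
            w = [wi + yi for wi, yi in zip(w, matvec(A_c, layer))]
        prev = cur
    return w


A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
v = {0: 2.0, 2: -1.0}
w_fine = apply_sketch(A, v, 3)
w_coarse = apply_sketch(A, v, 0)
```

For this small tridiagonal example, `w_fine` coincides with the exact product, whereas `w_coarse` uses only the largest entry of `v` and a single matrix entry per row. The point of the design is that significant entries of the input are multiplied by accurate matrix approximations and insignificant ones by crude approximations.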


Depending on the compressibility range $s^*$ this scheme
can be shown to exhibit the same work/accuracy rate as
the best (unconstrained) $N$-term approximation in $\ell_2$, as
stated by the following result (Cohen, Dahmen and DeVore,
2001).


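Theorem 7. Suppose that $\mathbf{A} \in \mathcal{C}_{s^*}$ and that for some
$0 < s < s^*$, $\mathbf{v} \in \mathcal{A}^s(\ell_2)$. Then, $\mathbf{A}\mathbf{v}$ is also in $\mathcal{A}^s(\ell_2)$. Moreover,
for any finitely supported $\mathbf{v}$ the output $\mathbf{w}_\eta = \text{APPLY}[\eta, \mathbf{A}, \mathbf{v}]$
satisfies:

(i) $\|\mathbf{w}_\eta\|_{\mathcal{A}^s(\ell_2)} \lesssim \|\mathbf{v}\|_{\mathcal{A}^s(\ell_2)}$;
(ii) $\#\operatorname{supp} \mathbf{w}_\eta \lesssim \|\mathbf{v}\|_{\mathcal{A}^s(\ell_2)}^{1/s} \eta^{-1/s}$, $\quad \#\text{flops} \lesssim \#\operatorname{supp} \mathbf{v} + \|\mathbf{v}\|_{\mathcal{A}^s(\ell_2)}^{1/s} \eta^{-1/s}$,

where the constants in these estimates depend only on $s$
when $s$ is small.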


The above work count is based on the tacit assumption
that the entries of $\mathbf{A}$ can be computed with sufficient
accuracy on average at unit cost. This can be verified
for constant coefficient differential operators and spline
wavelets. In general, the justification of such an assumption
is less clear. We shall return to this point later.


The Nonlinear Case - Prediction of Significant Coefficients:
In this case, the point of view changes somewhat.
The question to be addressed first is the following:


Given any $\eta > 0$ and an $(\eta, C^*)$-near best tree $\mathcal{T}(\eta, \mathbf{v})$ of
$\mathbf{v}$, find a possibly small tree $\mathcal{T}_\eta$ such that for some constant
$C$

$$\mathcal{T}^*(C\eta, \mathbf{F}(\mathbf{v})) \subseteq \mathcal{T}_\eta \qquad (150)$$

where $\mathcal{T}^*(C\eta, \mathbf{F}(\mathbf{v}))$ is a smallest tree realizing accuracy
$C\eta$.


Thus, we are asking for quantitative estimates concerning 
the effect of a nonlinearity on contributions with different 
length scales, a question of central importance in several 
areas of applications such as turbulence analysis. Using 
trees now already anticipates the need for taking the (quasi- 
local) effect of higher frequency on lower ones into account. 

Of course, tight estimates of that sort must incorporate 
some knowledge about the character of the nonlinearity. 
Nonlinearities of at most power growth have been studied
recently in Cohen, Dahmen and DeVore (2002c) and we
briefly review some of the main findings. For instance,
when the operator involves a local composition operator
$G$ as in (105), 'power growth' means that for some $p > 0$,
$|G^{(n)}(x)| \lesssim (1 + |x|)^{(p-n)_+}$. In fact, one can show that for
$H = H^t$ (on some domain of spatial dimension $d$) one has
$G : H \to H'$ provided that

$$p < p^* := \frac{d + 2t}{d - 2t} \ \text{ when } t < \frac{d}{2}, \qquad p > 0 \ \text{ when } t \ge \frac{d}{2} \qquad (151)$$

(see Cohen, Dahmen and DeVore, 2002c). The analysis in
Cohen, Dahmen and DeVore (2002c) covers a much wider
class of nonlinear operators including those that depend on
several components involving also derivatives of the arguments,
$G(D^{\alpha_1} v_1, \ldots, D^{\alpha_n} v_n)$. For instance, the convective
term in the Navier-Stokes equations is covered. In order
to convey the main ideas while keeping the exposition as
simple as possible, we confine the subsequent discussion to
the above special situation. Using the locality of the
nonlinearity, the cancellation properties of the wavelets as well
as certain norm equivalences for Besov spaces in terms of
weighted sequence norms for the wavelet coefficients, one
can derive estimates of the form

$$|\mathbf{F}(\mathbf{v})_\lambda| \lesssim \sup_{\mu:\, S_\mu \cap S_\lambda \neq \emptyset} |v_\mu| \, 2^{-\gamma(|\lambda| - |\mu|)} \qquad (152)$$

where for $H = H^t$ a typical value for $\gamma$ is $\gamma = t + m +
d/2$. It measures in some sense the compressibility of the
nonlinear map.

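How to predict good trees for $\mathbf{F}(\mathbf{v})$ from those for $\mathbf{v}$ for
the above-mentioned type of nonlinearities can be sketched
as follows (cf. Cohen, Dahmen and DeVore, 2002c). For a
given target accuracy $\varepsilon$, $j = 0, 1, \ldots$ and a given $\mathbf{v}$, we
consider the near-best trees

$$\mathcal{T}_j := \mathcal{T}\Big(\frac{2^j \varepsilon}{1 + j}, \mathbf{v}\Big) \qquad (153)$$

and the corresponding expanded trees $\tilde{\mathcal{T}}_j$ mentioned before.
By construction, these trees are nested in the sense that
$\tilde{\mathcal{T}}_j \subset \tilde{\mathcal{T}}_{j-1}$. We shall use the difference sets

$$\Delta_j := \tilde{\mathcal{T}}_j \setminus \tilde{\mathcal{T}}_{j+1} \qquad (154)$$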


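in order to build a tree, which will be adapted to $\mathbf{w} = \mathbf{F}(\mathbf{v})$.
They represent the 'energy' in $\mathbf{v}$ reflecting the next higher
level of accuracy. Now we introduce the parameter

$$\alpha := \frac{2}{2\gamma - d} > 0 \qquad (155)$$

where $\gamma$ is the constant in (152) and for each $\mu \in \Delta_j$ we
define the influence set

$$\Lambda_{\varepsilon,\mu} := \{\lambda : S_\lambda \cap S_\mu \neq \emptyset \ \text{ and } \ |\lambda| \le |\mu| + \alpha j\} \qquad (156)$$

Thus the amount $\alpha j$ by which the level $|\mu|$ is exceeded
in $\Lambda_{\varepsilon,\mu}$ depends on the 'strength' of $v_\mu$ expressed by the
fact that $\mu \in \Delta_j$. We then define $\mathcal{T}_\varepsilon$ as the union of these
influence sets

$$\mathcal{T}_\varepsilon := \bigcup_{j \ge 0} \, \bigcup_{\mu \in \Delta_j} \Lambda_{\varepsilon,\mu} \qquad (157)$$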
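
The construction of influence sets can be made concrete in a toy setting. The sketch below is only an illustration under stated assumptions: indices are pairs `(level, k)` on the unit interval with Haar-like supports $(k 2^{-\text{level}}, (k+1) 2^{-\text{level}})$, the layers $\Delta_j$ are supplied by hand rather than computed from near-best trees, and the level offset $\alpha j$ is rounded down to an integer. The function names `overlaps`, `influence_set`, and `predicted_tree` are hypothetical.

```python
import math


def overlaps(lam, mu):
    """True if the open dyadic supports of lam and mu intersect."""
    (l1, k1), (l2, k2) = lam, mu
    L = max(l1, l2)
    # express both intervals in units of 2^-L
    a1, b1 = k1 << (L - l1), (k1 + 1) << (L - l1)
    a2, b2 = k2 << (L - l2), (k2 + 1) << (L - l2)
    return a1 < b2 and a2 < b1


def influence_set(mu, j, alpha):
    """All indices overlapping mu whose level is at most |mu| + floor(alpha * j)."""
    top = mu[0] + math.floor(alpha * j)
    return {(l, k)
            for l in range(top + 1)
            for k in range(2 ** l)
            if overlaps((l, k), mu)}


def predicted_tree(layers, alpha):
    """Union of the influence sets; layers[j] plays the role of Delta_j."""
    tree = set()
    for j, layer in enumerate(layers):
        for mu in layer:
            tree |= influence_set(mu, j, alpha)
    return tree


weak = influence_set((1, 0), 0, 0.5)     # no extra levels for the weakest layer
strong = influence_set((1, 0), 2, 0.5)   # one extra level for a stronger layer
tree = predicted_tree([[(2, 3)], [(1, 0)]], 1.0)
```

In this setting each influence set consists of the ancestors of $\mu$ together with its descendants down to the permitted level, so the union is again a tree. Indices in layers with larger $j$ trigger deeper refinement, which reflects the quasi-local influence of strong coefficients on finer scales.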


The main result can then be stated as follows (Cohen, 
Dahmen and DeVore, 2002c). 


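Theorem 8. Given any $\mathbf{v}$ and $\mathcal{T}_\varepsilon$ defined by (157), we have
the error estimate

$$\|\mathbf{F}(\mathbf{v}) - \mathbf{F}(\mathbf{v})|_{\mathcal{T}_\varepsilon}\|_{\ell_2} \lesssim \varepsilon \qquad (158)$$

Moreover, if $\mathbf{v} \in \mathcal{A}^s_{\mathrm{tree}}(\ell_2)$ with $0 < s < (2\gamma - d)/(2d)$, then
we have the estimate

$$\#(\mathcal{T}_\varepsilon) \lesssim \|\mathbf{v}\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)}^{1/s} \varepsilon^{-1/s} + \#(\mathcal{J}_\phi) \qquad (159)$$

We therefore have $\mathbf{F}(\mathbf{v}) \in \mathcal{A}^s_{\mathrm{tree}}(\ell_2)$ and

$$\|\mathbf{F}(\mathbf{v})\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)} \lesssim 1 + \|\mathbf{v}\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)} \qquad (160)$$

The constants in the above inequalities depend only on
$\|\mathbf{v}\|$, the space dimension $d$, and the parameter $s$.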


This result provides the basis for the following evaluation 
scheme. 


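EVAL $[\varepsilon, \mathbf{F}, \mathbf{v}] \to \mathbf{w}(\varepsilon)$ PRODUCES FOR ANY FINITELY SUPPORTED
VECTOR $\mathbf{v}$ A FINITELY SUPPORTED VECTOR $\mathbf{w}(\varepsilon)$ SUCH THAT
$\|\mathbf{w}(\varepsilon) - \mathbf{F}(\mathbf{v})\|_{\ell_2} \le \varepsilon$ USING THE FOLLOWING STEPS:


(1) INVOKE THE ALGORITHM IN (BINEV AND DEVORE, 2002)
TO COMPUTE THE TREES

$$\mathcal{T}_j := \mathcal{T}\Big(\frac{2^j \varepsilon}{C_0 (j + 1)}, \mathbf{v}\Big) \qquad (161)$$

WHERE $C_0 = C_0(\|\mathbf{v}\|)$ IS THE CONSTANT INVOLVED IN (158),
FOR $j = 0, \ldots, J$, AND STOP FOR THE SMALLEST $J$ SUCH
THAT $\mathcal{T}_J$ IS EMPTY (WE ALWAYS HAVE $J \lesssim \log_2(\|\mathbf{v}\|/\varepsilon)$).
(2) DERIVE THE EXPANDED TREES $\tilde{\mathcal{T}}_j$, THE LAYERS $\Delta_j$ AND THE
OUTCOME TREE $\mathcal{T}_\varepsilon$ ACCORDING TO (157).
(3) COMPUTE $\mathbf{F}(\mathbf{v})|_{\mathcal{T}_\varepsilon}$ (APPROXIMATELY WITHIN ACCURACY $\varepsilon$).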


Clearly any finitely supported $\mathbf{v}$ belongs to $\mathcal{A}^s_{\mathrm{tree}}(\ell_2)$
for every $s > 0$. Moreover, the trees $\mathcal{T}_j$ will be empty
for $j \ge J$ and some $J \in \mathbb{N}$. Thus the scheme terminates
after finitely many steps. We postpone some comments on
step (3) to Section 6.4. The following theorem summarizes
the properties of Algorithm EVAL.


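Theorem 9. Given the inputs $\varepsilon > 0$, a nonlinear function
$F$ (such that $\mathbf{F}$ satisfies assumptions of the type mentioned
before), and a finitely supported vector $\mathbf{v}$, then the output
tree $\mathcal{T}_\varepsilon$ has the following properties: One has
$\|\mathbf{F}(\mathbf{v}) - \mathbf{F}(\mathbf{v})|_{\mathcal{T}_\varepsilon}\|_{\ell_2} \le \varepsilon$. Furthermore, for any
$0 < s < (2\gamma - d)/(2d)$ (see Theorem 8), one has

$$\#(\mathcal{T}_\varepsilon) \le C \big( \|\mathbf{v}\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)}^{1/s} \varepsilon^{-1/s} + \#(\mathcal{J}_\phi) \big) =: N_\varepsilon \qquad (162)$$

with $C$ a constant depending only on the constants appearing
in Theorem 8. Moreover, the number of computations
needed to find $\mathcal{T}_\varepsilon$ is bounded by $C(N_\varepsilon + \#\mathcal{T}(\mathbf{v}))$, where $N_\varepsilon$
is the right hand side of (162) and $\mathcal{T}(\mathbf{v})$ is the smallest tree
containing $\operatorname{supp} \mathbf{v}$.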


Finally, we need a particular coarsening strategy that 
respects tree structures. 


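COARSE $[\eta, \mathbf{w}] \to \mathbf{w}_\eta$ DETERMINES FOR A FIXED CONSTANT $C^* > 1$,
ANY FINITELY SUPPORTED INPUT $\mathbf{w}$, AND ANY TOLERANCE $\eta > 0$
AN $(\eta, C^*)$-NEAR BEST TREE $\mathcal{T}(\eta, \mathbf{w})$ AND SETS $\mathbf{w}_\eta := \mathbf{w}|_{\mathcal{T}(\eta, \mathbf{w})}$.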


The realization of COARSE is based on the results in Binev
and DeVore (2002), which ensure linear complexity. This
version of COARSE can also be used in the linear case.
As such, it can be used to show that Theorem 7 remains
valid for compressible matrices when the spaces $\mathcal{A}^s(\ell_2)$
are replaced by $\mathcal{A}^s_{\mathrm{tree}}(\ell_2)$ (see Cohen, Dahmen and DeVore,
2002b).
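
The effect of a coarsening step can be illustrated by a much simpler variant. The sketch below is not the tree-based routine of Binev and DeVore (2002): it ignores tree structure altogether and simply discards the smallest entries of a finitely supported vector as long as the discarded part stays within the tolerance in $\ell_2$. The name `coarse_sketch` and the dictionary representation are assumptions made for illustration.

```python
def coarse_sketch(w, eta):
    """Drop the smallest entries of w while the dropped part has l2 norm <= eta.

    Simplified thresholding without tree structure; w maps indices to values.
    """
    order = sorted(w, key=lambda k: (abs(w[k]), k))  # smallest entries first
    kept = dict(w)
    dropped_sq = 0.0
    for k in order:
        if dropped_sq + w[k] ** 2 > eta ** 2:
            break
        dropped_sq += w[k] ** 2
        del kept[k]
    return kept


w = {0: 3.0, 1: 0.1, 2: -0.2, 3: 1.0}
w_eta = coarse_sketch(w, 0.5)
error = sum((w[k] - w_eta.get(k, 0.0)) ** 2 for k in w) ** 0.5
```

Here the two smallest entries are removed and the resulting error stays below the tolerance. A tree-respecting version must in addition keep the retained index set closed under taking parents, which is what the near-best tree construction provides.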


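The above results allow one to show that the scheme RES
in all the above examples satisfies the following:


Whenever the exact solution $u$ of (90) belongs to $\mathcal{A}^s_{\mathrm{tree}}(H)$
for some $s < s^*$, then one has for any finitely supported
input $\mathbf{v}$ and any tolerance $\eta > 0$ that the output
$\mathbf{r}_\eta := \text{RES}[\eta, \mathbf{C}, \mathbf{F}, \mathbf{f}, \mathbf{v}]$ satisfies

$$\#\operatorname{supp} \mathbf{r}_\eta \le C \eta^{-1/s} \big( \|\mathbf{v}\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)}^{1/s} + \|\mathbf{u}\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)}^{1/s} + 1 \big)$$

$$\|\mathbf{r}_\eta\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)} \le C \big( \|\mathbf{v}\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)} + \|\mathbf{u}\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)} + 1 \big) \qquad (163)$$

where (in addition to the dependence given in the previous
theorems) $C$ depends only on $s$ when $s \to s^*$. Moreover,
the number of operations needed to compute $\mathbf{r}_\eta$ stays
proportional to $\#\operatorname{supp} \mathbf{r}_\eta$.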


One can show that the number of perturbed updates in
step (ii) of SOLVE executed before branching off into a
coarsening step (iii) remains uniformly bounded independent of
the data and of the target accuracy $\varepsilon$. Therefore, the
$s^*$-sparsity of the routine RES ensures that the $\mathcal{A}^s_{\mathrm{tree}}(\ell_2)$-norms
of the approximations $\mathbf{v}^k$ remain bounded in each update
block (ii). The coarsening step is applied exactly in order
to prevent the constants in these estimates from building
up over several subsequent update blocks (ii). This is the
consequence of the following Coarsening Lemma (Cohen,
Dahmen and DeVore, 2002b).


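Proposition 2. If $\mathbf{v} \in \mathcal{A}^s_{\mathrm{tree}}(\ell_2)$ and $\|\mathbf{v} - \mathbf{w}\|_{\ell_2} \le \eta$ with
$\#\operatorname{supp} \mathbf{w} < \infty$, then $\mathbf{w}_\eta := \text{COARSE}[2C^*\eta, \mathbf{w}]$ satisfies

$$\#\operatorname{supp} \mathbf{w}_\eta \lesssim \|\mathbf{v}\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)}^{1/s} \eta^{-1/s}, \qquad \|\mathbf{v} - \mathbf{w}_\eta\|_{\ell_2} \le (1 + 2C^*)\eta$$

and

$$\|\mathbf{w}_\eta\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)} \lesssim \|\mathbf{v}\|_{\mathcal{A}^s_{\mathrm{tree}}(\ell_2)}$$

where $C^*$ is the constant from the near-best tree construction
scheme in Binev and DeVore (2002).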


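One can then use the above building blocks to show that
SOLVE is optimal in the following sense (Cohen, Dahmen
and DeVore, 2002b).


Theorem 10. If the exact solution $u = \sum_{\lambda \in \mathcal{J}} u_\lambda \psi_\lambda$
belongs to $\mathcal{A}^s_{\mathrm{tree}}(H)$, for any $s < s^*(\mathbf{F}, \Psi)$, then, for any target
accuracy $\varepsilon > 0$, the approximations $\bar{\mathbf{u}}(\varepsilon)$ produced by SOLVE
satisfy

$$\Big\| u - \sum_{\lambda} \bar{u}(\varepsilon)_\lambda \psi_\lambda \Big\|_H \le C_1 \varepsilon$$

and

$$\#\operatorname{supp} \bar{\mathbf{u}}(\varepsilon), \ \text{comp. work} \lesssim \varepsilon^{-1/s}$$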


The above results cover the examples from Section 5.2, 
in particular, the mixed formulations. Note that there is no 
restriction on the choice of wavelet bases for the different 
solution components such as velocity and pressure in the 
Stokes problem. In contrast, in the classical approach, a 
compatible choice verifying the LBB condition is essential. 
In the adaptive context such constraints become void. In 
addition to these qualitative asymptotic results, the experi- 
ments in Dahlke, Dahmen and Urban (2002) and Dahmen, 
Urban and Vorloeper (2002) show that also quantitatively 
the performance of the adaptive scheme stays the same even 
when wavelet bases for velocity and pressure are used that, 
in connection with an a priori choice of finite dimensional 
trial spaces, would violate the LBB condition. 
6.3 The Newton scheme


The scheme SOLVE is based on an ideal iteration of order
one. The choice $\mathbf{C}_n := (D\mathbf{F}(\mathbf{u}^n))^{-1}$ offers in principle a
better convergence behavior of the outer iteration. In fact,
for problems of the type (105) one can show that the Newton
iteration (121) converges quadratically for a sufficiently
good initial guess $\mathbf{u}^0$. On the other hand, it is not clear what
the cost of each linear subproblem

$$D\mathbf{F}(\mathbf{u}^n)\mathbf{w}^n = -(\mathbf{F}(\mathbf{u}^n) - \mathbf{f}) \qquad (164)$$


will amount to. A detailed analysis is given in Cohen,
Dahmen and DeVore (2002b) where it is shown that the
perturbed Newton iteration still retains quadratic convergence
while preserving an overall asymptotically optimal
complexity in the sense of Theorem 10. It is perhaps worth
stressing the following two points. The well-posedness
(109) ensures that the problem (164) is well-conditioned,
which suggests employing SOLVE for its approximate solution.
Nevertheless, this raises two questions. First, the
scheme APPLY needed to realize the residual approximation
in SOLVE would require assembling in some sense the
matrix $D\mathbf{F}(\mathbf{u}^n)$ in each update step with sufficient accuracy,
which could be prohibitively expensive. However,
the result of the application of $D\mathbf{F}(\mathbf{u}^n)$ to a vector $\mathbf{w}$ can
be interpreted as the array of dual wavelet coefficients
of a nonlinear composition with two wavelet expansions,
since

$$D\mathbf{F}(\mathbf{u}^n)\mathbf{w} = \big( \langle \psi_\lambda, DF(u^n) w \rangle \big)_{\lambda \in \mathcal{J}} =: \big( \langle \psi_\lambda, Q(u^n, w) \rangle \big)_{\lambda \in \mathcal{J}} =: \mathbf{Q}(\mathbf{u}^n, \mathbf{w})$$


The approximation of $\mathbf{Q}(\mathbf{u}^n, \mathbf{w})$ can be realized again with
the aid of the scheme EVAL without assembling the Jacobian,
which in this sense leads to a matrix-free Newton
scheme. In fact, in Cohen, Dahmen and DeVore (2002b)
EVAL is described for functions of several components. Further
remarks on the related computational issues will be
given in Section 6.4.
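
The matrix-free idea can be illustrated in a finite-dimensional toy setting. The following sketch is an assumption-laden illustration, not the adaptive wavelet scheme: the model map `F` (a tridiagonal linear part plus a cubic term), the routine `DF_action`, and the use of a plain conjugate gradient solver for the Newton systems are choices made here for the example. What it shares with the text is that the Jacobian is never assembled; only its action on a vector is evaluated.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cg(matvec, b, tol=1e-12, max_iter=200):
    """Conjugate gradients for a symmetric positive definite operator given as matvec."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs <= tol * tol:
            break
        Ap = matvec(p)
        a = rs / dot(p, Ap)
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def F(u):
    """Model nonlinear map: tridiagonal (-1, 2, -1) part plus a cubic term."""
    n = len(u)
    return [2 * u[i]
            - (u[i - 1] if i > 0 else 0.0)
            - (u[i + 1] if i < n - 1 else 0.0)
            + u[i] ** 3
            for i in range(n)]


def DF_action(u, w):
    """Action of the Jacobian DF(u) on w, evaluated without forming a matrix."""
    n = len(u)
    return [2 * w[i]
            - (w[i - 1] if i > 0 else 0.0)
            - (w[i + 1] if i < n - 1 else 0.0)
            + 3 * u[i] ** 2 * w[i]
            for i in range(n)]


def newton_matrix_free(f, u0, steps=20, tol=1e-10):
    """Newton iteration: solve DF(u^n) w^n = -(F(u^n) - f), then update u."""
    u = list(u0)
    for _ in range(steps):
        res = [Fi - fi for Fi, fi in zip(F(u), f)]
        if dot(res, res) ** 0.5 <= tol:
            break
        w = cg(lambda p: DF_action(u, p), [-ri for ri in res])
        u = [ui + wi for ui, wi in zip(u, w)]
    return u


u_true = [1.0, -0.5, 0.25]
u_newton = newton_matrix_free(F(u_true), [0.0, 0.0, 0.0])
```

In the adaptive setting, the role of `DF_action` is played by an approximate evaluation of $\mathbf{Q}(\mathbf{u}^n, \mathbf{w})$, and the inner solver works with finitely supported arrays and prescribed tolerances.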

Secondly, the complexity analysis is not completely
straightforward because one cannot directly infer the
sparseness of the solutions of the Newton systems (164)
from the sparseness of $\mathbf{u}$. However, one can show that these
systems may be viewed as perturbations of another system whose solution
is sparse whenever $\mathbf{u}$ is sparse. This, however, limits the
target accuracy for (164). Nevertheless, one can show that
this still suffices to ensure second order convergence of
the outer iteration (121) (see Cohen, Dahmen and DeVore,
2002b).


6.4 Computational issues 


The above complexity analysis works under the assumption
that in the linear case the entries of $\mathbf{A}$ are computable at
unit cost in order to invoke APPLY. Likewise in the nonlinear
case, the entries of $\mathbf{F}(\mathbf{v})$ in the predicted sets $\mathcal{T}_\varepsilon$ have to be
computed. Under fairly general assumptions, both tasks can
be handled by the following strategy. By definition, one has

$$F(v) = \sum_{\lambda \in \mathcal{J}} \langle \psi_\lambda, F(v) \rangle \tilde{\psi}_\lambda = \sum_{\lambda \in \mathcal{J}} (\mathbf{F}(\mathbf{v}))_\lambda \tilde{\psi}_\lambda$$

where $\tilde{\Psi}$ is the dual basis for $\Psi$ and hence a Riesz basis for
$H'$. The idea is now to use an efficient recovery scheme,
as described in Dahmen, Schneider and Xu (2000) and
Barinka et al. (2003), that produces for a given target
accuracy $\varepsilon$ an approximation $g = \sum_{\lambda \in \mathcal{T}_\varepsilon} g_\lambda \tilde{\psi}_\lambda \in H'$ to
$F(v)$ such that $\|F(v) - g\|_{H'} \le \varepsilon$ at a computational cost
that stays proportional to the size of the prediction set
$\mathcal{T}_\varepsilon$ from EVAL. The norm equivalence now guarantees that
the corresponding coefficient arrays exhibit essentially the
same accuracy $\|\mathbf{F}(\mathbf{v}) - \mathbf{g}\|_{\ell_2} \le c_\Psi^{-1} \varepsilon$. The important point
is that individual coefficients are never computed but,
solely based on the knowledge of the prediction set $\mathcal{T}_\varepsilon$,
quadrature is always used to generate on the function side
an approximation to the whole object $F(v)$ by local
quasi-interpolant techniques to be able to keep the computational
work proportional to the number of current degrees of
freedom. This strategy justifies the above assumptions in a
wide range of cases (see Barinka, 2003; Dahmen, Schneider
and Xu, 2000; Barinka et al., 2003 for details).


6.5 Concluding remarks 


The primary goal of this section was to bring out the 
essential mechanisms and the potential of wavelet-based 
multiscale techniques and to understand under which cir- 
cumstances optimal complexity can be achieved. A crucial 
role was played by the mapping properties of the underly- 
ing variational problem in conjunction with the availability 
of a wavelet Riesz basis for the corresponding energy 
space. This also revealed where the principal difficulties 
may arise. Depending on the nature of $H$, or when dealing
with complex geometries, the construction of a good basis
$\Psi$ may be very difficult or even impossible. Poor constants
in (30) would spoil the quantitative performance of SOLVE
significantly. Likewise poor constants in (92) would have
the same effect. In fact, these constants may be parameter
dependent and further work in this direction is in
progress. But at least, the analysis reveals what one should
be looking for in each concrete case, which might certainly
require much more additional work. More information on
first computational studies for elliptic and indefinite prob- 
lems can be found in Barinka et al. (2001), Dahlke, Dahmen 
and Urban (2002), and Dahmen, Urban and Vorloeper 
(2002). 

The quantitative improvement of evaluation schemes like
EVAL in conjunction with the strategies in Barinka (2003),
Dahmen, Schneider and Xu (2000), and Barinka et al. 
(2003) certainly plays an important role. But already the 
pure prediction result in Theorem 8 at least gives rigorous 
bounds on the effect of certain nonlinearities concerning 
the interaction of fine and coarse scales — a problem that is 
at the heart of many multiscale phenomena in technology 
and science. 

Finally, the difficulties of finding stable pairs of trial 
functions in many mixed formulations may help one to 
appreciate the principal merits of techniques that inherit 
the stability properties of the original problem. In this con- 
text, the above multiscale techniques incorporate a natural 
stabilization effect. 
NOTES


[1] Throughout this chapter, we sometimes write $A \lesssim B$ to
indicate the existence of a constant $c$ such that $A \le cB$
independent of any parameters on which $A$ and $B$
may depend. Moreover $A \sim B$ means that $A \lesssim B$ and
$B \lesssim A$.

[2] There is at least one strategy for maintaining the
Euclidean structure by employing fictitious domain
techniques; appending, for instance, essential boundary
conditions by Lagrange multipliers (Dahmen and
Kunoth, 2001; Kunoth, 1995).

1 INTRODUCTION
1.1 Structures 


Plates and shells are characterized by (i) their midsurface 
S, (ii) their thickness d. The plate or shell character is that 
d is small compared to the dimensions of S. In this respect, 
we qualify such structures as thin domains. In the case of 
plates, S is a domain of the plane, whereas in the case of 
shells, S is a surface embedded in the three-dimensional 
space. Of course, plates are shells with zero curvature. 


Encyclopedia of Computational Mechanics, Edited by Erwin
Stein, René de Borst and Thomas J.R. Hughes. Volume 1: Fundamentals.
© 2004 John Wiley & Sons, Ltd. ISBN: 0-470-84699-2.


Nevertheless, considering plates as a particular class of
shells is not so obvious: They have always been treated
separately, for the reason that plates are simpler. We think,
and hopefully demonstrate in this chapter, that eventually,
considering plates as shells sheds some light on shell
theory.

Other classes of thin domains do exist, such as rods,
where two dimensions are small compared to the third one.
We will not address them and quote, for example, (Nazarov,
1999; Irago and Viaño, 1999). Real engineering structures
are often the union (or junction) of plates, rods, shells, and
so on. See Ciarlet (1988, 1997) and also Kozlov, Maz'ya
and Movchan (1999) and Agratov and Nazarov (2000). We
restrict our analysis to an isolated plate or shell. We assume
moreover that the midsurface $S$ is smooth, orientable, and
has a smooth boundary $\partial S$. The shell character includes
the fact that the principal curvatures have the same order
of magnitude as the dimensions of $S$. See Anicic and Léger
(1999) for a situation where a region with strong curvature
(like $1/d$) is considered. The opposite situation is when the
curvatures have the order of $d$: We are then in the presence
of shallow shells according to the terminology of Ciarlet
and Paumier (1986).


1.2 Domains and coordinates 


In connection with our references, it is easier for us to
consider $d$ as the half-thickness of the structure. We denote
our plate or shell by $\Omega^d$. We keep the reference to the
half-thickness in the notation because we are going to perform
an asymptotic analysis for which we embed our structure
in a whole family of structures $(\Omega^\varepsilon)_\varepsilon$, where the parameter
$\varepsilon$ tends to 0.

We denote the Cartesian coordinates of $\mathbb{R}^3$ by $x =
(x_1, x_2, x_3)$, a tangential system of coordinates on $S$ by
$x_\top = (x_\alpha)_{\alpha=1,2}$, a normal coordinate to $S$ by $x_3$, with the
convention that the midsurface is parametrized by the
equation $x_3 = 0$. In the case of plates $(x_\alpha)$ are Cartesian
coordinates in $\mathbb{R}^2$ and the domain $\Omega^d$ has the tensor product
form

$$\Omega^d = S \times (-d, d)$$

In the case of shells, $x_\top = (x_\alpha)_{\alpha=1,2}$ denotes a local
coordinate system on $S$, depending on the choice of a
local chart in an atlas, and $x_3$ is the coordinate along
a smooth unit normal field $\mathbf{n}$ to $S$ in $\mathbb{R}^3$. Such a normal
coordinate system (also called S-coordinate system)
$(x_\top, x_3)$ yields a smooth diffeomorphism between $\Omega^d$ and
$S \times (-d, d)$. The lateral boundary $\Gamma^d$ of $\Omega^d$ is
characterized by $x_\top \in \partial S$ and $x_3 \in (-d, d)$ in coordinates
$(x_\top, x_3)$.


1.3 Displacement, strain, stress, and elastic 
energy 


The displacement of the structure (deformation from the
stress-free configuration) is denoted by $\mathbf{u}$, its Cartesian
coordinates by $(u_1, u_2, u_3)$, and its surface and transverse
parts by $\mathbf{u}_\top = (u_\alpha)$ and $u_3$ respectively. The transverse part
$u_3$ is always an intrinsic function and the surface part $\mathbf{u}_\top$
defines a two-dimensional 1-form field on $S$, depending on
$x_3$. The components $(u_\alpha)$ of $\mathbf{u}_\top$ depend on the choice of the
local coordinate system $x_\top$.

We choose to work in the framework of small deformations
(see Ciarlet (1997, 2000) for more general nonlinear
models, e.g. the von Kármán model). Thus, we use the strain
tensor (linearized from the Green-St Venant strain tensor)
$e = (e_{ij})$ given in Cartesian coordinates by

$$e_{ij}(\mathbf{u}) = \frac{1}{2} \Big( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \Big)$$

Unless stated otherwise, we assume the simplest possible
behavior for the material of our structure, that is, an
isotropic material. Thus, the elasticity tensor $A = (A^{ijkl})$
takes the form

$$A^{ijkl} = \lambda \delta^{ij} \delta^{kl} + \mu (\delta^{ik} \delta^{jl} + \delta^{il} \delta^{jk})$$

with $\lambda$ and $\mu$ the Lamé constants of the material and
$\delta^{ij}$ the Kronecker symbol. We use Einstein's summation
convention, and sum over double indices if they appear
as subscripts and superscripts (which is nothing but the
contraction of tensors), for example, $\sigma^{ij} e_{ij} \equiv \sum_{i,j=1}^{3} \sigma^{ij} e_{ij}$.
The constitutive equation is given by Hooke's law $\sigma =
A e(\mathbf{u})$ linking the stress tensor $\sigma$ to the strain tensor $e(\mathbf{u})$.
Thus

$$\sigma^{ii} = \lambda (e_{11} + e_{22} + e_{33}) + 2\mu e_{ii} \ \text{ for } i = 1, 2, 3, \qquad \sigma^{ij} = 2\mu e_{ij} \ \text{ for } i \neq j \qquad (1)$$

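The elastic bilinear form on a domain $\Omega$ is given by

$$a(\mathbf{u}, \mathbf{u}') = \int_\Omega \sigma(\mathbf{u}) : e(\mathbf{u}') \, dx = \int_\Omega \sigma^{ij}(\mathbf{u}) \, e_{ij}(\mathbf{u}') \, dx \qquad (2)$$

and the elastic energy of a displacement $\mathbf{u}$ is $(1/2) a(\mathbf{u}, \mathbf{u})$.
The strain-energy norm of $\mathbf{u}$ is denoted by $\|\mathbf{u}\|_{E(\Omega)}$ and
defined as $\big( \sum_{i,j} \int_\Omega |e_{ij}(\mathbf{u})|^2 \, dx \big)^{1/2}$.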
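
As a small worked illustration of Hooke's law (1) and of the integrand in (2), the following sketch evaluates the stress tensor and the pointwise energy density $(1/2)\,\sigma^{ij} e_{ij}$ for a given symmetric strain tensor. The function names and the numerical values of the Lamé constants are assumptions chosen only for the example.

```python
def stress_from_strain(e, lam, mu):
    """Isotropic Hooke's law: sigma^{ij} = lam * tr(e) * delta^{ij} + 2 * mu * e_{ij}."""
    tr = e[0][0] + e[1][1] + e[2][2]
    return [[lam * tr * (1.0 if i == j else 0.0) + 2.0 * mu * e[i][j]
             for j in range(3)]
            for i in range(3)]


def energy_density(e, lam, mu):
    """Pointwise elastic energy density (1/2) * sigma^{ij} e_{ij}."""
    s = stress_from_strain(e, lam, mu)
    return 0.5 * sum(s[i][j] * e[i][j] for i in range(3) for j in range(3))


# symmetric strain tensor with one nonzero shear component
e = [[1.0, 0.5, 0.0],
     [0.5, 2.0, 0.0],
     [0.0, 0.0, 3.0]]
sigma = stress_from_strain(e, lam=2.0, mu=1.0)
density = energy_density(e, lam=2.0, mu=1.0)
```

Integrating this density over $\Omega$ gives the elastic energy $(1/2) a(\mathbf{u}, \mathbf{u})$ when `e` is the strain of the displacement $\mathbf{u}$.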


1.4 Families of problems 


We will address two types of problems on our thin domain
$\Omega^d$: (i) Find the displacement $\mathbf{u}$ solution to the
equilibrium equation $\operatorname{div} \sigma(\mathbf{u}) = \mathbf{f}$ for a given load $\mathbf{f}$, (ii) Find
the (smallest) vibration eigen-modes $(\Lambda, \mathbf{u})$ of the structure.
For simplicity of exposition, we assume in general
that the structure is clamped (this condition is also
called 'condition of place') along its lateral boundary $\Gamma^d$
and will comment on other choices for lateral boundary
conditions. On the remaining part of the boundary
$\partial\Omega^d \setminus \Gamma^d$ ('top' and 'bottom') traction free condition is
assumed.

In order to investigate the influence of the thickness
on the solutions and the discretization methods, we consider
our (fixed physical) problem in $\Omega^d$ as part of a
whole family of problems, depending on one parameter
$\varepsilon \in (0, \varepsilon_0]$, the thickness. The definition of $\Omega^\varepsilon$ is
obvious by the formulae given in Section 1.2 (in fact, if
the curvatures of $S$ are 'small', we may decide that $\Omega^d$
fits better in a family of shallow shells, see Section 4.4
later). For problem (i), we choose the same right hand
side $\mathbf{f}$ for all values of $\varepsilon$, which precisely means that
we fix a smooth field $\mathbf{f}$ on $\Omega^{\varepsilon_0}$ and take $\mathbf{f}^\varepsilon := \mathbf{f}|_{\Omega^\varepsilon}$ for
each $\varepsilon$.

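Both problems (i) and (ii) can be set in variational form
(principle of virtual work). Our three-dimensional variational
space is the subspace $V(\Omega^\varepsilon)$ of the Sobolev space
$H^1(\Omega^\varepsilon)^3$ characterized by the clamping condition $\mathbf{u}|_{\Gamma^\varepsilon} = 0$,
and the bilinear form $a$ (2) on $\Omega = \Omega^\varepsilon$, denoted by $a^\varepsilon$. The
variational formulations are
Find $\mathbf{u}^\varepsilon \in V(\Omega^\varepsilon)$ such that

$$a^\varepsilon(\mathbf{u}^\varepsilon, \mathbf{u}') = \int_{\Omega^\varepsilon} \mathbf{f}^\varepsilon \cdot \mathbf{u}' \, dx, \quad \forall \mathbf{u}' \in V(\Omega^\varepsilon) \qquad (3)$$

for the problem with external load, and
Find $\mathbf{u}^\varepsilon \in V(\Omega^\varepsilon)$, $\mathbf{u}^\varepsilon \neq 0$, and $\Lambda^\varepsilon \in \mathbb{R}$ such that

$$a^\varepsilon(\mathbf{u}^\varepsilon, \mathbf{u}') = \Lambda^\varepsilon \int_{\Omega^\varepsilon} \mathbf{u}^\varepsilon \cdot \mathbf{u}' \, dx, \quad \forall \mathbf{u}' \in V(\Omega^\varepsilon) \qquad (4)$$

for the eigen-mode problem. In engineering practice, one
is interested in the natural frequencies, $\omega^\varepsilon = \sqrt{\Lambda^\varepsilon}$. Of
course, when considering our structure $\Omega^d$, we are
eventually only interested in $\varepsilon = d$. Taking the whole family
$\varepsilon \in (0, \varepsilon_0]$ into account allows the investigation of the
dependency with respect to the small parameter $\varepsilon$, in order
to know if valid simplified models are available and how
they can be discretized by finite elements.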


1.5 Computational obstacles 


Our aim is to study the possible discretizations for a
reliable and efficient computation of the solutions $\mathbf{u}^d$ of
problem (3) or (4) in our thin structure $\Omega^d$. An option
could be to consider $\Omega^d$ as a three-dimensional body and
use 3-D finite elements. In the standard version of finite
elements (h-version), individual elements should not be
stretched or distorted, which implies that all dimensions
should be bounded by $d$. Even so, several layers of elements
through the thickness may be necessary. Moreover the
a priori error estimates may suffer from the behavior of
the Korn inequality on $\Omega^d$ (the factor appearing in the Korn
inequality behaves like $d^{-1}$ for plates and partially clamped
shells; see Ciarlet, Lods and Miara (1996) and Dauge and
Faou (2004)).

An ideal alternative would simply be to get rid of the
thickness variable and compute the solution of an 'equivalent'
problem on the midsurface $S$. This is the aim of the
shell theory. Many investigations were undertaken around
1960-1970, and the main achievement is (still) the Koiter
model, which is a multidegree $3 \times 3$ elliptic system on $S$
of half-orders (1, 1, 2) with a singular dependence in $d$.
But, as written in Koiter and Simmonds (1973), 'Shell theory
attempts the impossible: to provide a two-dimensional
representation of an intrinsically three-dimensional
phenomenon'. Nevertheless, obtaining converging error estimates
between the 3-D solution $\mathbf{u}^d$ and a reconstructed 3-D
displacement $U\mathbf{z}^d$ from the deformation pattern $\mathbf{z}^d$ solution
of the Koiter model seems possible.

However, due to its fourth order part, the Koiter model
cannot be discretized by standard $C^0$ finite elements. The
Naghdi model, involving five unknowns on $S$, seems more
suitable. Yet, endless difficulties arise in the form of various
locking effects, due to the singularly perturbed character of
the problem.

With the twofold aim of improving the precision of 
the models and their approximability by finite elements, 
the idea of hierarchical models becomes natural: Roughly, 
it consists of an Ansatz of polynomial behavior in the 
thickness variable, with bounds on the degrees of the three 
components of the 3-D displacement. The introduction of 
such models in variational form is due to Vogelius and 
Babuška (1981c) and Szabó and Sahrmann (1988). Earlier 
beginnings in that direction can be found in Vekua (1955, 
1965). The hierarchy (increasing the transverse degrees) 
of models obtained in that way can be discretized by the 
p-version of finite elements. 


1.6 Plan of the chapter 


In order to assess the validity of hierarchical models, we
will compare them with asymptotic expansions of solutions
$\mathbf{u}^\varepsilon$ when they are available: These expansions exhibit two
or three different scales and boundary layer regions, which
can or cannot be properly described by hierarchical models.

We first address plates because much more is known for
plates than for general shells. In Section 2, we describe the
two-scale expansion of the solutions of (3) and (4): This
expansion contains (i) a regular part each term of which is
polynomial in the thickness variable $x_3$, (ii) a part mainly
supported in a boundary layer around the lateral boundary
$\Gamma^\varepsilon$. In Section 3, we introduce the hierarchical models
as Galerkin projections on semidiscrete subspaces $V^q(\Omega^\varepsilon)$
of $V(\Omega^\varepsilon)$ defined by assuming a polynomial behavior of
degree $q = (q_1, q_2, q_3)$ in $x_3$. The model of degree (1, 1, 0)
is the Reissner-Mindlin model and needs the introduction
of a reduced energy. The (1, 1, 2) model is the lowest
degree model to use the same elastic energy (2) as the 3-D
model.

We address shells in Section 4 (asymptotic expansions
and limiting models) and Section 5 (hierarchical models).
After a short introduction of the metric and curvature tensors
on the midsurface, we first describe the three-scale
expansion of the solutions of (3) on clamped elliptic shells:
Two of these scales can be captured by hierarchical models.
We then present and comment on the famous classification
of shells as flexural or membrane. We also mention two
distinct notions of shallow shells. We emphasize the universal
role played by the Koiter model for the structure $\Omega^d$
independently of any embedding of $\Omega^d$ in a family $(\Omega^\varepsilon)_\varepsilon$.

The last section is devoted to the discretization of the
3-D problems and their 2-D hierarchical projections, by
p-version finite elements. The 3-D thin elements (one layer of
elements through the thickness) constitute a bridge between
3-D and 2-D discretizations. We address the issue of locking
effects (shear and membrane locking) and the issue of
capturing boundary layer terms. Increasing the degree $p$ of
approximation polynomials and using anisotropic meshes is
a way toward solving these problems. We end this chapter
by presenting a series of eigen-frequency computations on
a few different families of shells and draw some 'practical'
conclusions.


2 MULTISCALE EXPANSIONS FOR 
PLATES 


The question of an asymptotic expansion for solutions $\mathbf{u}^\varepsilon$
of problems (3) or (4) posed in a family of plates is
difficult: One may think it is natural to expand $\mathbf{u}^\varepsilon$ either
in polynomial functions in the thickness variable $x_3$, or in
an asymptotic series in powers $\varepsilon^k$ with regular coefficients
$\mathbf{v}^k$ defined on the stretched plate $\Omega = S \times (-1, 1)$. In
fact, for the class of loads considered here or for the
eigen-mode problem, both those Ansätze are relevant, but
they are unable to provide a correct description of the
behavior of $\mathbf{u}^\varepsilon$ in the vicinity of the lateral boundary $\Gamma^\varepsilon$,
where there is a boundary layer of width $\sim \varepsilon$ (except in
the particular situation of a rectangular midsurface with
symmetry lateral boundary conditions (hard simple support
or sliding edge); see Paumier, 1990). And, worse, in the
absence of knowledge of the boundary layer behavior, the
determination of the terms $\mathbf{v}^k$ is impossible (except for $\mathbf{v}^0$).

The investigation of asymptotics as $\varepsilon \to 0$ was first
performed by the construction of infinite formal expansions;
see Friedrichs and Dressler (1961), Gol'denveizer (1962),
and Gregory and Wan (1984). The principle of multiscale
asymptotic expansion is applied to thin domains in Maz'ya,
Nazarov and Plamenevskii (1991b). A two-term asymptotics
is exhibited in Nazarov and Zorin (1989). The whole
asymptotic expansion is constructed in Dauge and Gruais
(1996, 1998a) and Dauge, Gruais and Rössle (1999/00).

The multiscale expansions that we propose differ from 
the matching method in Il’in (1992) where the solutions 
of singularly perturbed problems are fully described in 
rapid variables inside the boundary layer and slow variables 
outside the layer, both expansions being ‘matched’ in an 
intermediate region. Our approach is closer to that of Vishik 
and Lyusternik (1962) and Oleinik, Shamaev and Yosifian 
(1992) (see Chapter 2, Chapter 3 of Volume 2). 


2.1 Coordinates and symmetries 


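The midsurface $S$ is a smooth domain of the plane $\Pi \simeq \mathbb{R}^2$
(see Fig. 1) and for $\varepsilon \in (0, \varepsilon_0)$, $\Omega^\varepsilon = S \times (-\varepsilon, \varepsilon)$ is the
generic member of the family of plates (see Fig. 2). The
plates are symmetric with respect to the plane $\Pi$. Since they
are assumed to be made of an isotropic material, problems
(3) or (4) commute with the symmetry
$\mathfrak{S} : \mathbf{u} \mapsto (\mathbf{u}_\top(\cdot, -x_3), -u_3(\cdot, -x_3))$.
The eigenspaces of $\mathfrak{S}$ are membrane and
bending displacements (also called stretching and flexural
displacements), cf. Friedrichs and Dressler (1961):

$$\begin{aligned}
\mathbf{u} \ \text{membrane iff} \quad & \mathbf{u}_\top(x_\top, +x_3) = \mathbf{u}_\top(x_\top, -x_3) \ \text{ and } \ u_3(x_\top, +x_3) = -u_3(x_\top, -x_3) \\
\mathbf{u} \ \text{bending iff} \quad & \mathbf{u}_\top(x_\top, +x_3) = -\mathbf{u}_\top(x_\top, -x_3) \ \text{ and } \ u_3(x_\top, +x_3) = u_3(x_\top, -x_3)
\end{aligned} \qquad (5)$$

Figure 2. Thin plate and stretched plate.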


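Any general displacement $\mathbf{u}$ is the sum $\mathbf{u}_m + \mathbf{u}_b$ of a
membrane and a bending part (according to formulae $\mathbf{u}_m =
(1/2)(\mathbf{u} + \mathfrak{S}\mathbf{u})$ and $\mathbf{u}_b = (1/2)(\mathbf{u} - \mathfrak{S}\mathbf{u})$. They are also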
denoted by $\mathbf{u}^{I}$ and $\mathbf{u}^{II}$ in the literature).
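
The splitting into membrane and bending parts is easy to check on a concrete field. The sketch below is only an illustration: displacements are represented as Python functions of $(x_1, x_2, x_3)$ returning three components, and the names `reflect` and `split` are hypothetical.

```python
def reflect(u):
    """Symmetry S: surface parts evaluated at -x3, transverse part also negated."""
    def Su(x1, x2, x3):
        a, b, c = u(x1, x2, -x3)
        return (a, b, -c)
    return Su


def split(u):
    """Return (u_m, u_b) with u_m = (u + S u)/2 and u_b = (u - S u)/2."""
    Su = reflect(u)

    def u_m(x1, x2, x3):
        return tuple(0.5 * (p + q) for p, q in zip(u(x1, x2, x3), Su(x1, x2, x3)))

    def u_b(x1, x2, x3):
        return tuple(0.5 * (p - q) for p, q in zip(u(x1, x2, x3), Su(x1, x2, x3)))

    return u_m, u_b


def u(x1, x2, x3):
    # sample displacement mixing even and odd behavior in x3
    return (x1 + x3, x2 * x3, 1.0 + x3)


u_m, u_b = split(u)
membrane_value = u_m(1.0, 2.0, 0.5)
bending_value = u_b(1.0, 2.0, 0.5)
```

For this sample field the membrane part is $(x_1, 0, x_3)$ and the bending part is $(x_3, x_2 x_3, 1)$, in agreement with the parity conditions (5).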

In addition to the coordinates $x_\top$ in $S$, let $r$ be the distance
to $\partial S$ in $\Pi$ and $s$ an arclength function on $\partial S$ (see Fig. 1).
In this way, $(r, s)$ defines a smooth coordinate system in
a midplane tubular neighborhood $\mathcal{V}$ of $\partial S$. Let $\chi = \chi(r)$
be a smooth cut-off function with support in $\mathcal{V}$, equal to 1
in a smaller such neighborhood. It is used to substantiate
boundary layer terms. The two following stretched (or
rapid) variables appear in our expansions:

$$X_3 = \frac{x_3}{\varepsilon} \quad \text{and} \quad R = \frac{r}{\varepsilon}$$

The stretched thickness variable $X_3$ belongs to $(-1, 1)$
and is present in all parts of our asymptotics, whereas
the presence of $R$ characterizes boundary layer terms (see
Figure 2).


2.2 Problem with external load 


The solutions of the family of problems (3) have a two-scale 
asymptotic expansion in regular terms $v^k$ and boundary 
layer terms $w^k$, which we state as a theorem (Dauge, Gruais 
and Rössle, 1999/00; Dauge and Schwab, 2002). Note that, 
in contrast with most of those references, we work 
here with natural (i.e. unscaled) displacements, which is 
more realistic from the mechanical and computational point 
of view, and allows an easier comparison with shells. 


Theorem 1. (Dauge, Gruais and Rössle, 1999/00) For 
the solutions of problem (3), $\varepsilon \in (0, \varepsilon_0]$, there exist regular 
terms $v^k = v^k(x_\top, X_3)$, $k \ge -2$, and boundary layer terms 
$w^k = w^k(R, s, X_3)$, $k \ge 0$, such that 

$$u^\varepsilon \simeq \varepsilon^{-2} v^{-2} + \varepsilon^{-1} v^{-1} + \varepsilon^{0} (v^{0} + \chi w^{0}) + \varepsilon^{1} (v^{1} + \chi w^{1}) + \cdots \tag{6}$$

in the sense of asymptotic expansions: The following estimates hold 

$$\Big\| u^\varepsilon - \sum_{k=-2}^{K} \varepsilon^k (v^k + \chi w^k) \Big\|_{E(\Omega^\varepsilon)} \le C_K(f)\, \varepsilon^{K+1/2}, \qquad K = 0, 1, \ldots$$

where we have set $w^{-2} = w^{-1} = 0$ and the constant $C_K(f)$ 
is independent of $\varepsilon \in (0, \varepsilon_0]$. 


2.2.1 Kirchhoff displacements and their deformation 
patterns 


The first terms in the expansion of $u^\varepsilon$ are Kirchhoff 
displacements, that is, displacements of the form (with the 
surface gradient $\nabla_\top = (\partial_1, \partial_2)$) 

$$(x_\top, x_3) \longmapsto v(x_\top, x_3) = \big(\zeta_\top(x_\top) - x_3 \nabla_\top \zeta_3(x_\top),\ \zeta_3(x_\top)\big) \tag{7}$$

Here, $\zeta_\top = (\zeta_\alpha)$ is a surface displacement and $\zeta_3$ is a 
function on $S$. We call the three-component field $\zeta := (\zeta_\top, \zeta_3)$ 
the deformation pattern of the KL displacement 
$v$. Note that 

$$v \text{ bending iff } \zeta = (0, \zeta_3) \quad\text{and}\quad v \text{ membrane iff } \zeta = (\zeta_\top, 0)$$

In expansion (6) the first terms are Kirchhoff displacements. 
The next regular terms $v^k$ are also generated by deformation 
patterns $\zeta^k$ via higher degree formulae than in (7). We 
successively describe the $v^k$, the $\zeta^k$ and, finally, the boundary 
layer terms $w^k$. 


2.2.2 The four first regular terms 


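For the regular terms $v^k$, $k = -2, -1, 0, 1$, there exist bending 
deformation patterns $\zeta^{-2} = (0, \zeta_3^{-2})$, $\zeta^{-1} = (0, \zeta_3^{-1})$, 
and full deformation patterns $\zeta^0$, $\zeta^1$ such that 

$$
\begin{aligned}
v^{-2} &= (0, \zeta_3^{-2}) \\
v^{-1} &= (-X_3 \nabla_\top \zeta_3^{-2},\ \zeta_3^{-1}) \\
v^{0} &= (\zeta_\top^{0} - X_3 \nabla_\top \zeta_3^{-1},\ \zeta_3^{0}) + (0,\ P_b^2(X_3)\, \Delta_\top \zeta_3^{-2}) \\
v^{1} &= (\zeta_\top^{1} - X_3 \nabla_\top \zeta_3^{0},\ \zeta_3^{1}) + \big(P_b^3(X_3)\, \nabla_\top \Delta_\top \zeta_3^{-2},\ P_m^1(X_3)\, \mathrm{div}\, \zeta_\top^{0} + P_b^2(X_3)\, \Delta_\top \zeta_3^{-1}\big)
\end{aligned}
\tag{8}
$$

In the above formulae, $\nabla_\top = (\partial_1, \partial_2)$ is the surface gradient 
on $S$, $\Delta_\top = \partial_1^2 + \partial_2^2$ is the surface Laplacian and 
$\mathrm{div}\, \zeta_\top$ is the surface divergence (i.e. $\mathrm{div}\, \zeta_\top = \partial_1 \zeta_1 + \partial_2 \zeta_2$). 
The functions $P_b^\ell$ and $P_m^\ell$ are polynomials of degree $\ell$, 
whose coefficients depend on the Lamé constants according 
to 

$$
\begin{aligned}
P_m^1(X_3) &= -\frac{\lambda}{\lambda + 2\mu}\, X_3 \\
P_b^2(X_3) &= \frac{\lambda}{2\lambda + 4\mu} \Big( X_3^2 - \frac13 \Big) \\
P_b^3(X_3) &= \frac{1}{6\lambda + 12\mu} \big( (3\lambda + 4\mu) X_3^3 - (11\lambda + 12\mu) X_3 \big)
\end{aligned}
\tag{9}
$$

Note that the first blocks in $\sum_{k \ge -2} \varepsilon^k v^k$ yield Kirchhoff 
displacements, whereas the second blocks have zero mean 
values through the thickness for each $x_\top \in S$. 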
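
The zero-mean property of the second blocks can be checked in exact arithmetic. The sketch below assumes the polynomials of (9) are $P_m^1 = -\frac{\lambda}{\lambda+2\mu} X_3$, $P_b^2 = \frac{\lambda}{2\lambda+4\mu}(X_3^2 - \frac13)$ and $P_b^3 = \frac{1}{6\lambda+12\mu}\big((3\lambda+4\mu)X_3^3 - (11\lambda+12\mu)X_3\big)$; the sample Lamé constants and the function names are illustrative only.

```python
from fractions import Fraction

def thickness_mean(poly):
    """Mean over (-1, 1) of a polynomial stored as {power: coefficient}."""
    total = Fraction(0)
    for power, coeff in poly.items():
        if power % 2 == 0:  # odd powers integrate to zero on (-1, 1)
            total += Fraction(coeff) * Fraction(2, power + 1)
    return total / 2

def thickness_polynomials(lam, mu):
    """P_m^1, P_b^2, P_b^3 as assumed above, with exact rational coefficients."""
    lam, mu = Fraction(lam), Fraction(mu)
    p_m1 = {1: -lam / (lam + 2 * mu)}
    p_b2 = {2: lam / (2 * lam + 4 * mu), 0: -lam / (3 * (2 * lam + 4 * mu))}
    p_b3 = {3: (3 * lam + 4 * mu) / (6 * lam + 12 * mu),
            1: -(11 * lam + 12 * mu) / (6 * lam + 12 * mu)}
    return p_m1, p_b2, p_b3

# Sample Lamé constants (arbitrary positive values).
means = [thickness_mean(p) for p in thickness_polynomials(3, 2)]
print(means)
```

The odd polynomials have zero mean by parity, while for $P_b^2$ the constant $-\frac13$ is exactly what cancels the mean of $X_3^2$ over $(-1, 1)$.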


2.2.3 All regular terms with the help of formal series 


We see from (8) that the formulae describing the successive 
$v^k$ are partly self-similar and, also, that each $v^k$ is enriched 
by a new term. That is why the whole regular term series 
$\sum_k \varepsilon^k v^k$ can be efficiently described with the help of the 
formal series product. 

A formal series is an infinite sequence $(a^0, a^1, \ldots, a^k, \ldots)$ 
of coefficients, which can be denoted in a symbolic 
way by $a[\varepsilon] = \sum_{k \ge 0} \varepsilon^k a^k$, and the product $a[\varepsilon] b[\varepsilon]$ of the 
two formal series $a[\varepsilon]$ and $b[\varepsilon]$ is the formal series $c[\varepsilon]$ with 
coefficients $c^\ell = \sum_{0 \le k \le \ell} a^k b^{\ell-k}$. In other words, the 
equation $c[\varepsilon] = a[\varepsilon] b[\varepsilon]$ is equivalent to the series of equations 
$c^\ell = \sum_{0 \le k \le \ell} a^k b^{\ell-k}$, $\forall \ell$. 
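
The product rule for formal series is the usual Cauchy product. A minimal sketch, assuming scalar coefficients stored in Python lists indexed from $k = 0$ (series starting at $k = -2$, as for the deformation patterns, only require an index shift); the function name is hypothetical.

```python
def formal_product(a, b, order):
    """Coefficients c^0..c^order of c[eps] = a[eps] b[eps].

    `a` and `b` are lists of coefficients a^0, a^1, ...; coefficients beyond
    the end of a list are treated as zero.
    """
    def coeff(seq, k):
        return seq[k] if k < len(seq) else 0
    return [sum(coeff(a, k) * coeff(b, l - k) for k in range(l + 1))
            for l in range(order + 1)]

# Example: (1 + eps + eps^2 + eps^3 + eps^4) * (1 - eps), truncated at order 4.
a = [1, 1, 1, 1, 1]
b = [1, -1]
c = formal_product(a, b, 4)
print(c)
```

In the text the coefficients are operators rather than numbers, so the products $a^k b^{\ell-k}$ become compositions and their order matters; the index bookkeeping is unchanged.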

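With this formalism, we have the following identity, 
which extends formulae (8): 

$$v[\varepsilon] = V[\varepsilon]\, \zeta[\varepsilon] + Q[\varepsilon]\, f[\varepsilon] \tag{10}$$

(i) $\zeta[\varepsilon]$ is the formal series of Kirchhoff deformation 
patterns $\sum_{k \ge -2} \varepsilon^k \zeta^k$ starting with $k = -2$. 

(ii) $V[\varepsilon]$ has operator valued coefficients $V^k$, $k \ge 0$, acting 
from $C^\infty(\overline{S})^3$ into $C^\infty(\overline{\Omega})^3$: 

$$
\begin{aligned}
V^0 \zeta &= (\zeta_\top,\ \zeta_3) \\
V^1 \zeta &= (-X_3 \nabla_\top \zeta_3,\ P_m^1(X_3)\, \mathrm{div}\, \zeta_\top) \\
V^2 \zeta &= (P_m^2(X_3)\, \nabla_\top \mathrm{div}\, \zeta_\top,\ P_b^2(X_3)\, \Delta_\top \zeta_3) \\
&\ \ \vdots \\
V^{2j} \zeta &= (P_m^{2j}(X_3)\, \nabla_\top \Delta_\top^{j-1} \mathrm{div}\, \zeta_\top,\ P_b^{2j}(X_3)\, \Delta_\top^{j} \zeta_3) \\
V^{2j+1} \zeta &= (P_b^{2j+1}(X_3)\, \nabla_\top \Delta_\top^{j} \zeta_3,\ P_m^{2j+1}(X_3)\, \Delta_\top^{j} \mathrm{div}\, \zeta_\top)
\end{aligned}
\tag{11}
$$

with $P_b^\ell$ and $P_m^\ell$ polynomials of degree $\ell$ (the first 
ones are given in (9)). 

(iii) $f[\varepsilon]$ is the Taylor series of $f$ around the surface $x_3 = 0$: 

$$f[\varepsilon] = \sum_{k \ge 0} \varepsilon^k f^k \quad\text{with}\quad f^k(x_\top, X_3) = \frac{X_3^k}{k!}\, \frac{\partial^k f}{\partial x_3^k}\Big|_{x_3 = 0}(x_\top) \tag{12}$$

(iv) $Q[\varepsilon]$ has operator valued coefficients $Q^k$ acting from 
$C^\infty(\overline{\Omega})^3$ into itself. It starts at $k = 2$ (we can see 
now that the four first equations given by equality 
(10) are $v^{-2} = V^0 \zeta^{-2}$, $v^{-1} = V^0 \zeta^{-1} + V^1 \zeta^{-2}$, 
$v^{0} = V^0 \zeta^{0} + V^1 \zeta^{-1} + V^2 \zeta^{-2}$, 
$v^{1} = V^0 \zeta^{1} + V^1 \zeta^{0} + V^2 \zeta^{-1} + V^3 \zeta^{-2}$, which gives back (8)): 

$$Q[\varepsilon] = \sum_{k \ge 2} \varepsilon^k Q^k \tag{13}$$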
Each $Q^k$ is made of compositions of partial derivatives in 
the surface variables $x_\top$ with integral operators in the scaled 
transverse variable. Each of them acts in a particular way 
between semipolynomial spaces $E^q(\Omega)$, $q \ge 0$, in the scaled 
domain $\Omega$: We define for any integer $q \ge 0$ 

$$E^q(\Omega) = \Big\{ v \in C^\infty(\overline{\Omega})^3,\ \exists z^n \in C^\infty(\overline{S})^3,\ v(x_\top, X_3) = \sum_{n=0}^{q} z^n(x_\top)\, X_3^n \Big\} \tag{14}$$

Note that by (12), $f^k$ belongs to $E^k(\Omega)$. 

Besides, for any $k \ge 2$, $Q^k$ acts from $E^q(\Omega)$ into 
$E^{q+k}(\Omega)$. The first term of the series $Q[\varepsilon] f[\varepsilon]$ is $Q^2 f^0$ and 
we have: 

$$Q^2 f^0(x_\top, X_3) = \Big( 0,\ \frac{1 - 3X_3^2}{6\lambda + 12\mu}\, f_3^0(x_\top) \Big)$$

As a consequence of formula (10), combined with the 
structure of each term, we find 


Lemma 1. (Dauge and Schwab, 2002) With the definition 
(14) for the semipolynomial space $E^q(\Omega)$, for any $k \ge -2$ 
the regular term $v^k$ belongs to $E^{k+2}(\Omega)$. 


2.2.4 Deformation patterns 


From formula (8) extended by (10) we obtain explicit 
expressions for the regular parts $v^k$ provided we know the 
deformation patterns $\zeta^k$. The latter solve boundary value 
problems on the midsurface $S$. Our multiscale expansion 
approach gives back the well-known equations of plates 
(the Kirchhoff–Love model and the plane stress model) 
completed by a whole series of boundary value problems. 


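(i) The first bending generator $\zeta_3^{-2}$ solves the 
Kirchhoff–Love model 

$$L_b\, \zeta_3^{-2}(x_\top) = f_3^0(x_\top), \quad x_\top \in S \quad\text{with}\quad \zeta_3^{-2}\big|_{\partial S} = 0, \quad \partial_n \zeta_3^{-2}\big|_{\partial S} = 0 \tag{15}$$

where $L_b$ is the fourth-order operator 

$$L_b := \frac{4\mu}{3}\, \frac{\lambda + \mu}{\lambda + 2\mu}\, \Delta_\top^2 = \frac13\, (\tilde\lambda + 2\mu)\, \Delta_\top^2 \tag{16}$$

and $n$ the unit interior normal to $\partial S$. Here $\tilde\lambda$ is the 
'averaged' Lamé constant 

$$\tilde\lambda = \frac{2\lambda\mu}{\lambda + 2\mu} \tag{17}$$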


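(ii) The second bending generator $\zeta_3^{-1}$ is the solution of 
a similar problem 

$$L_b\, \zeta_3^{-1}(x_\top) = 0, \quad x_\top \in S \quad\text{with}\quad \zeta_3^{-1}\big|_{\partial S} = 0, \quad \partial_n \zeta_3^{-1}\big|_{\partial S} = c^b_{\lambda,\mu}\, \Delta_\top \zeta_3^{-2}\big|_{\partial S} \tag{18}$$

where $c^b_{\lambda,\mu}$ is a positive constant depending on the 
Lamé coefficients. 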

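(iii) The membrane part $\zeta_\top^0$ of the third deformation 
pattern solves the plane stress model 

$$L_m\, \zeta_\top^0(x_\top) = f_\top^0(x_\top), \quad x_\top \in S \quad\text{and}\quad \zeta_\top^0\big|_{\partial S} = 0 \tag{19}$$

where $L_m$ is the second-order $2 \times 2$ system 

$$
\begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix} \longmapsto
- \begin{pmatrix}
(\tilde\lambda + 2\mu)\, \partial_{11} + \mu\, \partial_{22} & (\tilde\lambda + \mu)\, \partial_{12} \\
(\tilde\lambda + \mu)\, \partial_{12} & \mu\, \partial_{11} + (\tilde\lambda + 2\mu)\, \partial_{22}
\end{pmatrix}
\begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}
\tag{20}
$$
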
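(iv) Here, again, the whole series of equations over the 
series of deformation patterns $\sum_{k \ge -2} \varepsilon^k \zeta^k$ can be 
written in a global way using the formal series product, 
as reduced equations on the midsurface: 

$$L[\varepsilon]\, \zeta[\varepsilon] = R[\varepsilon]\, f[\varepsilon] \ \text{ in } S \quad\text{with}\quad d[\varepsilon]\, \zeta[\varepsilon] = 0 \ \text{ on } \partial S \tag{21}$$

Here, $L[\varepsilon] = L^0 + \varepsilon^2 L^2 + \varepsilon^4 L^4 + \cdots$, with 

$$
L^0 \zeta = \begin{pmatrix} L_m & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \zeta_\top \\ \zeta_3 \end{pmatrix},
\qquad
L^2 \zeta = \begin{pmatrix} L_m^2 & 0 \\ 0 & L_b \end{pmatrix} \begin{pmatrix} \zeta_\top \\ \zeta_3 \end{pmatrix},
\quad \ldots
\tag{22}
$$

where $L_m^2 \zeta_\top$ has the form $c\, \nabla_\top \Delta_\top \mathrm{div}\, \zeta_\top$. The series 
of operators $R[\varepsilon]$ starts at $k = 0$ and acts from 
$C^\infty(\overline{\Omega})^3$ into $C^\infty(\overline{S})^3$. Its first coefficient is the 
mean-value operator 

$$f \longmapsto R^0 f \quad\text{with}\quad R^0 f(x_\top) = \frac12 \int_{-1}^{1} f(x_\top, X_3)\, \mathrm{d}X_3 \tag{23}$$

Finally, the coefficients of the operator series $d[\varepsilon]$ are trace 
operators acting on $\zeta$. The first terms are 

$$
d^0 \zeta = \begin{pmatrix} \zeta_\top \cdot n \\ \zeta_\top \times n \\ \zeta_3 \\ 0 \end{pmatrix},
\qquad
d^1 \zeta = \begin{pmatrix} -c^m_{\lambda,\mu}\, \mathrm{div}\, \zeta_\top \\ 0 \\ 0 \\ \partial_n \zeta_3 \end{pmatrix},
\qquad
d^2 \zeta = \begin{pmatrix} \bullet \\ \bullet \\ 0 \\ -c^b_{\lambda,\mu}\, \Delta_\top \zeta_3 \end{pmatrix}
\tag{24}
$$

where $c^b_{\lambda,\mu}$ is the constant in (18), $c^m_{\lambda,\mu}$ is another positive 
constant and $\bullet$ indicates the presence of higher order 
operators on $\zeta_\top$. 

Note that the first three equations in (21): $L^0 \zeta^{-2} = 0$, 
$L^0 \zeta^{-1} = 0$, $L^0 \zeta^{0} + L^2 \zeta^{-2} = R^0 f^0$ on $S$ and $d^0 \zeta^{-2} = 0$, 
$d^0 \zeta^{-1} + d^1 \zeta^{-2} = 0$, $d^0 \zeta^{0} + d^1 \zeta^{-1} + d^2 \zeta^{-2} = 0$ on $\partial S$, 
give back (15), (18), and (19) together with the fact that 
$\zeta_\top^{-2} = \zeta_\top^{-1} = 0$. 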
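
The two forms of the coefficient of $\Delta_\top^2$ in the Kirchhoff operator can be cross-checked in exact arithmetic. The sketch below assumes the 'averaged' Lamé constant is $\tilde\lambda = 2\lambda\mu/(\lambda + 2\mu)$ and uses the standard conversions $E = \mu(3\lambda + 2\mu)/(\lambda + \mu)$, $\nu = \lambda/(2(\lambda + \mu))$, which are not spelled out at this point of the text; the function names and sample values are illustrative.

```python
from fractions import Fraction

def averaged_lame(lam, mu):
    """Averaged Lamé constant, assumed to be 2*lam*mu / (lam + 2*mu)."""
    lam, mu = Fraction(lam), Fraction(mu)
    return 2 * lam * mu / (lam + 2 * mu)

def bending_coefficient(lam, mu):
    """Coefficient of the bilaplacian written as (lam_tilde + 2*mu) / 3."""
    return (averaged_lame(lam, mu) + 2 * Fraction(mu)) / 3

def bending_coefficient_direct(lam, mu):
    """Same coefficient written as (4*mu/3) * (lam + mu) / (lam + 2*mu)."""
    lam, mu = Fraction(lam), Fraction(mu)
    return Fraction(4, 3) * mu * (lam + mu) / (lam + 2 * mu)

def plane_stress_modulus(lam, mu):
    """E / (1 - nu^2), using the standard Lamé-to-(E, nu) conversions."""
    lam, mu = Fraction(lam), Fraction(mu)
    young = mu * (3 * lam + 2 * mu) / (lam + mu)
    nu = lam / (2 * (lam + mu))
    return young / (1 - nu ** 2)

lam, mu = 3, 2  # arbitrary sample Lamé constants
print(bending_coefficient(lam, mu), bending_coefficient_direct(lam, mu))
print(averaged_lame(lam, mu) + 2 * mu, plane_stress_modulus(lam, mu))
```

The second line of output illustrates that $\tilde\lambda + 2\mu$ coincides with the classical plate modulus $E/(1 - \nu^2)$, which is how the limit operator connects with the usual engineering bending stiffness.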


2.2.5 Boundary layer terms 


The terms $w^k$ have a quite different structure. Their natural 
variables are $(R, s, X_3)$, see Section 2.1 and Fig. 3, 
and they are easier to describe in boundary fitted components 
$(w_r, w_s, w_3)$ corresponding to the local coordinates 
$(r, s, x_3)$. The first boundary layer term, $w^0$, is a bending 
displacement in the sense of (5) and has a tensor product 
form: In boundary fitted components it reads 

$$w_s^0 = 0 \quad\text{and}\quad (w_r^0, w_3^0)(R, s, X_3) = \varphi(s)\, \overline{w}^0(R, X_3) \quad\text{with}\quad \varphi = \Delta_\top \zeta_3^{-2}\big|_{\partial S}$$

and $\overline{w}^0$ is a two component exponentially decreasing profile 
on the semi-strip $\Sigma_+ := \{(R, X_3),\ R > 0,\ |X_3| < 1\}$: There 
exists $\eta > 0$ such that 

$$|e^{\eta R}\, \overline{w}^0(R, X_3)| \text{ is bounded as } R \to \infty$$

The least upper bound of such $\eta$ is the smallest exponent 
$\eta_0$ arising from the Papkovich–Fadle eigenfunctions; see 
Gregory and Wan (1984). Both components of $\overline{w}^0$ are 
nonzero. 

Figure 3. Boundary layer coordinates in $\partial S \times \Sigma_+$. 

The next boundary layer terms $w^k$ are combinations of 
products of (smooth) traces on $\partial S$ by profiles $w^{k,\ell}$ in 
$(R, X_3)$. These profiles have singularities at the corners 
$(0, \pm 1)$ of $\Sigma_+$, according to the general theory of 
Kondrat'ev (1967). Thus, in contrast with the 'regular' terms 
$v^k$, which are smooth up to the boundary of $\Omega$, the terms 
$w^k$ do have singular parts along the edges $\partial S \times \{\pm 1\}$ of 
the plate. Finally, the edge singularities of the solution $u^\varepsilon$ 
of problem (3) are related with the boundary layer terms 
only; see Dauge and Gruais (1998a) for further details. 


2.3 Properties of the displacement expansion 
outside the boundary layer 


Let $S'$ be a subset of $S$ such that the distance between $\partial S'$ 
and $\partial S$ is positive. As a consequence of expansion (6) there 
holds 

$$u^\varepsilon(x) = \sum_{k=-2}^{K} \varepsilon^k v^k(x_\top, X_3) + O(\varepsilon^{K+1}) \quad\text{uniformly for } x \in S' \times (-\varepsilon, \varepsilon)$$

Coming back to physical variables $(x_\top, x_3)$, the expansion 
terms $v^k$ being polynomials of degree $k + 2$ in $X_3$ 
(Lemma 1), we find that 

$$u^\varepsilon(x) = \sum_{k=-2}^{K} \varepsilon^k\, \hat v^{K,k}(x_\top, x_3) + O(\varepsilon^{K+1}) \quad\text{uniformly for } x \in S' \times (-\varepsilon, \varepsilon)$$

with fields $\hat v^{K,k}$ being polynomials in $x_3$ of degree $K - k$. 
This means that the expansion (6) can also be seen as a 
Taylor expansion at the midsurface, provided we are at a 
fixed positive distance from the lateral boundary. 

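Let us write the first terms in the expansions of the 
bending and membrane parts $u_b^\varepsilon$ and $u_m^\varepsilon$ of $u^\varepsilon$: 

$$
\begin{aligned}
u_b^\varepsilon = {} & \varepsilon^{-2} \Big( -x_3 \nabla_\top \zeta_3^{-2},\ \zeta_3^{-2} + \frac{\lambda x_3^2}{2\lambda + 4\mu}\, \Delta_\top \zeta_3^{-2} \Big)
- \Big( 0,\ \frac{\lambda}{6\lambda + 12\mu}\, \Delta_\top \zeta_3^{-2} \Big) \\
& + \varepsilon^{-1} \Big( -x_3 \nabla_\top \zeta_3^{-1},\ \zeta_3^{-1} + \frac{\lambda x_3^2}{2\lambda + 4\mu}\, \Delta_\top \zeta_3^{-1} \Big) + \cdots
\end{aligned}
\tag{25}
$$

From this formula, we can deduce the following asymptotics 
for the strain and stress components 

$$
\begin{aligned}
e_{\alpha\beta}(u_b^\varepsilon) &= -\varepsilon^{-2} x_3\, \partial_{\alpha\beta} (\zeta_3^{-2} + \varepsilon \zeta_3^{-1}) + O(\varepsilon) \\
e_{33}(u_b^\varepsilon) &= \varepsilon^{-2}\, \frac{\lambda x_3}{\lambda + 2\mu}\, \Delta_\top (\zeta_3^{-2} + \varepsilon \zeta_3^{-1}) + O(\varepsilon) \\
\sigma^{33}(u_b^\varepsilon) &= O(\varepsilon)
\end{aligned}
\tag{26}
$$

Since $\varepsilon^{-2} x_3 = O(\varepsilon^{-1})$, we see that $e_{33} = O(\varepsilon^{-1})$. Thus, $\sigma^{33}$ 
is two orders of magnitude less than $e_{33}$, which means a 
plane stress limit. To compute the shear strain (or stress), 
we use one further term in the asymptotics of $u_b^\varepsilon$ and obtain 
that it is one order of magnitude less than $e_{33}$: 

$$e_{\alpha 3}(u_b^\varepsilon) = \frac{2\lambda + 2\mu}{2\lambda + 4\mu}\, (\varepsilon^{-2} x_3^2 - 1)\, \partial_\alpha \Delta_\top \zeta_3^{-2} + O(\varepsilon) \tag{27}$$

Computations for the membrane part $u_m^\varepsilon$ are simpler and 
yield similar results 

$$
\begin{aligned}
u_m^\varepsilon &= \Big( \zeta_\top^0,\ -\frac{\lambda x_3}{\lambda + 2\mu}\, \mathrm{div}\, \zeta_\top^0 \Big) + \varepsilon \Big( \zeta_\top^1,\ -\frac{\lambda x_3}{\lambda + 2\mu}\, \mathrm{div}\, \zeta_\top^1 \Big) + \cdots \\
e_{\alpha\beta}(u_m^\varepsilon) &= \frac12 (\partial_\alpha \zeta_\beta^0 + \partial_\beta \zeta_\alpha^0) + \frac{\varepsilon}{2} (\partial_\alpha \zeta_\beta^1 + \partial_\beta \zeta_\alpha^1) + O(\varepsilon^2) \\
e_{33}(u_m^\varepsilon) &= -\frac{\lambda}{\lambda + 2\mu}\, \mathrm{div}\, (\zeta_\top^0 + \varepsilon \zeta_\top^1) + O(\varepsilon^2)
\end{aligned}
\tag{28}
$$

and $\sigma^{33}(u_m^\varepsilon) = O(\varepsilon^2)$, $e_{\alpha 3}(u_m^\varepsilon) = O(\varepsilon)$. 

In (26)–(28) the $O(\varepsilon)$ and $O(\varepsilon^2)$ are uniform on any 
region $S' \times (-\varepsilon, \varepsilon)$ where the boundary layer terms have 
no influence. We postpone global energy estimates to the 
next section. 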
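
The leading shear strain of the bending part can be recomputed from the polynomials of (9). The sketch below assumes $e_{\alpha 3} = \frac12(\partial_3 u_\alpha + \partial_\alpha u_3)$ and that, at leading order, the relevant contributions are $\varepsilon P_b^3(X_3)\, \nabla_\top \Delta_\top \zeta_3^{-2}$ in the tangential components and $P_b^2(X_3)\, \Delta_\top \zeta_3^{-2}$ in the transverse one (the Kirchhoff parts cancel); the function name and sample constants are illustrative.

```python
from fractions import Fraction

def shear_profile(lam, mu):
    """Profile in X3, as {power: coefficient}, multiplying
    d_alpha Laplace zeta_3^{-2} in the leading term of e_{alpha 3}(u_b)."""
    lam, mu = Fraction(lam), Fraction(mu)
    d = 6 * lam + 12 * mu
    # d/dX3 of P_b^3 = ((3 lam + 4 mu) X^3 - (11 lam + 12 mu) X) / d
    dp3 = {2: 3 * (3 * lam + 4 * mu) / d, 0: -(11 * lam + 12 * mu) / d}
    # P_b^2 = lam / (2 lam + 4 mu) * (X^2 - 1/3)
    p2 = {2: lam / (2 * lam + 4 * mu), 0: -lam / (3 * (2 * lam + 4 * mu))}
    # Symmetric gradient: half of (d_3 u_alpha + d_alpha u_3).
    return {power: (dp3[power] + p2[power]) / 2 for power in (0, 2)}

lam, mu = 3, 2  # arbitrary sample Lamé constants
profile = shear_profile(lam, mu)
factor = Fraction(lam + mu, lam + 2 * mu)
print(profile, factor)
```

Under these assumptions the profile is $\frac{\lambda + \mu}{\lambda + 2\mu}(X_3^2 - 1)$: it vanishes at $X_3 = \pm 1$, as expected for a plate whose faces are traction free.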


2.4 Eigen-mode problem 


For each $\varepsilon > 0$, the spectrum of problem (4) is discrete and 
positive. Let $\Lambda_j^\varepsilon$, $j = 1, 2, \ldots$ be the increasing sequence 
of eigenvalues. In Ciarlet and Kesavan (1981) it is proved 
that $\varepsilon^{-2} \Lambda_j^\varepsilon$ converges to the $j$th eigenvalue $\Lambda_j^{KL}$ of the 
Dirichlet problem for the Kirchhoff operator $L_b$, cf. (16). In 
Nazarov and Zorin (1989) and Nazarov (1991c), a two-term 
asymptotics is constructed for the $\varepsilon^{-2} \Lambda_j^\varepsilon$. Nazarov (2000b) 
proves that $|\varepsilon^{-2} \Lambda_j^\varepsilon - \Lambda_j^{KL}|$ is bounded by an $O(\sqrt{\varepsilon})$ for a 
much more general material matrix $A$. 

In Dauge et al. (1999), full asymptotic expansions for 
eigenvalues and eigenvectors are proved: For each $j$ there 
exist 

• bending generators $\zeta_3^{-2}, \zeta_3^{-1}, \ldots$ where $\zeta_3^{-2}$ is an 
eigenvector of $L_b$ associated with $\Lambda_j^{KL}$, 

• real numbers $\Lambda_{b,j}^1, \Lambda_{b,j}^2, \ldots$, 

• eigenvectors $u_{b,j}^\varepsilon$ associated with $\Lambda_j^\varepsilon$ for any 
$\varepsilon \in (0, \varepsilon_0)$ 

so that for any $K \ge 0$ 

$$
\begin{aligned}
\Lambda_j^\varepsilon &= \varepsilon^2 \Lambda_j^{KL} + \varepsilon^3 \Lambda_{b,j}^1 + \cdots + \varepsilon^{K+2} \Lambda_{b,j}^K + O(\varepsilon^{K+3}) \\
u_{b,j}^\varepsilon &= \varepsilon^{-2} (-x_3 \nabla_\top \zeta_3^{-2},\ \zeta_3^{-2}) + \varepsilon^{-1} (-x_3 \nabla_\top \zeta_3^{-1},\ \zeta_3^{-1}) + \cdots + \varepsilon^K (v^K + \chi w^K) + O(\varepsilon^{K+1})
\end{aligned}
\tag{29}
$$

where the terms $v^k$ and $w^k$ are generated by the $\zeta_3^k$, $k \ge 0$, 
in a similar way as in Section 2.2, and $O(\varepsilon^{K+1})$ is uniform 
over $\Omega^\varepsilon$. 

The bending and membrane displacements are the 
eigenvectors of the symmetry operator $\mathfrak{S}$; see (5). Since $\mathfrak{S}$ 
commutes with the elasticity operator, both have a joint 
spectrum, which means that there exists a basis of common 
eigenvectors. In other words, each elasticity eigenvalue can 
be identified as a bending or a membrane eigenvalue. The 
expansion (29) is the expansion of bending eigen-pairs. 

The expansion of membrane eigen-pairs can be done in 
a similar way. Let us denote by $\Lambda_{m,j}^\varepsilon$ the $j$th membrane 
eigenvalue on $\Omega^\varepsilon$ and by $\Lambda_j^{M}$ the $j$th eigenvalue of the 
plane stress operator $L_m$, cf. (20), with Dirichlet boundary 
conditions. Then we have a similar statement as above, with 
the distinctive feature that the membrane eigenvalues tend 
to those of the plane stress model: 

$$\Lambda_{m,j}^\varepsilon = \Lambda_j^{M} + \varepsilon \Lambda_{m,j}^1 + \cdots + \varepsilon^K \Lambda_{m,j}^K + O(\varepsilon^{K+1}) \tag{30}$$

This fact, compared with (29), explains why the smallest 
eigenvalues are bending. Note that the eigenvalue formal 
series $\Lambda[\varepsilon]$ satisfy reduced equations $L[\varepsilon]\, \zeta[\varepsilon] = \Lambda[\varepsilon]\, \zeta[\varepsilon]$ 
like (21) with the same $L^0$, $L^1 = 0$ and $L^2$ as in (22): In 
particular, equations 

$$
\begin{pmatrix} L_m & 0 \\ 0 & \varepsilon^2 L_b \end{pmatrix} \begin{pmatrix} \zeta_\top \\ \zeta_3 \end{pmatrix}
= \Lambda \begin{pmatrix} \zeta_\top \\ \zeta_3 \end{pmatrix}
\tag{31}
$$

give back the 'limiting' eigenvalues $\Lambda^{M}$ and $\varepsilon^2 \Lambda^{KL}$. Our 
last remark is that the second terms $\Lambda_{b,j}^1$ and $\Lambda_{m,j}^1$ are 
positive; see Dauge and Yosibash (2002) for a discussion 
of that fact. 

2.5 Extensions 


2.5.1 Traction on the free parts of the boundary 


Instead of a volume load, or in addition to it, tractions $g^\pm$ 
can be imposed on the faces $S \times \{\pm\varepsilon\}$ of the plate. Let us 
assume that $g^\pm$ is independent of $\varepsilon$. Then the displacement 
$u^\varepsilon$ has a similar expansion as in (6), with the following 
modifications: 


• If the bending part of $g^\pm$ is nonzero, then the regular 
part starts with $\varepsilon^{-3} v^{-3}$ and the boundary layer part 


• If the membrane part of $g^\pm$ is nonzero, the membrane 
regular part starts with $\varepsilon^{-1} v^{-1}$. 


2.5.2 Lateral boundary conditions 


A similar analysis holds for each of the seven remaining 
types of 'canonical' boundary conditions: soft clamping, 
hard simple support, soft simple support, two types of 
friction, sliding edge, and free boundary. See Dauge, Gruais 
and Rössle (1999/00) for details. It would also be possible 
to extend such an analysis to more intimately mixed boundary 
conditions where only moments through the thickness 
along the lateral boundary are imposed for displacement or 
traction components; see Schwab (1996). 

If, instead of volume load $f$ or tractions $g^\pm$, we set $f \equiv 0$, 
$g^\pm \equiv 0$, and impose nonzero lateral boundary conditions, $u^\varepsilon$ 
will have a similar expansion as in (6) with the remarkable 
feature that the degree of the regular part in the thickness 
variable is $\le 3$; see Dauge and Schwab (2002), Rem. 5.4. 
Moreover, in the clamped situation, the expansion starts 
with $O(1)$. 


2.5.3 Laminated composites 


If the material of the plate is homogeneous, but not 
isotropic, $u^\varepsilon$ will still have a similar expansion; see Dauge 
and Gruais (1996) and Dauge and Yosibash (2002) for 
orthotropic plates. If the plate is laminated, that is, formed 
by the union of several plies made of different homogeneous 
materials, then $u^\varepsilon$ still expands in regular parts $v^k$ and 
boundary layer parts $w^k$, but the $v^k$ are no longer polynomials 
in the thickness variable, only piecewise polynomial in each 
ply, and continuous; see Actis, Szabo and Schwab (1999). 
Nazarov (2000a, 2000b) addresses more general material 
laws where the matrix $A$ depends on the variables $x_\top$ and 
$X_3 = x_3/\varepsilon$. 


3 HIERARCHICAL MODELS FOR 
PLATES 


3.1 The concepts of hierarchical models 


The idea of hierarchical models is a natural and efficient 
extension to that of limiting models and dimension reduction. 
In the finite element framework, it was first 
formulated in Szabó and Sahrmann (1988) for isotropic 
domains, mathematically investigated in Babuška and Li 
(1991, 1992a, 1992b), and generalized to laminated composites 
in Babuška, Szabó and Actis (1992) and Actis, 
Szabo and Schwab (1999). A hierarchy of models consists 
of 


• a sequence of subspaces $V^q(\Omega^\varepsilon)$ of $V(\Omega^\varepsilon)$ with the 
orders $q = (q_1, q_2, q_3)$ forming a sequence of integer 
triples, satisfying 

$$V^q(\Omega^\varepsilon) \subset V^{q'}(\Omega^\varepsilon) \quad\text{if}\quad q \preceq q' \tag{32}$$

• a sequence of related Hooke laws $\sigma = A^q e$, corresponding 
to a sequence of elastic bilinear forms 
$a^{\varepsilon,q}(u, u') = \int_{\Omega^\varepsilon} A^q e(u) : e(u')$. 


Let $u^{\varepsilon,q}$ be the solution of the problem 

$$\text{Find } u^{\varepsilon,q} \in V^q(\Omega^\varepsilon) \text{ such that}\quad a^{\varepsilon,q}(u^{\varepsilon,q}, u') = \int_{\Omega^\varepsilon} f \cdot u'\, \mathrm{d}x, \quad \forall u' \in V^q(\Omega^\varepsilon) \tag{33}$$

Note that problem (33) is a Galerkin projection of problem 
(3) if $a^{\varepsilon,q} = a^\varepsilon$. 

Any model that belongs to the hierarchical family has to 
satisfy three requirements; see Szabó and Babuška (1991), 
Chap. 14.5: 


(a) Approximability. At any fixed thickness $\varepsilon > 0$: 

$$\lim_{q \to \infty} \|u^\varepsilon - u^{\varepsilon,q}\|_{E(\Omega^\varepsilon)} = 0 \tag{34}$$

(b) Asymptotic consistency. For any fixed degree $q$: 

$$\lim_{\varepsilon \to 0} \frac{\|u^\varepsilon - u^{\varepsilon,q}\|_{E(\Omega^\varepsilon)}}{\|u^\varepsilon\|_{E(\Omega^\varepsilon)}} = 0 \tag{35}$$

(c) Optimality of the convergence rate. There exists a 
sequence of positive exponents $\gamma(q)$ with the growth 
property $\gamma(q) < \gamma(q')$ if $q \prec q'$, such that 'in the 
absence of boundary layers and edge singularities': 

$$\|u^\varepsilon - u^{\varepsilon,q}\|_{E(\Omega^\varepsilon)} \le C \varepsilon^{\gamma(q)}\, \|u^\varepsilon\|_{E(\Omega^\varepsilon)} \tag{36}$$


The substantiation of hierarchical models for plates, in 
general, requires the choice of three sequences of finite 
dimensional nested director spaces 
$\Psi_j^0 \subset \cdots \subset \Psi_j^N \subset \cdots \subset H^1(-1, 1)$ 
for $j = 1, 2, 3$ and the definition of the space 
$V^q(\Omega^\varepsilon)$ for $q = (q_1, q_2, q_3)$ as 

$$V^q(\Omega^\varepsilon) = \big\{ u \in V(\Omega^\varepsilon),\ \big( (x_\top, X_3) \mapsto u_j(x_\top, \varepsilon X_3) \big) \in H_0^1(S) \otimes \Psi_j^{q_j},\ j = 1, 2, 3 \big\} \tag{37}$$

We can reformulate (37) with the help of director functions: 
With $d_j(N)$ being the dimension of $\Psi_j^N$, let $\Phi_j^n = \Phi_j^n(X_3)$, 
$0 \le n \le d_j(N)$, be hierarchic bases (the director functions) 
of $\Psi_j^N$. There holds 

$$V^q(\Omega^\varepsilon) = \Big\{ u \in V(\Omega^\varepsilon),\ \exists z_j^n \in H_0^1(S),\ 0 \le n \le d_j(q_j),\ u_j(x_\top, x_3) = \sum_{n=0}^{d_j(q_j)} z_j^n(x_\top)\, \Phi_j^n\Big(\frac{x_3}{\varepsilon}\Big) \Big\} \tag{38}$$

The choice of the best director functions is addressed in 
Vogelius and Babuška (1981c) in the case of second-order 
scalar problems with general coefficients (including possible 
stratifications). For smooth coefficients, the space $\Psi_j^N$ 
coincides with the space $P_N$ of polynomials of degree $\le N$. 
The director functions can be chosen as the Legendre 
polynomials $L_n(X_3)$ or, simply, the monomials $X_3^n$ (and then 
$x_3^n$ can be used equivalently instead of $(x_3/\varepsilon)^n$ in (38)). 

In the sequel, we describe in more detail the convenient 
hierarchies for plates and discuss the three qualities 
(34)–(36); see Babuška and Li (1991, 1992a) and Babuška, 
Szabó and Actis (1992) for early references. 
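
For polynomial director spaces, a Legendre basis is hierarchic: adding a degree adds one function and leaves the previous ones untouched. The sketch below builds the Legendre polynomials with Bonnet's recurrence in exact arithmetic and evaluates their inner products on $(-1, 1)$; the function names are illustrative, not taken from the references.

```python
from fractions import Fraction

def legendre(n_max):
    """Coefficient lists (lowest degree first) of L_0, ..., L_{n_max}."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        # Bonnet recurrence: (n + 1) L_{n+1} = (2n + 1) X L_n - n L_{n-1}
        shifted = [Fraction(0)] + polys[n]            # X * L_n
        previous = polys[n - 1] + [Fraction(0)] * (len(shifted) - len(polys[n - 1]))
        polys.append([((2 * n + 1) * s - n * p) / (n + 1)
                      for s, p in zip(shifted, previous)])
    return polys[: n_max + 1]

def inner(p, q):
    """Integral over (-1, 1) of the product of two coefficient lists."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total

basis = legendre(3)
print(basis[2], inner(basis[1], basis[3]), inner(basis[2], basis[2]))
```

Monomials span the same spaces $P_N$, so the choice between the two bases does not change $V^q(\Omega^\varepsilon)$; orthogonality mainly affects the conditioning of the discrete problem.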


3.2 The limit model (Kirchhoff—Love) 


In view of expansion (6), we observe that if the transverse 
component $f_3$ of the load is nonzero on the midsurface, 
$u^\varepsilon$ is unbounded as $\varepsilon \to 0$. If we multiply by $\varepsilon^2$, we 
have a convergence to $(0, \zeta_3^{-2})$, which is not kinematically 
relevant. At that level, a correct notion of limit uses scalings 
of coordinates: If we define the scaled displacement $U^\varepsilon$ by 
its components on the stretched plate $\Omega = S \times (-1, 1)$ by 

$$U_\top^\varepsilon := \varepsilon\, u_\top^\varepsilon \quad\text{and}\quad U_3^\varepsilon := \varepsilon^2 u_3^\varepsilon \tag{39}$$

then $U^\varepsilon$ converges to $(-X_3 \nabla_\top \zeta_3^{-2},\ \zeta_3^{-2})$ in $H^1(\Omega)^3$ as 
$\varepsilon \to 0$. This result, together with the mathematical derivation 
of the resultant equation (15), is due to Ciarlet and 
Destuynder (1979a). 

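The corresponding subspace of $V(\Omega^\varepsilon)$ is that of bending 
Kirchhoff displacements or, more generally, of Kirchhoff 
displacements: 

$$V^{KL}(\Omega^\varepsilon) = \big\{ u \in V(\Omega^\varepsilon),\ \exists \zeta \in H_0^1 \times H_0^1 \times H_0^2(S),\ u = (\zeta_\top - x_3 \nabla_\top \zeta_3,\ \zeta_3) \big\} \tag{40}$$

It follows from (40) that $e_{13} = e_{23} = 0$, for which the physical 
interpretation is that 'normals to $S$ prior to deformation 
remain straight lines and normals after deformation'. 
Hooke's law has to be modified with the help of what we 
call 'the plane stress trick'. It is based on the assumption 
that the component $\sigma^{33}$ of the stress is negligible (note 
that the asymptotics (6) of the three-dimensional solution 
yields that $\sigma^{33} = O(\varepsilon)$, whereas $e_{\alpha\beta}, e_{33} = O(\varepsilon^{-1})$ outside 
the boundary layer, cf. (26), which justifies the plane stress 
assumption). From standard Hooke's law (1), we extract 
the relation $\sigma^{33} = \lambda (e_{11} + e_{22}) + (\lambda + 2\mu)\, e_{33}$, then set $\sigma^{33}$ 
to zero, which yields 

$$e_{33} = -\frac{\lambda}{\lambda + 2\mu}\, (e_{11} + e_{22}) \tag{41}$$

Then, we modify Hooke's law (1) by substituting $e_{33}$ by its 
expression (41) in $\sigma^{11}$ and $\sigma^{22}$, to obtain 

$$
\begin{aligned}
\sigma^{ii} &= \frac{2\lambda\mu}{\lambda + 2\mu}\, (e_{11} + e_{22}) + 2\mu\, e_{ii}, \quad i = 1, 2 \\
\sigma^{ij} &= 2\mu\, e_{ij} \quad\text{for } i \ne j
\end{aligned}
\tag{42}
$$

Thus, $\sigma^{ii} = \tilde\lambda (e_{11} + e_{22}) + 2\mu\, e_{ii}$, with $\tilde\lambda$ given by (17). 
Taking into account that $e_{33} = 0$ for the elements of 
$V^{KL}(\Omega^\varepsilon)$, we obtain a new Hooke's law given by the same 
formulae as (1) when replacing the Lamé coefficient $\lambda$ by 
$\tilde\lambda$. This corresponds to a modified material matrix $\tilde A^{ijkl}$ 

$$\tilde A^{ijkl} = \tilde\lambda\, \delta^{ij} \delta^{kl} + \mu\, (\delta^{ik} \delta^{jl} + \delta^{il} \delta^{jk}) \tag{43}$$

and a reduced elastic energy $\tilde a(u, u) = \int_{\Omega^\varepsilon} \tilde\sigma^{ij}(u)\, e_{ij}(u)$. 
Note that for $u = (\zeta_\top - x_3 \nabla_\top \zeta_3,\ \zeta_3)$ 

$$\tilde a(u, u) = 2\varepsilon \int_S \tilde A^{\alpha\beta\sigma\delta}\, e_{\alpha\beta}(\zeta_\top)\, e_{\sigma\delta}(\zeta_\top)\, \mathrm{d}x_\top + \frac{2\varepsilon^3}{3} \int_S \tilde A^{\alpha\beta\sigma\delta}\, \partial_{\alpha\beta}(\zeta_3)\, \partial_{\sigma\delta}(\zeta_3)\, \mathrm{d}x_\top \tag{44}$$

exhibiting a membrane part in $O(\varepsilon)$ and a bending part in 
$O(\varepsilon^3)$. As a consequence of Theorem 1, the following holds. 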




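Theorem 2. Let $u^{\varepsilon,KL}$ be the solution of problem (33) with 
$V^q = V^{KL}$ and $a^q = \tilde a$. Then 


(i) In general $u^{\varepsilon,KL} = \varepsilon^{-2} (-x_3 \nabla_\top \zeta_3^{-2},\ \zeta_3^{-2}) + O(1)$ with 
$\zeta_3^{-2}$ the solution of (15); 

(ii) If $f$ is membrane, $u^{\varepsilon,KL} = (\zeta_\top^0, 0) + O(\varepsilon^2)$ with $\zeta_\top^0$ the 
solution of (19). 


Can we deduce the asymptotic consistency for that 
model? No! Computing the lower-order terms in the expression 
(35), we find with the help of (25) that, if $f_3^0 \not\equiv 0$, 

$$\|u^\varepsilon\|_{E(\Omega^\varepsilon)} \simeq O(\varepsilon^{-1/2}) \quad\text{and}\quad \|u^\varepsilon - u^{\varepsilon,KL}\|_{E(\Omega^\varepsilon)} \ge \|e_{33}(u^\varepsilon)\|_{L^2(\Omega^\varepsilon)} \simeq O(\varepsilon^{-1/2})$$

Another source of difficulty is that, eventually, relation (41) 
is not satisfied by $u^{\varepsilon,KL}$. If $f_3^0 \equiv 0$ and $f_\top^0 \not\equiv 0$, we have 
exactly the same difficulties with the membrane part. 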

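A way to overcome these difficulties is to consider 
a complementing operator $C$ defined on the elements of 
$V^{KL}$ by 

$$C u = u + \Big( 0,\ -\frac{\lambda}{\lambda + 2\mu} \int_0^{x_3} \mathrm{div}\, u_\top(\cdot, y)\, \mathrm{d}y \Big) \tag{45}$$

Then (41) is now satisfied by $Cu$ for any $u \in V^{KL}$. Moreover 
(still assuming $f_3^0 \not\equiv 0$), one can show 

$$\|u^\varepsilon - C u^{\varepsilon,KL}\|_{E(\Omega^\varepsilon)} \le C \sqrt{\varepsilon}\, \|u^\varepsilon\|_{E(\Omega^\varepsilon)} \tag{46}$$

The error factor $\sqrt{\varepsilon}$ is due to the first boundary layer term 
$w^0$. The presence of $w^0$ is a direct consequence of the fact 
that $C u^{\varepsilon,KL}$ does not satisfy the lateral boundary conditions. 

Although the Kirchhoff–Love model is not a member 
of the hierarchical family, it is the limit of all models for 
$\varepsilon \to 0$. 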
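
The plane stress trick of this section can be verified on sample strains. The sketch below assumes the isotropic Hooke law $\sigma^{ii} = \lambda\, \mathrm{tr}\, e + 2\mu\, e_{ii}$ for the normal components and the averaged constant $\tilde\lambda = 2\lambda\mu/(\lambda + 2\mu)$; the numerical values and function names are arbitrary illustrations.

```python
from fractions import Fraction

def hooke_3d_normal(lam, mu, e11, e22, e33):
    """Normal stresses (s11, s22, s33) from the 3-D isotropic Hooke law."""
    trace = e11 + e22 + e33
    return tuple(lam * trace + 2 * mu * e for e in (e11, e22, e33))

def plane_stress(lam, mu, e11, e22):
    """Impose s33 = 0 by choosing e33 = -lam / (lam + 2 mu) * (e11 + e22)."""
    e33 = -lam / (lam + 2 * mu) * (e11 + e22)
    s11, s22, s33 = hooke_3d_normal(lam, mu, e11, e22, e33)
    return e33, s11, s22, s33

lam, mu = Fraction(3), Fraction(2)            # sample Lamé constants
e11, e22 = Fraction(1, 5), Fraction(-1, 3)    # sample in-plane strains
e33, s11, s22, s33 = plane_stress(lam, mu, e11, e22)

lam_tilde = 2 * lam * mu / (lam + 2 * mu)     # averaged Lamé constant
reduced_s11 = lam_tilde * (e11 + e22) + 2 * mu * e11
reduced_s22 = lam_tilde * (e11 + e22) + 2 * mu * e22
print(s33, s11 == reduced_s11, s22 == reduced_s22)
```

The transverse stress vanishes exactly, and the in-plane stresses agree with the reduced law obtained by replacing $\lambda$ with $\tilde\lambda$.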


3.3 The Reissner—Mindlin model 


This model is obtained by enriching the space of kinematically 
admissible displacements, allowing normals to $S$ to 
rotate after deformation. Instead of (40), we set 

$$V^{RM}(\Omega^\varepsilon) = \big\{ u \in V(\Omega^\varepsilon),\ \exists z \in H_0^1(S)^3,\ \exists \theta_\top \in H_0^1(S)^2,\ u = (z_\top - x_3 \theta_\top,\ z_3) \big\}$$

With the elasticity tensor $A$ corresponding to 3-D elasticity, 
the displacements and strain-energy limit of the RM model 
as $\varepsilon \to 0$ would not coincide with the 3-D limit (or the 
Kirchhoff–Love limit). 


Again, we have to use instead the reduced elastic bilinear 
form $\tilde a$ to restore the convergence to the correct limit, by 
virtue of the same plane stress trick. The corresponding 
elasticity tensor is $\tilde A$ (43). A further correction can be 
introduced in the shear components of $\tilde A$ to better represent 
the fully 3-D shear stresses $\sigma^{13}$ and $\sigma^{23}$ (and also the strain 
energy) for small yet nonzero thickness $\varepsilon$. The material 
matrix entries $\tilde A^{1313}$, $\tilde A^{2323}$ are changed by introducing the 
so-called shear correction factor $\kappa$: 

$$\tilde A^{1313} = \kappa A^{1313}, \qquad \tilde A^{2323} = \kappa A^{2323}$$

By properly chosen $\kappa$, either the energy of the RM solution, 
or the deflection $u_3$ can be optimized with respect to the 
fully 3-D plate. The smaller the $\varepsilon$, the smaller the influence 
of $\kappa$ on the results. For the isotropic case, two possible 
$\kappa$'s are (see details in Babuška, d'Harcourt and Schwab 
(1991a)): 

$$\kappa_{\mathrm{Energy}} = \frac{5}{6(1 - \nu)} \quad\text{or}\quad \kappa_{\mathrm{Deflection}} = \frac{20}{3(8 - 3\nu)}, \quad\text{with } \nu = \frac{\lambda}{2(\lambda + \mu)} \text{ (Poisson ratio)}$$

A value of $\kappa = 5/6$ is frequently used in engineering 
practice, but for modal analysis, no optimal value of $\kappa$ is 
available. 
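
The two shear correction factors can be tabulated for a given material. The sketch below reads the formulas quoted from Babuška, d'Harcourt and Schwab (1991a) as $\kappa_{\mathrm{Energy}} = 5/(6(1 - \nu))$ and $\kappa_{\mathrm{Deflection}} = 20/(3(8 - 3\nu))$, which is an assumption about their exact form; both reduce to the engineering value $5/6$ at $\nu = 0$. Function names are illustrative.

```python
from fractions import Fraction

def poisson_ratio(lam, mu):
    """Poisson ratio nu = lam / (2 (lam + mu)) from the Lamé constants."""
    lam, mu = Fraction(lam), Fraction(mu)
    return lam / (2 * (lam + mu))

def kappa_energy(nu):
    """Energy-optimal factor, assumed form 5 / (6 (1 - nu))."""
    return Fraction(5) / (6 * (1 - Fraction(nu)))

def kappa_deflection(nu):
    """Deflection-optimal factor, assumed form 20 / (3 (8 - 3 nu))."""
    return Fraction(20) / (3 * (8 - 3 * Fraction(nu)))

nu = poisson_ratio(3, 2)  # sample Lamé constants, giving nu = 3/10
print(nu, kappa_energy(nu), kappa_deflection(nu))
print(kappa_energy(0), kappa_deflection(0))
```

Keeping $\kappa$ as an explicit parameter of the model makes it easy to compare these choices with the customary $5/6$.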

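Note that, by integrating equations of (33) through the 
thickness, we find that problem (33) is equivalent to a 
variational problem for $z$ and $\theta$ only. For the elastic energy, 
we have 

$$
\begin{aligned}
\tilde a(u, u) = {} & 2\varepsilon \int_S \tilde A^{\alpha\beta\sigma\delta}\, e_{\alpha\beta}(z_\top)\, e_{\sigma\delta}(z_\top)\, \mathrm{d}x_\top && \text{(membrane energy)} \\
& + \varepsilon \int_S \kappa\mu\, (\partial_\alpha z_3 - \theta_\alpha)(\partial_\alpha z_3 - \theta_\alpha)\, \mathrm{d}x_\top && \text{(shear energy)} \\
& + \frac{2\varepsilon^3}{3} \int_S \tilde A^{\alpha\beta\sigma\delta}\, e_{\alpha\beta}(\theta_\top)\, e_{\sigma\delta}(\theta_\top)\, \mathrm{d}x_\top && \text{(bending energy)}
\end{aligned}
\tag{47}
$$

Let $u^{\varepsilon,RM}$ be the solution of problem (33) with $V^q = V^{RM}$ 
and $a^q = \tilde a$. The singular perturbation character appears 
clearly. In contrast with the Kirchhoff–Love model, the 
solution admits a boundary layer part. Arnold and Falk 
(1990b, 1996) have described the two-scale asymptotics of 
$u^{\varepsilon,RM}$. Despite the presence of boundary layer terms, the 
question of knowing if $u^{\varepsilon,RM}$ is closer to $u^\varepsilon$ than $u^{\varepsilon,KL}$ has 
no clear answer to our knowledge. A careful investigation 
of the first eigenvalues $\Lambda_1^\varepsilon$, $\Lambda_1^{\varepsilon,KL}$, and $\Lambda_1^{\varepsilon,RM}$ of these three 
models in the case of lateral Dirichlet conditions shows 
the following behavior for $\varepsilon$ small enough (Dauge and 
Yosibash, 2002): 

$$\Lambda_1^{\varepsilon,RM} < \Lambda_1^{\varepsilon,KL} < \Lambda_1^{\varepsilon}$$

which tends to prove that the RM model is not generically 
better than KL for (very) thin plates. Nevertheless, an 
estimate by the same asymptotic bound as in (46) is valid 
for $u^\varepsilon - C u^{\varepsilon,RM}$. 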


3.4 Higher order models 


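The RM model is a $(1, 1, 0)$ model with reduced elastic 
energy. For any $q = (q_\top, q_\top, q_3)$ we define the space $V^q$ 
by (compare with (38) for monomial director functions) 

$$
\begin{aligned}
V^q(\Omega^\varepsilon) = \Big\{ & u \in V(\Omega^\varepsilon),\ \exists z_\top^n \in H_0^1(S)^2,\ 0 \le n \le q_\top,\ \exists z_3^n \in H_0^1(S),\ 0 \le n \le q_3, \\
& u_\top = \sum_{n=0}^{q_\top} x_3^n\, z_\top^n(x_\top) \ \text{ and } \ u_3 = \sum_{n=0}^{q_3} x_3^n\, z_3^n(x_\top) \Big\}
\end{aligned}
\tag{48}
$$

The subspaces $V_b^q$ and $V_m^q$ of bending and membrane 
displacements in $V^q$ can also be used, according to the 
nature of the data. The standard 3-D elastic energy (2) is 
used with $V^q$ and $V_b^q$ for any $q \succeq (1, 1, 2)$ and with $V_m^q$ for 
any $q \succeq (0, 0, 1)$. 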


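Theorem 3. 


(i) If $f$ satisfies $f_3|_S \not\equiv 0$, for any $q \succeq (1, 1, 2)$ there exists 
$C_q = C_q(f) > 0$ such that for all $\varepsilon \in (0, \varepsilon_0)$ 

$$\|u^\varepsilon - u^{\varepsilon,q}\|_{E(\Omega^\varepsilon)} \le C_q \sqrt{\varepsilon}\, \|u^\varepsilon\|_{E(\Omega^\varepsilon)} \tag{49}$$

(ii) If $f$ is membrane and $f_\top|_S \not\equiv 0$, for any $q \succeq (0, 0, 1)$ 
there exists $C_q = C_q(f) > 0$ such that for all $\varepsilon \in (0, \varepsilon_0)$ 
(49) holds. 


Proof. Since the energy is not altered by the model, $u^{\varepsilon,q}$ 
is a Galerkin projection of $u^\varepsilon$ on $V^q(\Omega^\varepsilon)$. Since the strain 
energy is uniformly equivalent to the elastic energy on any 
$\Omega^\varepsilon$, we have by Céa's lemma that there exists $C > 0$ such that 

$$\|u^\varepsilon - u^{\varepsilon,q}\|_{E(\Omega^\varepsilon)} \le C\, \|u^\varepsilon - v^q\|_{E(\Omega^\varepsilon)} \quad \forall v^q \in V^q(\Omega^\varepsilon)$$

(i) We choose, compare with (25), 

$$v^q = \varepsilon^{-2} \Big( -x_3 \nabla_\top \zeta_3^{-2},\ \zeta_3^{-2} + \frac{\lambda x_3^2}{2\lambda + 4\mu}\, \Delta_\top \zeta_3^{-2} \Big) - \varepsilon^{-2}\, \frac{\lambda x_3^2}{2\lambda + 4\mu}\, \varphi(s)\, \big( 0,\ \xi(R) \big)$$

with $\varphi = \Delta_\top \zeta_3^{-2}\big|_{\partial S}$ and $\xi$ a smooth cut-off function 
equal to 1 in a neighborhood of $R = 0$ and 0 for 
$R \ge 1$. Then $v^q$ satisfies the lateral boundary conditions 
and we can check (49) by combining Theorem 1 
with the use of Céa's lemma. 

(ii) We choose, instead, 

$$v^q = \Big( \zeta_\top^0,\ -\frac{\lambda x_3}{\lambda + 2\mu}\, \mathrm{div}\, \zeta_\top^0 \Big) + \frac{\lambda x_3}{\lambda + 2\mu}\, \varphi(s)\, \big( 0,\ \xi(R) \big)$$

with $\varphi = \mathrm{div}\, \zeta_\top^0\big|_{\partial S}$. 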


It is worthwhile to mention that for the $(1, 1, 2)$ model 
the shear correction factor (when $\nu \to 0$, $\kappa_{(1,1,2)}$ tends to 
$5/6$, just like for the two shear correction factors of the RM 
model) 

$$\kappa_{(1,1,2)} = \frac{12 - 2\nu}{\nu^2} \left( -1 + \sqrt{1 + \frac{20 \nu^2}{(12 - 2\nu)^2}} \right)$$

can be used for optimal results with respect to the error in 
energy norm and deflection for finite thickness plates; see 
Babuška, d'Harcourt and Schwab (1991a). For higher-order plate 
models, no shear correction factor is needed any more. 
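
The limit $\kappa_{(1,1,2)} \to 5/6$ as $\nu \to 0$ can be observed numerically. The sketch below assumes the factor has the form $\kappa_{(1,1,2)} = \frac{12 - 2\nu}{\nu^2}\big(-1 + \sqrt{1 + 20\nu^2/(12 - 2\nu)^2}\big)$ and evaluates it through the algebraically equivalent expression $20 / \big((12 - 2\nu)(1 + \sqrt{1 + x})\big)$ with $x = 20\nu^2/(12 - 2\nu)^2$, which avoids the cancellation in $-1 + \sqrt{1 + x}$ for small $\nu$; the function names are illustrative.

```python
import math

def kappa_112(nu):
    """Shear correction factor for the (1, 1, 2) model (assumed form),
    written in a cancellation-free way that is also valid at nu = 0."""
    a = 12.0 - 2.0 * nu
    x = 20.0 * nu ** 2 / a ** 2
    return 20.0 / (a * (1.0 + math.sqrt(1.0 + x)))

def kappa_112_direct(nu):
    """Same factor evaluated literally; requires nu != 0."""
    a = 12.0 - 2.0 * nu
    return a / nu ** 2 * (-1.0 + math.sqrt(1.0 + 20.0 * nu ** 2 / a ** 2))

for nu in (0.0, 0.1, 0.3):
    print(nu, kappa_112(nu))
```

The two evaluations agree for moderate $\nu$, and the stable form returns $5/6$ exactly in the limit case $\nu = 0$ up to rounding.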

The result in Schwab and Wright (1995) regarding the 
approximability of the boundary layers by elements of 
$V^q$ yields that the constant $C_q$ in (49) should rapidly 
decrease when $q$ increases. Nevertheless the factor $\sqrt{\varepsilon}$ is 
still present, for any $q$, because of the presence of the 
boundary layer terms. The numerical experiments in Dauge 
and Yosibash (2000) demonstrate that the higher the degree 
of the hierarchical model, the better the boundary layer 
terms are approximated. 

If one wants to have an approximation at a higher order 
in $\varepsilon$ one should 


• either consider a problem without boundary layer, as 
mentioned in requirement (c) (36), that is, a rectangular 
plate with symmetry boundary conditions: In 
this case, the convergence rate $\gamma(q)$ in $\varepsilon$ is at least 
$\min_j q_j - 1$, 

• or combine a hierarchy of models with a three-dimensional 
discretization of the boundary layer; see 
Stein and Ohnimus (1969) and Dauge and Schwab 
(2002). 


The $(1, 1, 2)$ model is the lowest order model which is 
asymptotically consistent for bending. See Paumier and Raoult 
(1997) and Rössle et al. (1999). It is the first model in the 
bending model hierarchy 

$$(1, 1, 2),\ (3, 3, 2),\ (3, 3, 4),\ \ldots,\ (2n - 1, 2n - 1, 2n),\ (2n + 1, 2n + 1, 2n),\ \ldots$$

The exponent $\gamma(q)$ in (36) can be proved to be $2n - 1$ 
if $q = (2n - 1, 2n - 1, 2n)$ and $2n$ if $q = (2n + 1, 2n + 1, 2n)$, 
thanks to the structure of the operator series $V[\varepsilon]$ 
and $Q[\varepsilon]$ in (11). If the load $f$ is constant over the whole 
plate, then the model of degree $(3, 3, 4)$ captures the whole 
regular part of $u^\varepsilon$ (Dauge and Schwab (2002), Rem. 8.3), 
and if, moreover, $f \equiv 0$ (in this case, only a lateral boundary 
condition is imposed), the degree $(3, 3, 2)$ is sufficient. 
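
The bending model hierarchy and its exponents can be enumerated mechanically. The following sketch simply transcribes the pattern stated above, with $\gamma = 2n - 1$ for $q = (2n - 1, 2n - 1, 2n)$ and $\gamma = 2n$ for $q = (2n + 1, 2n + 1, 2n)$; the function name is hypothetical.

```python
def bending_hierarchy(count):
    """First `count` pairs (q, gamma) of the bending model hierarchy."""
    models = []
    n = 1
    while len(models) < count:
        # q = (2n - 1, 2n - 1, 2n) with exponent 2n - 1
        models.append(((2 * n - 1, 2 * n - 1, 2 * n), 2 * n - 1))
        if len(models) < count:
            # q = (2n + 1, 2n + 1, 2n) with exponent 2n
            models.append(((2 * n + 1, 2 * n + 1, 2 * n), 2 * n))
        n += 1
    return models

for q, gamma in bending_hierarchy(5):
    print(q, gamma)
```

Each step of the hierarchy raises either the tangential or the transverse degree by two and gains one order in $\varepsilon$, in the absence of boundary layers and edge singularities.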


3.5 Laminated plates 


If the plate is laminated, the material matrix $A = A^\varepsilon$ has 
a sandwich structure, depending on the thickness variable 
$x_3$: We assume that $A^\varepsilon(x_3) = A(X_3)$, where the coefficients 
of $A$ are piecewise constant. In Nazarov (2000a) 
the asymptotic analysis is started, including such a situation. 
We may presume that a full asymptotic expansion 
like (6), with a similar internal structure, is still 
valid. 

In the homogeneous case, the director functions in (38) 
are simply the monomials of increasing degrees; see (48). 
In the laminated case, the first director functions are still 1 
and $X_3$: 

$$\Phi_j^0 = 1, \qquad \Phi_j^1 = X_3, \qquad j = 1, 2, 3$$

In the homogeneous case, we have $\Phi_j^2 = X_3^2$ and $\Phi_j^3 = X_3^3$, 
$j = 1, 2, 3$. In Actis, Szabo and Schwab (1999) three more 
piecewise linear director functions and three piecewise 
quadratic director functions are exhibited for the laminated 
case. 

How many independent director functions are necessary 
to increase the convergence rate $\gamma(q)$ (36)? In other words, 
what is the dimension of the spaces $\Psi_j^{q_j}$ (cf. (37))? In our 
formalism, see (10)–(11), this question is equivalent to 
knowing the structure of the operators $V^j$. Comparing with 
Nazarov (2000a), we can expect that 

$$
\begin{aligned}
V^1 \zeta &= \big( -X_3 \nabla_\top \zeta_3,\ P_3^{1,1}(X_3)\, \partial_1 \zeta_1 + P_3^{1,2}(X_3)\, (\partial_1 \zeta_2 + \partial_2 \zeta_1) + P_3^{1,3}(X_3)\, \partial_2 \zeta_2 \big) \\
V^2 \zeta &= \Big( \sum_{k=1}^{3} P_j^{2,k,1}(X_3)\, \partial_1^2 \zeta_k + P_j^{2,k,2}(X_3)\, \partial_{12}^2 \zeta_k + P_j^{2,k,3}(X_3)\, \partial_2^2 \zeta_k \Big)_{j=1,2,3}
\end{aligned}
\tag{50}
$$

As soon as the above functions $P_j^{n,*,*}$ are independent, 
they should be present in the bases of the director space 
$\Psi_j^n$. The dimensions of the spaces generated by the $P_j^{n,*,*}$ 
have upper bounds depending only on $n$. But their actual 
dimensions depend on the number of plies and their 
nature. 
4 MULTISCALE EXPANSIONS AND 
LIMITING MODELS FOR SHELLS 


Up to now, the only available results concerning mul- 
tiscale expansions for ‘true’ shells concern the case of 
clamped elliptic shells investigated in Faou (2001a,b, 2003). 
For (physical) shallow shells, which are closer to plates 
than shells, multiscale expansions can also be proved; see 
Nazarov (2000a) and Andreoiu and Faou (2001). 

In this section, we describe the results for clamped elliptic 
shells, then present the main features of the classification 
of shells as flexural and membrane. As a matter of fact, 
multiscale expansions are known for the most extreme 
representatives of the two types: (i) plates for flexural 
shells, (ii) clamped elliptic shells for membrane shells. 
Nevertheless, multiscale expansions in the general case 
seem out of reach (or, in certain cases, even irrelevant) 
(see Chapter 3, Volume 2). 


4.1 Curvature of a midsurface and other 
important tensors 


We introduce minimal geometric tools, namely, the metric
and curvature tensors of the midsurface $S$, the change of
metric tensor $\gamma_{\alpha\beta}$, and the change of curvature tensor
$\rho_{\alpha\beta}$. We also address the essential notions of elliptic,
hyperbolic, or parabolic point in a surface. We make these
notions more explicit for axisymmetric surfaces. A general
introduction to differential geometry on surfaces can be found
in Stoker (1969).

Let us denote by $\langle X, Y\rangle_{\mathbb{R}^3}$ the standard scalar product
of two vectors $X$ and $Y$ in $\mathbb{R}^3$. Using the fact that the
midsurface $S$ is embedded in $\mathbb{R}^3$, we naturally define the
metric tensor $(a_{\alpha\beta})$ as the projection on $S$ of the standard
scalar product in $\mathbb{R}^3$: Let $p_\top$ be a point of $S$ and $X$, $Y$
two tangent vectors to $S$ in $p_\top$. In a coordinate system
$x_\top = (x_\alpha)$ on $S$, the components of $X$ and $Y$ are $(X^\alpha)$ and
$(Y^\alpha)$, respectively. Then the matrix $(a_{\alpha\beta}(x_\top))$ is the only
positive definite symmetric $2 \times 2$ matrix such that for all
such vectors $X$ and $Y$

\[ \langle X, Y\rangle_{\mathbb{R}^3} = a_{\alpha\beta}(x_\top)\, X^\alpha Y^\beta =: \langle X, Y\rangle_S \]


The inverse of $a_{\alpha\beta}$ is written $a^{\alpha\beta}$ and thus satisfies
$a^{\alpha\beta} a_{\beta\sigma} = \delta^\alpha_\sigma$, where $\delta^\alpha_\sigma$ is the Kronecker symbol and
where we used the repeated indices convention for the
contraction of tensors.

The covariant derivative $D$ is associated with the metric
$a_{\alpha\beta}$ as follows: It is the unique differential operator such
that $D\langle X, Y\rangle_S = \langle DX, Y\rangle_S + \langle X, DY\rangle_S$ for all vector fields
$X$ and $Y$. In a local coordinate system, we have

\[ D_\alpha = \partial_\alpha + \text{terms of order } 0 \]

where $\partial_\alpha$ is the derivative with respect to the coordinate
$x_\alpha$. The terms of order 0 do depend on the choice of the
coordinate system and on the type of the tensor field on
which $D$ is applied. They involve the Christoffel symbols
of $S$ in the coordinate system $(x_\alpha)$.

The principal curvatures at a given point $p_\top \in S$ can be
seen as follows: We consider the family $\mathcal{P}$ of planes $P$
containing $p_\top$ and orthogonal to the tangent plane to $S$ at
$p_\top$. For $P \in \mathcal{P}$, $P \cap S$ defines a curve in $P$ and we denote
by $\kappa$ its signed curvature. The sign of $\kappa$ is determined
by the orientation of $S$. The principal curvatures $\kappa_1$ and
$\kappa_2$ are the minimum and maximum of $\kappa$ when $P \in \mathcal{P}$. The
principal radii of curvature are $R_i := |\kappa_i|^{-1}$. The Gaussian
curvature of $S$ in $p_\top$ is $K(p_\top) = \kappa_1\kappa_2$.

A point $p_\top$ is said to be elliptic if $K(p_\top) > 0$, hyperbolic
if $K(p_\top) < 0$, parabolic if $K(p_\top) = 0$ but $\kappa_1$ or $\kappa_2$ is
nonzero, and planar if $\kappa_1 = \kappa_2 = 0$. An elliptic shell is
a shell whose midsurface is everywhere elliptic up to
the boundary (similar definitions hold for hyperbolic and
parabolic shells, and planar shells are plates).

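The curvature tensor is defined as follows: Let $\Psi : x_\top \mapsto
\Psi(x_\top)$ be a parameterization of $S$ in a neighborhood of a
given point $p_\top \in S$ and $n(\Psi(x_\top))$ be the normal to $S$ in
$\Psi(x_\top)$. The formula

\[ b_{\alpha\beta} := \Big\langle n(\Psi(x_\top)),\ \frac{\partial^2 \Psi}{\partial x_\alpha\, \partial x_\beta}(x_\top) \Big\rangle_{\mathbb{R}^3} \]

defines, in the coordinate system $x_\top = (x_\alpha)$, the components
of a covariant tensor field on $S$, which is called the
curvature tensor.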

The metric tensor yields diffeomorphisms between tensor
spaces of different types (covariant and contravariant): We
have, for example, $b^\beta_\alpha = a^{\beta\sigma} b_{\sigma\alpha}$. With these notations, we
can show that in any coordinate system, the eigenvalues of
$b^\alpha_\beta$ at a point $p_\top$ are the principal curvatures at $p_\top$.

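In the special case where $S$ is an axisymmetric surface
parametrized by

\[ \Psi : (x_1, x_2) \longmapsto (x_1\cos x_2,\ x_1\sin x_2,\ f(x_1)) \in \mathbb{R}^3 \tag{51} \]

where $x_1 \ge 0$ is the distance to the axis of symmetry,
$x_2 \in [0, 2\pi[$ is the angle around the axis, and $f : \mathbb{R} \to \mathbb{R}$
a smooth function, we compute directly that

\[ (b^\alpha_\beta) = \frac{1}{\sqrt{1 + f'(x_1)^2}}
   \begin{pmatrix} \dfrac{f''(x_1)}{1 + f'(x_1)^2} & 0 \\[1ex] 0 & \dfrac{f'(x_1)}{x_1} \end{pmatrix} \]

whence

\[ K = \frac{f'(x_1)\, f''(x_1)}{x_1\,\big(1 + f'(x_1)^2\big)^2} \]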
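
The classification of points can be illustrated numerically on the axisymmetric surface (51). The following is a minimal sketch, assuming the closed-form principal curvatures obtained from the parameterization (51) (meridian curvature $f''/(1+f'^2)^{3/2}$ and parallel curvature $f'/(x_1\sqrt{1+f'^2})$); the function names and the tolerance `tol` are hypothetical and only serve the illustration.

```python
import math


def axisymmetric_curvatures(f1, f2, x1):
    """Principal curvatures of the surface (x1 cos x2, x1 sin x2, f(x1)).

    f1 and f2 are the values f'(x1) and f''(x1); x1 > 0 is the distance
    to the axis. Returns (meridian curvature, parallel curvature, K).
    """
    s = math.sqrt(1.0 + f1 * f1)
    k_meridian = f2 / s**3          # eigenvalue of (b^alpha_beta) along the meridian
    k_parallel = f1 / (x1 * s)      # eigenvalue along the parallel circle
    return k_meridian, k_parallel, k_meridian * k_parallel


def classify_point(k1, k2, tol=1e-12):
    """Elliptic / hyperbolic / parabolic / planar, following the text.

    tol is an assumed numerical threshold for deciding that a value vanishes.
    """
    gauss = k1 * k2
    if gauss > tol:
        return "elliptic"
    if gauss < -tol:
        return "hyperbolic"
    if abs(k1) > tol or abs(k2) > tol:
        return "parabolic"
    return "planar"


if __name__ == "__main__":
    # (label, f'(1), f''(1)) for a few profiles evaluated at x1 = 1
    samples = [
        ("paraboloid f = x^2/2", 1.0, 1.0),
        ("cone f = x", 1.0, 0.0),
        ("f = log x", 1.0, -1.0),
        ("plane f = 0", 0.0, 0.0),
    ]
    for label, f1, f2 in samples:
        k1, k2, gauss = axisymmetric_curvatures(f1, f2, 1.0)
        print(label, classify_point(k1, k2), gauss)
```

The four sample profiles cover the four cases of the definition: a paraboloid, a cone, a logarithmic profile, and a plane.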


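A deformation pattern is a three-component field $\zeta =
(\zeta_\alpha, \zeta_3)$ where $\zeta_\alpha$ is a surface displacement on $S$, and
$\zeta_3$ a function on $S$. The change of metric tensor $\gamma_{\alpha\beta}(\zeta)$
associated with the deformation pattern $\zeta$ has the following
expression:

\[ \gamma_{\alpha\beta}(\zeta) = \tfrac{1}{2}\,(D_\alpha\zeta_\beta + D_\beta\zeta_\alpha) - b_{\alpha\beta}\,\zeta_3 \tag{52} \]

The change of curvature tensor associated with $\zeta$ writes

\[ \rho_{\alpha\beta}(\zeta) = D_\alpha D_\beta\zeta_3 - b^\sigma_\alpha b_{\sigma\beta}\,\zeta_3 + b^\sigma_\alpha D_\beta\zeta_\sigma + D_\alpha b^\sigma_\beta\zeta_\sigma \tag{53} \]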


4.2 Clamped elliptic shells 


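The generic member $\Omega^\varepsilon$ of our family of shells is defined
as

\[ S \times (-\varepsilon, \varepsilon) \ni (p_\top, x_3) \longmapsto p_\top + x_3\, n(p_\top) \in \Omega^\varepsilon \subset \mathbb{R}^3 \tag{54} \]

where $n(p_\top)$ is the normal to $S$ at $p_\top$. Now three stretched
variables are required (cf. Section 2.1 for plates):

\[ X_3 = \frac{x_3}{\varepsilon}, \qquad R = \frac{r}{\varepsilon} \qquad\text{and}\qquad T = \frac{r}{\sqrt{\varepsilon}} \]

where $(r, s)$ is a system of normal and tangential coordinates
to $\partial S$ in $S$.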


4.2.1 Three-dimensional expansion 


The solutions of the family of problems (3) have a three-
scale asymptotic expansion in powers of $\varepsilon^{1/2}$, with regular
terms $v^{k/2}$, boundary layer terms $w^{k/2}$ of scale $\varepsilon$ like
for plates, and new boundary layer terms $\varphi^{k/2}$ of scale
$\varepsilon^{1/2}$.


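Theorem 4. (Faou, 2003) For the solutions of problems
(3), there exist regular terms $v^{k/2}(x_\top, X_3)$, $k \ge 0$, boundary
layer terms $\varphi^{k/2}(T, s, X_3)$, $k \ge 0$ and $w^{k/2}(R, s, X_3)$, $k \ge 2$,
such that

\[ u^\varepsilon \simeq (v^0 + \chi\varphi^0) + \varepsilon^{1/2}(v^{1/2} + \chi\varphi^{1/2}) + \varepsilon\,(v^1 + \chi\varphi^1 + \chi w^1) + \cdots \tag{55} \]

in the sense of asymptotic expansions: The following
estimates hold

\[ \Big\| u^\varepsilon - \sum_{k=0}^{2K} \varepsilon^{k/2}\,\big(v^{k/2} + \chi\varphi^{k/2} + \chi w^{k/2}\big) \Big\|_{E(\Omega^\varepsilon)} \le C_K(f)\, \varepsilon^{K+1/2}, \qquad K = 0, 1, \ldots \]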



where we have set $w^0 = w^{1/2} = 0$ and the constant $C_K(f)$
is independent of $\varepsilon \in (0, \varepsilon_0]$.


Like for plates, the terms of the expansion are linked with
each other, and are generated by a series of deformation
patterns $\zeta^{k/2} = \zeta^{k/2}(x_\top)$ of the midsurface $S$. They solve
a recursive system of equations, which can be written in a
condensed form as an equality between formal series, like
for plates. The distinction from plates is that, now, half-
integer powers of $\varepsilon$ are involved and we write, for example,
$\zeta[\varepsilon^{1/2}]$ for the formal series $\sum_{k \ge 0} \varepsilon^{k/2} \zeta^{k/2}$.


4.2.2 Regular terms 


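The regular terms series $v[\varepsilon^{1/2}] = \sum_k \varepsilon^{k/2} v^{k/2}$ is
determined by an equation similar to (10):

\[ v[\varepsilon^{1/2}] = V[\varepsilon^{1/2}]\,\zeta[\varepsilon^{1/2}] + Q[\varepsilon^{1/2}]\, f[\varepsilon^{1/2}] \]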


(i) The formal series of deformation patterns $\zeta[\varepsilon^{1/2}]$
starts with $k = 0$ (instead of degree $-2$ for plates).
(ii) The first terms of the series $V[\varepsilon^{1/2}]$ are

\[ V^0\zeta = \zeta, \qquad V^{1/2} \equiv 0, \qquad
   V^1\zeta = \big( -X_3\,(D_\alpha\zeta_3 + 2 b^\beta_\alpha\zeta_\beta),\ \bar p_1(X_3)\,\gamma^\alpha_\alpha(\zeta) \big) \tag{56} \]

where $\bar p_1$ is the polynomial defined in (9), and the
tensors $D$ (covariant derivative), $b$ (curvature), and
$\gamma$ (change of metric) are introduced in Section 4.1:
Even if the displacement $V^1\zeta$ is given through its
components in a local coordinate system, it indeed
defines an intrinsic displacement, since $D_\alpha$, $b^\alpha_\beta$, and
$\gamma^\alpha_\beta$ are well-defined independently of the choice of a
local parameterization of the surface. Note that $\gamma^\alpha_\alpha(\zeta)$
in (56) degenerates to $\operatorname{div}\zeta_\top$ in the case of plates
where $b_{\alpha\beta} = 0$. More generally, for all integer $k \ge 0$,
all 'odd' terms $V^{k+1/2}$ are zero and, if $b \equiv 0$, all
even terms $V^k$ degenerate to the operators in (11).
In particular, their degrees are the same as in (11).
(iii) The formal series $Q[\varepsilon^{1/2}]$ appears as a generalization
of (13) and $f[\varepsilon^{1/2}]$ is the formal Taylor expansion of $f$
around the midsurface $x_3 = 0$, which means that for
all integer $k \ge 0$, $f^{k+1/2} \equiv 0$ and $f^k$ is given by (12).


4.2.3 Membrane deformation patterns 


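The first term $\zeta^0$ solves the membrane equation

\[ \zeta^0 \in H^1_0 \times H^1_0 \times L^2(S), \quad \forall \zeta' \in H^1_0 \times H^1_0 \times L^2(S), \quad
   a_{S,m}(\zeta^0, \zeta') = 2\int_S \zeta' \cdot f^0 \tag{57} \]

where $f^0 = f|_S$ and $a_{S,m}$ is the membrane form

\[ a_{S,m}(\zeta, \zeta') = 2\int_S \tilde A^{\alpha\beta\sigma\delta}\, \gamma_{\alpha\beta}(\zeta)\, \gamma_{\sigma\delta}(\zeta')\, dS \tag{58} \]

with the reduced energy material tensor on the midsurface
(with $\tilde\lambda$ still given by (17)):

\[ \tilde A^{\alpha\beta\sigma\delta} = \tilde\lambda\, a^{\alpha\beta} a^{\sigma\delta} + \mu\,\big(a^{\alpha\sigma} a^{\beta\delta} + a^{\alpha\delta} a^{\beta\sigma}\big) \]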
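
To make the structure of the reduced material tensor concrete, here is a minimal sketch that assembles $\tilde A^{\alpha\beta\sigma\delta}$ from the inverse metric and contracts it with two strain tensors, giving the integrand of (58). It assumes that $\tilde\lambda$ is the usual plane-stress value $2\lambda\mu/(\lambda+2\mu)$, since formula (17) is not reproduced in this section; the function names are hypothetical.

```python
def reduced_lame(lam, mu):
    """Assumed plane-stress value of lambda-tilde (the usual meaning of (17))."""
    return 2.0 * lam * mu / (lam + 2.0 * mu)


def membrane_tensor(a_inv, lam, mu):
    """Reduced tensor A[al][be][si][de] built from the inverse metric a_inv (2x2).

    A = lambda~ a^{al be} a^{si de} + mu (a^{al si} a^{be de} + a^{al de} a^{be si})
    """
    lt = reduced_lame(lam, mu)
    r = range(2)
    return [[[[lt * a_inv[al][be] * a_inv[si][de]
               + mu * (a_inv[al][si] * a_inv[be][de]
                       + a_inv[al][de] * a_inv[be][si])
               for de in r] for si in r] for be in r] for al in r]


def energy_density(A, g1, g2):
    """Full contraction A^{al be si de} g1_{al be} g2_{si de}."""
    r = range(2)
    return sum(A[al][be][si][de] * g1[al][be] * g2[si][de]
               for al in r for be in r for si in r for de in r)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]          # flat metric, for illustration
    A = membrane_tensor(identity, lam=1.0, mu=1.0)
    stretch = [[1.0, 0.0], [0.0, 0.0]]           # uniaxial stretching
    shear = [[0.0, 0.5], [0.5, 0.0]]             # pure in-plane shear
    print(energy_density(A, stretch, stretch))
    print(energy_density(A, shear, shear))
```

With a flat metric the contraction reduces to $\tilde\lambda\,(\operatorname{tr}\gamma)^2 + 2\mu\,\gamma:\gamma$, which is a convenient check of the index bookkeeping.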


Problem (57) can be equivalently formulated as $L^0\zeta^0 = f^0$
with Dirichlet boundary conditions $\zeta^0_\top = 0$ on $\partial S$ and
corresponds to the membrane equations on plates (compare
with (19) and (22)). The operator $L^0$ is called the membrane
operator and, thus, the change of metric $\gamma_{\alpha\beta}(\zeta)$ with
respect to the deformation pattern $\zeta$ appears to coincide with
the membrane strain tensor; see Naghdi (1963) and Koiter
(1970a). If $b \equiv 0$, the third component of $L^0\zeta$ vanishes
while the surface part degenerates to the membrane operator
(20). In the general case, the properties of $L^0$ depend
on the geometry of $S$: $L^0$ is elliptic (of multidegree $(2, 2, 0)$
in the sense of Agmon, Douglis and Nirenberg, 1964) in $x_\top$
if and only if $S$ is elliptic in $x_\top$; see Ciarlet (2000), Genevey
(1996), and Sanchez-Hubert and Sanchez-Palencia (1997).

As in (21), the formal series $\zeta[\varepsilon^{1/2}]$ solves a reduced
equation on the midsurface with formal series $L[\varepsilon^{1/2}]$,
$R[\varepsilon^{1/2}]$, $f[\varepsilon^{1/2}]$ and $d[\varepsilon^{1/2}]$, degenerating to the formal series
(21) in the case of plates.


4.2.4 Boundary layer terms 


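Problem (57) cannot solve for the boundary conditions
$\zeta^0_3|_{\partial S} = \partial_n \zeta^0_3|_{\partial S} = 0$ (see the first terms in (24)). The two-
dimensional boundary layer terms $\varphi^{k/2}$ compensate these
nonzero traces: We have for $k = 0$

\[ \varphi^0 = (0, \varphi^0_3(T, s)) \quad\text{with}\quad \varphi^0_3(0, s) = -\zeta^0_3|_{\partial S}
   \quad\text{and}\quad \partial_n \varphi^0_3(0, s) = 0 \]

For $k = 1$, the trace $\partial_n \zeta^0_3|_{\partial S}$ is compensated by $\varphi^{1/2}_3$: The
scale $\varepsilon^{1/2}$ arises from these surface boundary layer terms.
More generally, the terms $\varphi^{k/2}$ are polynomials of degree
$[k/2]$ in $X_3$ and satisfy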


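\[ |e^{\eta T} \varphi(T, s, X_3)| \ \text{bounded as } T \to \infty \]

for all $\eta < (3\mu(\lambda + \mu))^{1/4} (\lambda + 2\mu)^{-1/2}\, b_{ss}(0, s)^{1/2}$ where
$b_{ss}(0, s) > 0$ is the tangential component of the curvature
tensor along $\partial S$.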

The three-dimensional boundary layer terms $w^{k/2}$ have
a structure similar to the case of plates. The first nonzero
term is $w^1$.




4.2.5 The Koiter model 


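Koiter (1960) proposed the solution $z^\varepsilon$ of the following
surface problem

Find $z^\varepsilon \in V_K(S)$ such that

\[ \varepsilon\, a_{S,m}(z^\varepsilon, z') + \varepsilon^3 a_{S,b}(z^\varepsilon, z') = 2\varepsilon \int_S z' \cdot f^0, \quad \forall z' \in V_K(S) \tag{59} \]

to be a good candidate for approximating the three-
dimensional displacement by a two-dimensional one. Here
the variational space is

\[ V_K(S) := H^1_0 \times H^1_0 \times H^2_0(S) \tag{60} \]

and the bilinear form $a_{S,b}$ is the bending form:

\[ a_{S,b}(z, z') = \frac{2}{3} \int_S \tilde A^{\alpha\beta\sigma\delta}\, \rho_{\alpha\beta}(z)\, \rho_{\sigma\delta}(z')\, dS \tag{61} \]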


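Note that the operator underlying problem (59) has the form
$K(\varepsilon) = \varepsilon L^0 + \varepsilon^3 B$ where the membrane operator $L^0$ is the
same as in (57) and the bending operator $B$ is associated
with $a_{S,b}$. Thus, the change of curvature tensor $\rho_{\alpha\beta}$ appears
to be identified with the bending strain tensor. Note that
$K(\varepsilon)$ is associated with the two-dimensional energy (compare
with (44))

\[ 2\varepsilon \int_S \tilde A^{\alpha\beta\sigma\delta}\, \gamma_{\alpha\beta}(z)\, \gamma_{\sigma\delta}(z)\, dS
   + \frac{2\varepsilon^3}{3} \int_S \tilde A^{\alpha\beta\sigma\delta}\, \rho_{\alpha\beta}(z)\, \rho_{\sigma\delta}(z)\, dS \tag{62} \]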
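For $\varepsilon$ small enough, the operator $K(\varepsilon)$ is elliptic of multidegree
$(2, 2, 4)$ and is associated with the Dirichlet conditions
$z = 0$ and $\partial_n z_3 = 0$ on $\partial S$. The solution $z^\varepsilon$ of the Koiter
model for the clamped case solves equivalently the
equations

\[ (L^0 + \varepsilon^2 B)\, z^\varepsilon(x_\top) = f^0(x_\top) \ \text{on } S \quad\text{and}\quad
   z^\varepsilon|_{\partial S} = 0, \quad \partial_n z^\varepsilon_3|_{\partial S} = 0 \tag{63} \]

This solution also has a multiscale expansion given by the
following theorem.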


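Theorem 5. (Faou, 2003) For the solutions of problem
(63), $\varepsilon \in (0, \varepsilon_0]$, there exist regular terms $z^{k/2}(x_\top)$ and
boundary layer terms $\psi^{k/2}(T, s)$, $k \ge 0$, such that

\[ z^\varepsilon \simeq z^0 + \chi\psi^0 + \varepsilon^{1/2}(z^{1/2} + \chi\psi^{1/2}) + \varepsilon\,(z^1 + \chi\psi^1) + \cdots \tag{64} \]

in the sense of asymptotic expansions: The following
estimates hold

\[ \Big\| z^\varepsilon - \sum_{k=0}^{2K} \varepsilon^{k/2}\,\big(z^{k/2} + \chi\psi^{k/2}\big) \Big\|_{\varepsilon,S} \le C_K(f)\, \varepsilon^{K+1/4}, \qquad K = 0, 1, \ldots \]

where $\|z\|^2_{\varepsilon,S} = \|\gamma(z)\|^2_{L^2(S)} + \varepsilon^2 \|\rho(z)\|^2_{L^2(S)}$ and $C_K(f)$ is
independent of $\varepsilon \in (0, \varepsilon_0]$.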


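The precise comparison between the terms in the expansions
(55) and (64) shows that [1] $\zeta^0 = z^0$, $\zeta^{1/2} = z^{1/2}$,
$\varphi^0 = \psi^0$, $\varphi^{1/2}_\top = \psi^{1/2}_\top$, while $\zeta^1$ and $z^1$, $\varphi^{1/2}_3$ and $\psi^{1/2}_3$ are
generically different, respectively. This allows obtaining
optimal estimates in various norms: Considering the scaled
domain $\Omega = S \times (-1, 1)$, we have

\[ \|u^\varepsilon - z^\varepsilon\|_{H^1 \times H^1 \times L^2(\Omega)}
   \le \|u^\varepsilon - \zeta^0\|_{H^1 \times H^1 \times L^2(\Omega)} + \|z^\varepsilon - \zeta^0\|_{H^1 \times H^1 \times L^2(\Omega)}
   \le C \varepsilon^{1/4} \tag{65} \]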

This estimate implies the convergence result of Ciarlet
and Lods (1996a) and improves the estimate in Mardare
(1998a). To obtain an estimate in the energy norm, we
need to reconstruct a 3-D displacement from $z^\varepsilon$: First, the
Kirchhoff-like [2] displacement associated with $z^\varepsilon$ writes,
cf. (56)

\[ U^{1,1,0}_{KL} z^\varepsilon = \big( z^\varepsilon_\sigma - x_3\,(D_\sigma z^\varepsilon_3 + 2 b^\alpha_\sigma z^\varepsilon_\alpha),\ z^\varepsilon_3 \big) \tag{66} \]


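and next, according to Koiter (1970a), we define the
reconstructed quadratic displacement [3]

\[ U^{1,1,2}_{K} z^\varepsilon = U^{1,1,0}_{KL} z^\varepsilon + \frac{\lambda}{\lambda + 2\mu}
   \Big( 0,\ -x_3\, \gamma^\alpha_\alpha(z^\varepsilon) + \frac{x_3^2}{2}\, \rho^\alpha_\alpha(z^\varepsilon) \Big) \tag{67} \]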
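
The two-step reconstruction can be sketched pointwise as follows. This is only an illustration of formulas (66) and (67) as read here, assuming that the midsurface quantities (the covariant components of $z$, the derivatives $D_\sigma z_3$, the mixed curvature components $b^\alpha_\sigma$, and the traces $\gamma^\alpha_\alpha$, $\rho^\alpha_\alpha$) have already been evaluated at one point of $S$; the function name and argument layout are hypothetical.

```python
def koiter_reconstruction(x3, z, grad_z3, b_mixed, tr_gamma, tr_rho, lam, mu):
    """Reconstructed displacement at height x3 above a midsurface point.

    z        : (z_1, z_2, z_3), covariant surface components and normal component
    grad_z3  : (D_1 z_3, D_2 z_3)
    b_mixed  : b_mixed[al][si] = b^al_si (mixed curvature components)
    tr_gamma : trace gamma^al_al(z);  tr_rho : trace rho^al_al(z)
    Returns (u_1, u_2, u_3).
    """
    tangential = []
    for si in range(2):
        # Kirchhoff-like part (66): z_si - x3 (D_si z_3 + 2 b^al_si z_al)
        rotation = grad_z3[si] + 2.0 * sum(b_mixed[al][si] * z[al] for al in range(2))
        tangential.append(z[si] - x3 * rotation)
    # Quadratic correction (67) acts on the transverse component only
    c = lam / (lam + 2.0 * mu)
    u3 = z[2] + c * (-x3 * tr_gamma + 0.5 * x3**2 * tr_rho)
    return tangential[0], tangential[1], u3


if __name__ == "__main__":
    flat = [[0.0, 0.0], [0.0, 0.0]]   # b = 0: the plate case
    print(koiter_reconstruction(0.5, (0.0, 0.0, 1.0), (0.1, 0.0), flat,
                                tr_gamma=0.0, tr_rho=2.0, lam=1.0, mu=1.0))
```

With $b = 0$ the tangential part reduces to the Kirchhoff–Love displacement of plates, which gives a simple sanity check.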
Then there holds (compare with (46) for plates):

\[ \|u^\varepsilon - U^{1,1,2}_{K} z^\varepsilon\|_{E(\Omega^\varepsilon)} \le C \sqrt{\varepsilon}\, \|u^\varepsilon\|_{E(\Omega^\varepsilon)} \tag{68} \]

and similar to plates, the error factor $\sqrt{\varepsilon}$ is optimal and
is due to the first boundary layer term $w^1$. Moreover,
expansion (64) allows proving that the classical models
discussed in Budiansky and Sanders (1967), Naghdi (1963),
Novozhilov (1959), and Koiter (1970a) all have the same
convergence rate (68).


4.3 Convergence results for general shells 


We still embed $\Omega^d$ in the family (54) with $S$ the mid-
surface of $\Omega^d$. The fact that all the classical models are
equivalent for clamped elliptic shells may not be true in
more general cases, when the shell becomes sensitive (e.g.
for a partially clamped elliptic shell with a free portion
in its lateral surface) or produces bending effects (case of
parabolic or hyperbolic shells with adequate lateral bound-
ary conditions).
4.3.1 Surface membrane and bending energy


Nevertheless, the Koiter model seems to keep good approx-
imation properties with respect to the 3-D model. The
variational space $V_K$ of the Koiter model is, in the totally
clamped case, given by the space $V_K(S)$ (60). If the shell
$\Omega^\varepsilon$ is clamped only on the part $\gamma_0 \times (-\varepsilon, \varepsilon)$ of its boundary
(with $\gamma_0 \subset \partial S$), the Dirichlet boundary conditions in the
space $V_K$ have to be imposed only on $\gamma_0$. As already
mentioned (62), the Koiter model is associated with the bilinear
form $\varepsilon a_{S,m} + \varepsilon^3 a_{S,b}$ with $a_{S,m}$ and $a_{S,b}$ the membrane and
bending forms defined for $z, z' \in V_K(S)$ by (58) and (61)
respectively.

From the historical point of view, such a decomposition
into a membrane (or stretching) energy and a bending
energy on the midsurface was first derived by Love (1944)
in principal curvature coordinate systems, that is, for which
the curvature tensor $(b^\alpha_\beta)$ is diagonalized. The expression of
the membrane energy proposed by Love is the same as (58),
in contrast with the bending part for which the discussion
was very controversial; see Budiansky and Sanders (1967),
Novozhilov (1959), Koiter (1960), and Naghdi (1963) and
the references therein. Koiter (1960) gave the most natural
expression using intrinsic tensor representations: The Koiter
bending energy only depends on the change of curvature
tensor $\rho_{\alpha\beta}$, in accordance with the Bonnet theorem
characterizing a surface by its metric and curvature tensors
$a_{\alpha\beta}$ and $b_{\alpha\beta}$; see Stoker (1969).

For any geometry of the midsurface S, the Koiter model 
in its variational form (59) has a unique solution; see 
Bernadou and Ciarlet (1976). 


4.3.2 Classification of shells 


According to this principle each shell, in the zero thickness
limit, concentrates its energy either in the bending surface
energy $a_{S,b}$ (flexural shells) or in the membrane surface
energy $a_{S,m}$ (membrane shells).

The behavior of the shell depends on the 'inextensional
displacement' space

\[ V_F(S) := \{ \zeta \in V_K(S) \mid \gamma_{\alpha\beta}(\zeta) = 0 \} \tag{69} \]


The key role played by this space is illustrated by the 
following fundamental result: 


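Theorem 6.

(i) (Sanchez-Hubert and Sanchez-Palencia, 1997; Ciarlet,
Lods and Miara, 1996) Let $u^\varepsilon$ be the solution of
problem (3). In the scaled domain $\Omega = S \times (-1, 1)$,
the displacement $\varepsilon^2 u^\varepsilon(x_\top, X_3)$ converges in $H^1(\Omega)^3$ as
$\varepsilon \to 0$. Its limit is given by the solution $\zeta^{-2} \in V_F(S)$
of the bending problem

\[ \forall \zeta' \in V_F(S), \quad a_{S,b}(\zeta^{-2}, \zeta') = 2\int_S \zeta' \cdot f^0 \tag{70} \]

(ii) (Ciarlet and Lods, 1996b) Let $z^\varepsilon$ be the solution of
problem (59). Then $\varepsilon^2 z^\varepsilon$ converges to $\zeta^{-2}$ in $V_K(S)$ as
$\varepsilon \to 0$.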


A shell is said to be flexural (or noninhibited) when $V_F(S)$ is
not reduced to $\{0\}$. Examples are provided by cylindrical
shells (or portions of cones) clamped along their generatrices
and free elsewhere. Of course, plates are flexural shells
according to the above definition since in that case, $V_F(S)$
is given by $\{\zeta = (0, \zeta_3) \mid \zeta_3 \in H^2_0(S)\}$ and the bending
operator (70) coincides with the operator (16).

In the case of clamped elliptic shells, we have $V_F(S) =
\{0\}$. For these shells, $u^\varepsilon$ and $z^\varepsilon$ converge in $H^1 \times H^1 \times L^2$
to the solution $\zeta^0$ of the membrane equation (57); see
Ciarlet and Lods (1996a) and (65): Such shells are called
membrane shells. The other shells for which $V_F(S)$ reduces
to $\{0\}$ are called generalized membrane shells (or inhibited
shells) and for these also, a delicate functional analysis
provides convergence results to a membrane solution in
spaces with special norms depending on the geometry of
the midsurface; see Ciarlet and Lods (1996a) and Ciarlet
(2000), Ch. 5. It is also proved that the Koiter model
converges in the same sense to the same limits; see Ciarlet
(2000), Ch. 7.

Thus, plates and elliptic shells represent extreme situa-
tions: Plates are pure bending structures with an inex-
tensional displacement space as large as possible, while
clamped elliptic shells represent a pure membrane situa-
tion where $V_F(S)$ reduces to $\{0\}$ and where the membrane
operator is elliptic.


4.4 Shallow shells 


We make a distinction between ‘physical’ shallow shells in 
the sense of Ciarlet and Paumier (1986) and ‘mathematical’ 
shallow shells in the sense of Pitkäranta, Matache and 
Schwab (2001). The former involves shells with a curvature 
tensor of the same order as the thickness, whereas the latter 
addresses a boundary value problem obtained by freezing 
coefficients of the Koiter problem at one point of a standard 
shell. 


4.4.1 Physical shallow shells 


Let $R$ denote the smallest principal radius of curvature of
the midsurface $S$ and let $D$ denote the diameter of $S$. As
proved in Andreoiu and Faou (2001), if there holds

\[ R \ge 2D \tag{71} \]

then there exists a point $p_\top \in S$ such that the orthogonal
projection of $S$ on its tangent plane in $p_\top$ allows the
representation of $S$ as a $C^\infty$ graph in $\mathbb{R}^3$:

\[ \omega \ni (x_1, x_2) \longmapsto (x_1, x_2, \Theta(x_1, x_2)) := x_\top \in S \subset \mathbb{R}^3 \tag{72} \]

where $\omega$ is an immersed (in particular, $\omega$ may have self-
intersection) domain of the tangent plane in $p_\top$, and where
$\Theta$ is a function on this surface. Moreover, we have

\[ |\Theta| \le C R^{-1} \quad\text{and}\quad \|\nabla\Theta\| \le C R^{-1} \tag{73} \]

with constants $C$ depending only on $D$.
We say that $\Omega^d$ is a shallow shell if $S$ satisfies a condition
of the type

\[ R^{-1} \le C d \tag{74} \]

where $C$ does not depend on $d$. Thus, if $S$ is a surface
satisfying (74), for $d$ sufficiently small $S$ satisfies (71)
whence representation (72). Moreover, (73) yields that $\Theta$
and $\nabla\Theta$ are $\lesssim d$. In these conditions, we can choose to
embed $\Omega^d$ into another family of thin domains than (54):
We set $\theta = d^{-1}\Theta$ and define for any $\varepsilon \in (0, d]$ the surface
$S^\varepsilon$ by its parameterization (cf. (72))

\[ \omega \ni (x_1, x_2) \longmapsto (x_1, x_2, \varepsilon\,\theta(x_1, x_2)) := x_\top \in S^\varepsilon \]

It is natural to consider $\Omega^d$ as an element of the family $\Omega^\varepsilon$
given as the image of the application

\[ \omega \times (-\varepsilon, \varepsilon) \ni (x_1, x_2, x_3) \longmapsto (x_1, x_2, \varepsilon\,\theta(x_1, x_2)) + x_3\, n^\varepsilon(x_\top) \tag{75} \]

where $n^\varepsilon(x_\top)$ denotes the unit normal vector to the mid-
surface $S^\varepsilon$. We are now in the framework of Ciarlet and
Paumier (1986).

A multiscale expansion for the solution of (3) is given in
Andreoiu and Faou (2001). The expansion is close to that
of plates, except that the membrane and bending operators
yielding the deformation patterns are linked by lower-
order terms: The associated membrane and bending strain
components $\tilde\gamma_{\alpha\beta}$ and $\tilde\rho_{\alpha\beta}$ are respectively given by

\[ \tilde\gamma_{\alpha\beta} := \tfrac{1}{2}\,(\partial_\alpha z_\beta + \partial_\beta z_\alpha) - \varepsilon\, \partial_{\alpha\beta}\theta\; z_3
   \quad\text{and}\quad \tilde\rho_{\alpha\beta} := \partial_{\alpha\beta} z_3 \tag{76} \]

It is worth noticing that the above strains are asymptotic
approximations of the Koiter membrane and bending strains
associated with the midsurface $S = S^\varepsilon$. As a consequence,
the Koiter model and the three-dimensional equations con-
verge to the same Kirchhoff–Love limit.
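
The sense in which (76) approximates the Koiter strains can be checked on the curvature term. For a surface given as a graph $(x_1, x_2, \varepsilon\theta(x_1, x_2))$, the curvature tensor is $b_{\alpha\beta} = \varepsilon\,\partial_{\alpha\beta}\theta / \sqrt{1 + \varepsilon^2 |\nabla\theta|^2}$, so replacing $b_{\alpha\beta}$ by $\varepsilon\,\partial_{\alpha\beta}\theta$ costs a relative error of order $\varepsilon^2$. The sketch below only compares these two quantities at one point (it does not treat the Christoffel terms of the covariant derivative); the function names and the sample values are hypothetical.

```python
import math


def graph_curvature(eps, grad_theta, hess_theta):
    """Exact curvature tensor b_{ab} of the graph (x1, x2, eps*theta) at one point."""
    w = math.sqrt(1.0 + eps**2 * (grad_theta[0]**2 + grad_theta[1]**2))
    return [[eps * hess_theta[a][b] / w for b in range(2)] for a in range(2)]


def shallow_membrane_strain(eps, sym_grad_z, hess_theta, z3):
    """Shallow-shell membrane strain (76): sym. gradient minus eps * Hess(theta) * z3."""
    return [[sym_grad_z[a][b] - eps * hess_theta[a][b] * z3 for b in range(2)]
            for a in range(2)]


if __name__ == "__main__":
    grad = (1.0, 0.0)                      # assumed value of grad(theta) at the point
    hess = [[1.0, 0.0], [0.0, 1.0]]        # assumed Hessian of theta at the point
    for eps in (0.1, 0.01, 0.001):
        b = graph_curvature(eps, grad, hess)
        rel_err = abs(b[0][0] - eps * hess[0][0]) / (eps * hess[0][0])
        print(eps, rel_err)                # decreases like eps**2
```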


4.4.2 Mathematical shallow shells 


These models consist in freezing coefficients of standard
two-dimensional models at a given point $p_\top \in S$ in a
principal curvature coordinate system. That procedure yields,
with $b_i := \kappa_i(p_\top)$:

\[ \gamma_{11} = \partial_1 z_1 - b_1 z_3, \qquad \gamma_{22} = \partial_2 z_2 - b_2 z_3, \qquad
   \gamma_{12} = \tfrac{1}{2}\,(\partial_1 z_2 + \partial_2 z_1) \tag{77} \]

for the membrane strain tensor, and

\[ \kappa_{11} = \partial_1^2 z_3 + b_1 \partial_1 z_1, \qquad \kappa_{22} = \partial_2^2 z_3 + b_2 \partial_2 z_2, \qquad
   \kappa_{12} = \partial_1 \partial_2 z_3 + b_1 \partial_2 z_1 + b_2 \partial_1 z_2 \tag{78} \]

as a simplified version of the bending strain tensor. Such a
localization procedure is considered as valid if the diameter
$D$ is small compared to $R$

\[ R \gg D \tag{79} \]

and for the case of cylindrical shells where the strains
already have the form (77)–(78) in cylindrical coordinates (see
equation (80) below). In contrast with the previous one, this
notion of shallowness does not refer to the thickness. Here
$R$ is not small, but $D$ is. Such objects are definitely shells
and are not plate-like.

These simplified models are valuable for developing
numerical approximation methods (Havu and Pitkäranta,
2002, 2003) and for finding possible boundary layer length
scales (Pitkäranta, Matache and Schwab, 2001): These
length scales (width of transition regions from the boundary
into the interior) at a point $p_\top \in \partial S$ are $\varepsilon^{1/2}$ in the
nondegenerate case ($b_{ss}(p_\top) \ne 0$), $\varepsilon^{1/3}$ for hyperbolic
degeneration ($p_\top$ hyperbolic and $b_{ss}(p_\top) = 0$) and $\varepsilon^{1/4}$ for parabolic
degeneration ($p_\top$ parabolic and $b_{ss}(p_\top) = 0$).
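
The three regimes can be tabulated with a small helper. This is a minimal sketch of the rule stated above, not a formula from the cited papers beyond the exponents quoted in the text; the function names, the string labels, and the tolerance used to decide whether $b_{ss}$ vanishes are assumptions.

```python
def layer_exponent(point_type, b_ss, tol=1e-12):
    """Exponent p such that the boundary layer width scales like eps**p.

    point_type is "elliptic", "hyperbolic" or "parabolic"; b_ss is the
    tangential curvature along the boundary at the point considered.
    """
    if abs(b_ss) > tol:
        return 0.5                  # nondegenerate case
    if point_type == "hyperbolic":
        return 1.0 / 3.0            # hyperbolic degeneration
    if point_type == "parabolic":
        return 0.25                 # parabolic degeneration
    raise ValueError("b_ss = 0 is only covered for hyperbolic or parabolic points")


def layer_width(eps, point_type, b_ss):
    """Order of magnitude of the boundary layer width for half-thickness eps."""
    return eps ** layer_exponent(point_type, b_ss)


if __name__ == "__main__":
    eps = 1e-4
    print(layer_width(eps, "elliptic", 1.0))
    print(layer_width(eps, "hyperbolic", 0.0))
    print(layer_width(eps, "parabolic", 0.0))
```

For a given small $\varepsilon$ the degenerate layers are markedly wider than the nondegenerate one, which is the practical point when choosing mesh refinement near the boundary.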

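To compare with the standard shell equations, note that
in the case of an axisymmetric shell whose midsurface is
represented by

\[ \Psi : (x_1, x_2) \longmapsto (f(x_1)\cos x_2,\ f(x_1)\sin x_2,\ x_1) \]

where $x_1 \in \mathbb{R}$, $x_2 \in [0, 2\pi[$ and $f(x_1) > 0$ is a smooth
function, we have

\[ \begin{aligned}
\gamma_{11}(z) &= \partial_1 z_1 - \frac{f'(x_1)\, f''(x_1)}{1 + f'(x_1)^2}\, z_1 + \frac{f''(x_1)}{\sqrt{1 + f'(x_1)^2}}\, z_3, \\
\gamma_{22}(z) &= \partial_2 z_2 + \frac{f(x_1)\, f'(x_1)}{1 + f'(x_1)^2}\, z_1 - \frac{f(x_1)}{\sqrt{1 + f'(x_1)^2}}\, z_3, \\
\gamma_{12}(z) &= \tfrac{1}{2}\,(\partial_1 z_2 + \partial_2 z_1) - \frac{f'(x_1)}{f(x_1)}\, z_2
\end{aligned} \tag{80} \]

Hence the equation (77) is exact for the case of cylindrical
shells, where $f(x_1) \equiv R > 0$, and we can show that the
same holds for (78).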


4.5 Versatility of Koiter model 


On any midsurface $S$, the deformation pattern $z^\varepsilon$ solution of
the Koiter model (59) exists. In general, the mean value of
the displacement $u^\varepsilon$ through the thickness converges to the
same limit as $z^\varepsilon$ when $\varepsilon \to 0$ in a weak sense depending
on the type of the midsurface and the boundary conditions;
see Ciarlet (2000). Nevertheless, actual convergence results
hold in energy norm when considering the reconstructed
displacement from the deformation pattern $z^\varepsilon$.


4.5.1 Convergence of the Koiter reconstructed 
displacement 


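On any midsurface $S$, the three-dimensional Koiter recon-
structed displacement $U^{1,1,2}_{K} z^\varepsilon$ is well-defined by (66)–(67).
Let us set

\[ e(S, \varepsilon, z^\varepsilon, u^\varepsilon) := \frac{\|u^\varepsilon - U^{1,1,2}_{K} z^\varepsilon\|_{E(\Omega^\varepsilon)}}{\|z^\varepsilon\|_{E^\varepsilon_{2D}(S)}} \tag{81} \]

with $\|z\|_{E^\varepsilon_{2D}(S)}$ the square root of the Koiter energy (62).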

In Koiter (1970a,b), an estimate is given: $e(S, \varepsilon, z^\varepsilon,
u^\varepsilon)^2$ would be bounded by $\varepsilon R^{-1} + \varepsilon^2 L^{-2}$, with $R$ the
smallest principal radius of curvature of $S$, and $L$ the
smallest wavelength of $z^\varepsilon$. It turns out that in the case
of plates, we have $L = O(1)$, $R^{-1} = 0$ and, since (46)
is optimal, the estimate fails. In contrast, in the case of
clamped elliptic shells, we have $L = O(\sqrt{\varepsilon})$, $R^{-1} = O(1)$
and the estimate gives back (68).
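
The arithmetic behind these two cases can be sketched as follows. The function below simply evaluates the square root of Koiter's tentative bound $\varepsilon R^{-1} + \varepsilon^2 L^{-2}$ and compares it with the optimal factor $\sqrt{\varepsilon}$ of (46) and (68); all constants are set to 1 for illustration and the function name is hypothetical.

```python
def koiter_tentative_bound(eps, wavelength, inv_radius):
    """Square root of eps/R + eps^2/L^2, the tentative bound on e(S, eps, z, u)."""
    return (eps * inv_radius + eps**2 / wavelength**2) ** 0.5


if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-6):
        optimal = eps ** 0.5
        # Plates: L = O(1) and 1/R = 0, so the tentative bound behaves like eps
        plate = koiter_tentative_bound(eps, wavelength=1.0, inv_radius=0.0)
        # Clamped elliptic shells: L = O(sqrt(eps)) and 1/R = O(1)
        elliptic = koiter_tentative_bound(eps, wavelength=eps ** 0.5, inv_radius=1.0)
        print(eps, plate / optimal, elliptic / optimal)
```

For plates the ratio to $\sqrt{\varepsilon}$ tends to zero, so the tentative bound would claim more than the optimal rate allows; for clamped elliptic shells the ratio stays bounded, in agreement with (68).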

Two years after the publications of Koiter (1970a,b), it
was already known that the above estimate does not hold
as $\varepsilon \to 0$ for plates. We read in Koiter and Simmonds
(1973) ‘The somewhat depressing conclusion for most
shell problems is, similar to the earlier conclusions of
GOL'DENWEIZER, that no better accuracy of the solutions
can be expected than of order $\varepsilon L^{-1} + \varepsilon R^{-1}$, even if the
equations of first-approximation shell theory would permit,
in principle, an accuracy of order $\varepsilon^2 L^{-2} + \varepsilon R^{-1}$.’

The reason for this is also explained by John (1971) 
in these terms: ‘Concentrating on the interior we sidestep 
all kinds of delicate questions, with an attendant gain in 
certainty and generality. The information about the inte- 
rior behavior can be obtained much more cheaply (in the 
mathematical sense) than that required for the discussion 
of boundary value problems, which form a more ‘transcen- 
dental’ stage’. 


Koiter’s tentative estimate comes from formal compu-
tations also investigated by John (1971). The analysis by
operator formal series introduced in Faou (2002) is in the
same spirit: For any geometry of the midsurface, there exist
formal series $V[\varepsilon]$, $R[\varepsilon]$, $Q[\varepsilon]$, and $L[\varepsilon]$ reducing the three-
dimensional formal series problem to a two-dimensional
problem of the form (21) with $L[\varepsilon] = L^0 + \varepsilon^2 L^2 + \cdots$
where $L^0$ is the membrane operator associated with the form
(58). The bending operator $B$ associated with $a_{S,b}$ can be
compared to the operator $L^2$ appearing in the formal series
$L[\varepsilon]$: We have

\[ \forall \zeta, \zeta' \in V_F(S), \quad \langle L^2\zeta, \zeta'\rangle_{L^2(S)^3} = \langle B\zeta, \zeta'\rangle_{L^2(S)^3} \tag{82} \]


Using this formal series analysis, the first two authors
are working on the derivation of a sharp expression of
$e(S, \varepsilon, z^\varepsilon, u^\varepsilon)$ including boundary layer effects, and opti-
mal in the case of plates and clamped elliptic shells; see
Dauge and Faou (2004).

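In this direction also, Lods and Mardare (2002) prove the
following estimate for totally clamped shells

\[ \|u^\varepsilon - (U^{1,1,2}_{K} z^\varepsilon + w^\sharp)\|_{E(\Omega^\varepsilon)} \le C \varepsilon^{1/4}\, \|u^\varepsilon\|_{E(\Omega^\varepsilon)} \tag{83} \]

with $w^\sharp$ an explicit boundary corrector of $U^{1,1,2}_{K} z^\varepsilon$.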


4.5.2 Convergence of Koiter eigenvalues 


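The operator $\varepsilon^{-1} K(\varepsilon)$ has a compact inverse, therefore its
spectrum is discrete with only an accumulation point at
$+\infty$. We agree to call Koiter eigenvalues the eigenvalues
of the former operator, that is, the solutions $\mu^\varepsilon$ of

\[ \exists z^\varepsilon \in V_K(S) \setminus \{0\} \ \text{such that} \quad
   a_{S,m}(z^\varepsilon, z') + \varepsilon^2 a_{S,b}(z^\varepsilon, z') = 2\mu^\varepsilon \int_S z^\varepsilon \cdot z', \quad \forall z' \in V_K(S) \tag{84} \]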
As already mentioned in Section 2.4, cf. (31), this spec-
trum provides the limiting behavior of three-dimensional
eigenvalues for plates. Apparently, very little is known for
general shells.

Concerning Koiter eigenvalues, interesting results are
provided by Sanchez-Hubert and Sanchez-Palencia (1997),
Ch. X: The $\mu^\varepsilon$ are attracted by the spectrum $\sigma(M)$ of the
membrane operator, where $M$ is the self-adjoint unbounded
operator associated with the symmetric bilinear form $a_{S,m}$
defined on the space $H^1 \times H^1 \times L^2(S)$. There holds (we
still assume that $S$ is smooth up to its boundary):


Theorem 7. The operator $M + \mu\,\mathrm{Id}$ is elliptic of multide-
gree $(2, 2, 0)$ for $\mu > 0$ large enough. Moreover its essential
spectrum $\sigma_{es}(M)$ satisfies:
(i) If $S$ is elliptic and clamped on its whole boundary,
$\sigma_{es}(M)$ is a closed interval $[a, b]$, with $a > 0$,
(ii) If $S$ is elliptic and not clamped on its whole boundary,
$\sigma_{es}(M) = \{0\} \cup [a, b]$ with $a > 0$,
(iii) For any other type of shell (i.e. there exists at least one
point where the Gaussian curvature is $\le 0$) $\sigma_{es}(M)$ is
a closed interval of the form $[0, b]$, with $b \ge 0$.


If the shell is of flexural type, the lowest eigenvalues
$\mu^\varepsilon$ tend to 0 like $O(\varepsilon^2)$, same as for plates; see (29). If the
shell is clamped elliptic, the $\mu^\varepsilon$ are bounded from below by
a positive constant independent of $\varepsilon$. In any other situation,
we expect that the lowest $\mu^\varepsilon$ still tends to 0 as $\varepsilon \to 0$.


5 HIERARCHICAL MODELS FOR 
SHELLS 


The idea of deriving hierarchical models for shells goes
back to Vekua (1955, 1965, 1985) and corresponds to
classical techniques in mechanics: Try to find asymptotic
expansions in $x_3$ by use of Taylor expansion around the
midsurface $S$. An alternative approach consists in choosing
the coefficients $z^n_j$ in (38) as moments through the thickness
against Legendre polynomials $L_n(x_3/\varepsilon)$.
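
The moment approach can be sketched for a single scalar profile through the thickness. The following minimal example assumes the normalization $z^n = \frac{2n+1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} u(x_3)\, L_n(x_3/\varepsilon)\, dx_3$, which makes $\sum_n z^n L_n(x_3/\varepsilon)$ the $L^2$ projection of $u$ (the precise normalization used in (38) is not reproduced here); the integration uses a composite Simpson rule, and the function names and the number of cells are hypothetical.

```python
def legendre(n, t):
    """Legendre polynomial L_n(t), evaluated with Bonnet's recurrence."""
    p_prev, p = 1.0, t
    if n == 0:
        return p_prev
    for k in range(1, n):
        # (k + 1) L_{k+1}(t) = (2k + 1) t L_k(t) - k L_{k-1}(t)
        p_prev, p = p, ((2 * k + 1) * t * p - k * p_prev) / (k + 1)
    return p


def thickness_moments(u, eps, q, cells=200):
    """Legendre moments z^0..z^q of the profile u on (-eps, eps).

    Uses composite Simpson quadrature with the given number of cells.
    """
    h = 2.0 * eps / (2 * cells)
    nodes = [-eps + i * h for i in range(2 * cells + 1)]
    weights = [1 if i in (0, 2 * cells) else (4 if i % 2 else 2)
               for i in range(2 * cells + 1)]
    moments = []
    for n in range(q + 1):
        integral = h / 3.0 * sum(w * u(x) * legendre(n, x / eps)
                                 for w, x in zip(weights, nodes))
        moments.append((2 * n + 1) / (2.0 * eps) * integral)
    return moments


if __name__ == "__main__":
    eps = 0.1
    # A quadratic profile with known Legendre coefficients 1, 2, 3
    profile = lambda x3: 1.0 + 2.0 * legendre(1, x3 / eps) + 3.0 * legendre(2, x3 / eps)
    print(thickness_moments(profile, eps, q=3))
```

For a profile that is already a polynomial of degree $q$ in $x_3$, the moments recover its Legendre coefficients and the higher moments vanish up to quadrature error, which is the sense in which the Taylor and Legendre constructions span the same semidiscrete space.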

Vogelius and Babuška (1981a, 1981b, 1981c) laid the 
foundations of hierarchical models in view of their appli- 
cations to numerical analysis (for scalar problems). 


5.1 Hierarchies of semidiscrete subspaces 


The concepts mentioned in Section 3 can be adapted to the
case of shells. In contrast with plates, for which there exists
a convenient Cartesian system of coordinates fitting with the
tangential and normal directions to the midsurface, more
nonequivalent options are left open for shells. They are,
for example:


The direction of semidiscretization: The intrinsic choice is
of course the normal direction to the midsurface (variable
$x_3$); nevertheless, for shells represented by a single local
chart like in (72), any transverse direction could be cho-
sen. In the sequel, we only consider semidiscretizations
in the normal direction.

The presence or absence of privileged components for the
displacement field in the Ansatz (38). If one privileged
component is chosen, it is of course the normal one $u_3$
and the two other ones are $(u_\alpha) := u_\top$. Then the sequence
of orders $\mathbf{q}$ is of the form $\mathbf{q} = (q_\top, q_\top, q_3)$, and the
space $V^{\mathbf{q}}(\Omega^\varepsilon)$ has the form (48). Note that this space
is independent of the choice of local coordinates on $S$.

If there is no privileged component, $\mathbf{q}$ has to be of the
form $(q, q, q)$ and the space $V^{\mathbf{q}}(\Omega^\varepsilon)$ can be written

\[ V^{\mathbf{q}}(\Omega^\varepsilon) = \Big\{ u = (u_1, u_2, u_3) \in V(\Omega^\varepsilon), \quad
   \exists z^n = (z^n_1, z^n_2, z^n_3) \in H^1_0(S)^3, \ 0 \le n \le q, \quad
   u_j = \sum_{n=0}^{q} x_3^n\, z^n_j(x_\top), \ j = 1, 2, 3 \Big\} \tag{85} \]

Here, for ease of use, we take Cartesian coordinates,
but the above definition is independent of any choice of
coordinates in $\Omega^\varepsilon \subset \mathbb{R}^3$. In particular, it coincides with
the space (48) for $q_\top = q_3$.


Then the requirements of approximability (34), asymptotic 
consistency (35), and optimality of the convergence rate 
(36) make sense. 


5.2 Approximability 


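For any fixed thickness $\varepsilon$, the approximability issue is
as in the case of plates. By Céa’s lemma, there exists
an adimensional constant $C > 0$ depending only on the
Poisson ratio $\nu$, such that

\[ \|u^\varepsilon - u^{\varepsilon,\mathbf{q}}\|_{E(\Omega^\varepsilon)} \le C\, \|u^\varepsilon - v^{\mathbf{q}}\|_{E(\Omega^\varepsilon)} \qquad \forall v^{\mathbf{q}} \in V^{\mathbf{q}}(\Omega^\varepsilon) \]

and the determination of approximability properties relies
on the construction of a best approximation of $u^\varepsilon$ by
functions in $V^{\mathbf{q}}(\Omega^\varepsilon)$.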

In Avalishvili and Gordeziani (2003), approximability is
proved using the density of the sequence of spaces $V^{\mathbf{q}}(\Omega^\varepsilon)$
in $H^1(\Omega^\varepsilon)^3$. But the problem of finding a rate for the
convergence in (33) is more difficult, since the solution
$u^\varepsilon$ has singularities near the edges and, consequently, does
not belong to $H^2(\Omega^\varepsilon)$ in general. For scalar problems,
Vogelius and Babuška (1981a, 1981b, 1981c) prove best
approximation results using weighted Sobolev norms [4].
Up to now, for elasticity systems, there are no such results
taking the actual regularity of $u^\varepsilon$ into account.

It is worth noticing that, in order to obtain an equal-
ity of the form (36), we must use the Korn inequality, since
most approximation results are based on Sobolev norms.
But due to the blow-up of the Korn constant when $\varepsilon \to 0$, it
seems hard to obtain sharp estimates in the general case.
(Let us recall that it behaves as $\varepsilon^{-1}$ in the case of partially
clamped shells.)


5.3 Asymptotic consistency 


Like for plates, the presence of the nonpolynomial three-
dimensional boundary layers $w^k$ generically produces a
limitation in the convergence rate in (35). As previously
mentioned, the only case where a sharp estimate is avail-
able is the case of clamped elliptic shells. Using (68), we
indeed obtain the following result (compare with Theo-
rem 3):


Theorem 8. If the midsurface $S$ is elliptic, if the shell is
clamped along its whole lateral boundary, and if $f|_S \not\equiv 0$,
then for any $\mathbf{q} \ge (1, 1, 2)$ with definition (48), and for any
$\mathbf{q} \ge (2, 2, 2)$ with (85), and with the use of the standard 3-D
elastic energy (2), there exists $C_{\mathbf{q}} = C_{\mathbf{q}}(f) > 0$ such that for
all $\varepsilon \in (0, \varepsilon_0)$

\[ \|u^\varepsilon - u^{\varepsilon,\mathbf{q}}\|_{E(\Omega^\varepsilon)} \le C_{\mathbf{q}} \sqrt{\varepsilon}\, \|u^\varepsilon\|_{E(\Omega^\varepsilon)} \tag{86} \]


Note that in this case, the two-dimensional boundary
layers are polynomial in $x_3$. Therefore they can be captured
by the semidiscrete hierarchy of spaces $V^{\mathbf{q}}$.

Using estimate (83) of Lods and Mardare (2002), together
with the fact that the corrector term $w^\sharp$ is polynomial in $x_3$
of degree $(0, 0, 2)$, we obtain a proof for the asymptotic
consistency for any (smooth) clamped shell without assum-
ing that the midsurface is elliptic:


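Theorem 9. If the shell is clamped along its whole lateral
boundary, and if $f|_S \not\equiv 0$, then for $\mathbf{q}$ as in Theorem 8 and
the standard energy (2), there exists $C_{\mathbf{q}} = C_{\mathbf{q}}(f) > 0$ such
that for all $\varepsilon \in (0, \varepsilon_0)$

\[ \|u^\varepsilon - u^{\varepsilon,\mathbf{q}}\|_{E(\Omega^\varepsilon)} \le C_{\mathbf{q}}\, \varepsilon^{1/4}\, \|u^\varepsilon\|_{E(\Omega^\varepsilon)} \tag{87} \]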


5.4 Examples of hierarchical models 


Various models of degree (1, 1,0), (1, 1,2), and (2, 2, 2) 
are introduced and investigated in the literature. Note that 
the model (1, 1, 1) is strictly forbidden for shells because it 
cannot be associated with any correct energy; see Chapelle, 
Ferent and Bathe (2004). 


5.4.1 (1, 1,0) models 


One of the counterparts of Reissner—-Mindlin model for 
plates is given by the Naghdi model: see Naghdi (1963, 
1972). The space of admissible displacements is 


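$$V^{N}(\Omega^{\varepsilon}) = \{ u \in V(\Omega^{\varepsilon}),\ \exists z \in H^1(S)^3,\ \exists \theta_\alpha \in H^1(S)^2,\ u = (z_\sigma - x_3(\theta_\sigma + b_\sigma^\alpha z_\alpha),\ z_3) \} \tag{88}$$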


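As in (47), the energy splits into three parts (with the shear
correction factor κ):

$$
\begin{aligned}
a(u, u') ={}& 2\varepsilon \int_S \tilde{A}^{\alpha\beta\sigma\delta}\, \gamma_{\alpha\beta}(z)\, \gamma_{\sigma\delta}(z')\, dS && \text{(membrane energy)} \\
&+ \varepsilon\kappa \int_S \mu\, a^{\alpha\sigma}\, (D_\alpha z_3 + b_\alpha^\delta z_\delta - \theta_\alpha)\, (D_\sigma z'_3 + b_\sigma^\beta z'_\beta - \theta'_\sigma)\, dS && \text{(shear energy)} \\
&+ \frac{2\varepsilon^3}{3} \int_S \tilde{A}^{\alpha\beta\sigma\delta}\, \rho_{\alpha\beta}(z, \theta)\, \rho_{\sigma\delta}(z', \theta')\, dS && \text{(bending energy)}
\end{aligned} \tag{89}
$$

where

$$\rho_{\alpha\beta}(z, \theta) = \tfrac{1}{2} (D_\alpha \theta_\beta + D_\beta \theta_\alpha) - c_{\alpha\beta}\, z_3 + \tfrac{1}{2}\, b_\alpha^\sigma D_\beta z_\sigma + \tfrac{1}{2}\, b_\beta^\sigma D_\alpha z_\sigma$$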


Note that when the penalization term in the shear energy
goes to zero, we get θ_σ = D_σ z_3 + b_σ^α z_α, and the displacement
u in (88) coincides with (66). In Lods and Mardare (2002),
an estimate of the error between the solution of the Naghdi
model and the solution of the 3-D model is provided in a
subenergetic norm.

A more recent (1, 1, 0) model (called the general shell element,
see Chapelle and Bathe, 2000) consists of the reduced
energy projection on the space V^{(1,1,0)}(Ω^ε). It does
not coincide with the Naghdi model, but both models possess
similar asymptotic properties, and they are preferred to
Koiter’s for discretization.


5.4.2 Quadratic kinematics 


In accordance with Theorems 8 and 9, it is relevant to use
the standard 3-D elastic energy (2) for such kinematics.
Quadratic models based on the (1, 1, 2) model are investigated
in Bischoff and Ramm (2000). The enrichment of
the general shell element by the introduction of quadratic
terms, that is, model (2, 2, 2), is thoroughly studied from both
the asymptotic and the numerical points of view in Chapelle,
Ferent and Bathe (2004) and Chapelle, Ferent and Le Tallec
(2003).


6 FINITE ELEMENT METHODS IN THIN 
DOMAINS 


We herein address some of the characteristics of finite 
element methods (FEM), mainly the p-version of the FEM, 
when applied to the primal weak formulations (3) and (33) 
for the solution of plate and shell models. We only address 
isotropic materials, although our analysis could be extended 
to laminated composites. 

As illustrative examples, we present the results of some 
computations performed with the p-version FE computer 
program StressCheck. (StressCheck is a trade mark of 
Engineering Software Research and Development, Inc., 
10845 Olive Blvd., Suite 170, St. Louis, MO 63141, USA.) 


6.1 FEM discretizations


Let us recall that, when conforming, the FEM is a Galerkin
projection onto finite dimensional subspaces V_N of the
variational space associated with the models under consideration.
In the p-version of the FEM, the subspaces are based on one
partition of the domain into a finite number of subdomains
K ∈ 𝒯 (the mesh), in which the unknown displacement is
discretized by mapped polynomial functions of increasing
degree p. The subdomains K are mapped from reference
element(s) K̂.
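
To make the idea of a p-version Galerkin projection concrete, here is a minimal one-dimensional sketch. It is not taken from the chapter: the model problem -u'' = f on (-1, 1) with u(±1) = 0, the integrated Legendre basis, and the name `p_version_solve` are all assumptions chosen for brevity. The mesh (a single element) is kept fixed and only the degree p is raised.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def p_version_solve(f, p, n_quad=64):
    """Galerkin solution of -u'' = f on (-1, 1), u(-1) = u(1) = 0,
    on ONE element with polynomial degree p >= 2 (illustrative model problem).

    Basis (an assumed, convenient choice):
        phi_k = (L_{k+1} - L_{k-1}) / (2k + 1),  k = 1, ..., p - 1,
    so that phi_k' = L_k and phi_k(-1) = phi_k(1) = 0.
    """
    x, w = leg.leggauss(n_quad)            # Gauss-Legendre nodes and weights
    basis, coeffs = [], []
    for k in range(1, p):
        c = np.zeros(k + 2)                # Legendre coefficients of phi_k
        c[k + 1] = 1.0 / (2 * k + 1)
        c[k - 1] = -1.0 / (2 * k + 1)
        load = np.sum(w * f(x) * leg.legval(x, c))   # (f, phi_k)
        stiff = 2.0 / (2 * k + 1)                    # (phi_k', phi_k'), matrix is diagonal
        basis.append(c)
        coeffs.append(load / stiff)

    def u_p(t):
        return sum(a * leg.legval(t, c) for a, c in zip(coeffs, basis))
    return u_p

f = lambda x: np.pi**2 * np.sin(np.pi * x)           # exact solution: sin(pi x)
t = np.linspace(-1.0, 1.0, 201)
errors = {p: float(np.max(np.abs(p_version_solve(f, p)(t) - np.sin(np.pi * t))))
          for p in (2, 4, 8, 12)}
```

For this smooth solution the maximum error drops rapidly as p grows, which is the behavior the p-version relies on; the thin-domain problems of this section add the difficulty that the constants depend on ε.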


6.1.1 Meshes 


All finite element discretizations we consider here are based
on a mesh 𝒯_S of the midsurface S: we mean that the
3-D mesh of Ω^ε has, in normal coordinates (x_⊤, x_3), the
tensor product form [5] 𝒯_S ⊗ ℐ_ε, where ℐ_ε represents a
partition of the interval (−ε, ε) into layers, for example, the
two halves (−ε, 0) and (0, ε), or (this case is important
in the sequel) the trivial partition by only one element
through the thickness. We agree to call the latter mesh a
thin element mesh.
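
The tensor product structure can be sketched in a few lines of Python. The function names and the string labels standing for midsurface elements are hypothetical; the thickness partition is taken uniform, which is an assumption made only for this illustration.

```python
from itertools import product

def thickness_partition(eps, n_layers=1):
    """Uniform partition of (-eps, eps) into n_layers intervals (assumed uniform)."""
    step = 2.0 * eps / n_layers
    return [(-eps + i * step, -eps + (i + 1) * step) for i in range(n_layers)]

def tensor_product_mesh(surface_elements, eps, n_layers=1):
    """Pair every midsurface element with every layer of the thickness partition."""
    return list(product(surface_elements, thickness_partition(eps, n_layers)))

surface = ["T0", "T1", "T2"]                       # placeholder labels for 2-D elements
thin = tensor_product_mesh(surface, eps=0.05)      # one layer: a thin element mesh
two_layers = tensor_product_mesh(surface, eps=0.05, n_layers=2)
```

With one layer, each 3-D element is simply a midsurface element extruded over the whole thickness.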

The 3-D elements K are thus images by maps ψ_K
of reference elements K̂, which are either pentahedral
(triangle × interval) or hexahedral:

$$\psi_K : \hat{K} = \hat{T} \times [0, 1] \ni (\hat{x}_1, \hat{x}_2, \hat{x}_3) \longmapsto x \in K$$

with T̂ the reference triangle or the reference square. For
the 2-D FEM, we denote by T the elements in 𝒯_S: they
are the images of T̂ by maps ψ_T

$$\psi_T : \hat{T} \ni (\hat{x}_1, \hat{x}_2) \longmapsto x_\top \in T$$


If Ω^ε is a plate, the midsurface S is plane but its
boundary ∂S is in general not straight. For some lateral boundary
conditions, for example, the hard simply supported plate,
the approximation of ∂S by polygonal lines produces, in
general, wrong results. This effect is known as the Babuška
paradox (Babuška and Pitkäranta, 1990). If Ω^ε is a shell,
the geometric approximation of S by ‘plane’ elements is
also an issue: if the mappings are affine, the shell is
approximated by a faceted surface whose rigidity properties
are quite different from those of the smooth surface; see Akian and
Sanchez-Palencia (1992) and Chapelle and Bathe (2003),
Section 6.2.

As a conclusion, good mappings have to be used for the
design of the elements K (high degree polynomials or other
analytic functions).


6.1.2 Polynomial spaces for hierarchical models 


For hierarchical models (33), the discretization is indeed
two-dimensional: the degree q of the hierarchy being fixed,
the unknowns of (33) are the functions z_j^n defined on S and
representing the displacement according to (38), where the
director functions Φ_j^n form adequate bases of polynomials
in one variable, for example, Legendre polynomials L_n.

We have already mentioned in Section 5 that the only
intrinsic option for the choice of components is taking
j = (α, 3), which results in the Ansatz (written here with
Legendre polynomials)


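$$u_\alpha = \sum_{n=0}^{q_\alpha} z_\alpha^n(x_\top)\, L_n\!\left(\frac{x_3}{\varepsilon}\right) \quad\text{and}\quad u_3 = \sum_{n=0}^{q_3} z_3^n(x_\top)\, L_n\!\left(\frac{x_3}{\varepsilon}\right)$$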
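
One practical reason for writing the transverse expansion with Legendre polynomials is their orthogonality on (−ε, ε): the integral of L_m(x_3/ε) L_n(x_3/ε) over the thickness equals 2ε/(2n+1) when m = n and 0 otherwise. The Python sketch below checks this numerically and evaluates one transverse profile; the function names and the sample coefficients are illustrative assumptions, not part of the chapter.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def transverse_profile(z_coeffs, x3, eps):
    """Evaluate sum_n z^n L_n(x3 / eps) at a fixed in-plane position."""
    return leg.legval(np.asarray(x3) / eps, z_coeffs)

def gram_matrix(q, eps, n_quad=16):
    """Integrals of L_m(x3/eps) * L_n(x3/eps) over (-eps, eps), 0 <= m, n <= q."""
    s, w = leg.leggauss(n_quad)              # nodes and weights on (-1, 1)
    V = leg.legvander(s, q)                  # V[i, n] = L_n(s_i)
    return eps * (V * w[:, None]).T @ V      # change of variable x3 = eps * s

eps = 0.1
G = gram_matrix(q=3, eps=eps)
expected = np.diag([2.0 * eps / (2 * n + 1) for n in range(4)])

# Sample coefficients (z^0, z^1, z^2) = (1, 0, 0.5) evaluated on the top face x3 = eps
u3_top = transverse_profile([1.0, 0.0, 0.5], x3=eps, eps=eps)
```

Since L_n(1) = 1 for every n, the value on the top face is just the sum of the coefficients.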


Now the discretization consists in requiring that z_α^n|_T ∘ ψ_T,
α = 1, 2, and z_3^n|_T ∘ ψ_T belong to the space P_p(T̂) for some
p, where P_p(T̂) is the space of polynomials in two variables


• of degree ≤ p if T̂ is the reference triangle,
• of partial degree ≤ p if T̂ is the reference square
[0, 1] × [0, 1].
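
The two spaces differ in size: total degree ≤ p gives (p+1)(p+2)/2 independent monomials, partial degree ≤ p gives (p+1)². A short Python check, with hypothetical helper names, makes the count explicit:

```python
def dim_triangle(p):
    """Number of monomials x^i y^k with i + k <= p (total degree <= p)."""
    return (p + 1) * (p + 2) // 2

def dim_square(p):
    """Number of monomials x^i y^k with i <= p and k <= p (partial degree <= p)."""
    return (p + 1) ** 2

def monomials_triangle(p):
    """Exponent pairs (i, k) spanning the total-degree space."""
    return [(i, k) for i in range(p + 1) for k in range(p + 1 - i)]

sizes = {p: (dim_triangle(p), dim_square(p)) for p in (1, 2, 4, 8)}
```

These are counts per scalar field on the reference element; the number of degrees of freedom of an assembled mesh is smaller because neighboring elements share boundary modes.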


It makes sense to fix different degrees p_j in relation with
j = α, 3, and we set p = (p_1, p_2, p_3). When plugged back
into formula (38), this discretization of the z_j^n, j = α, 3,
yields a finite dimensional subspace V^q_p(Ω^ε) of V^q(Ω^ε). As
already mentioned for the transverse degrees q, cf. (48) and
Section 5, we have to assume for coherence that p_1 = p_2
for shells. In the situation of plates, if T is affinely mapped
from the reference square, the z_j^n|_T are simply given by

$$
\begin{aligned}
z_1^n(x_\top) &= \sum_{i,k=0}^{p_1} z^n_{1,ik}\, P_i(x_1)\, P_k(x_2) \\
z_2^n(x_\top) &= \sum_{i,k=0}^{p_2} z^n_{2,ik}\, P_i(x_1)\, P_k(x_2) \\
z_3^n(x_\top) &= \sum_{i,k=0}^{p_3} z^n_{3,ik}\, P_i(x_1)\, P_k(x_2)
\end{aligned}
$$

where the z^n_{j,ik} are real coefficients and P_i denotes a
polynomial of degree i which is obtained from Legendre
polynomials; see for example Szabó and Babuška (1991).
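
A tensor-product expansion of this kind can be evaluated directly with NumPy. In the sketch below, P_i is ASSUMED to be the shifted Legendre polynomial L_i(2x − 1) on [0, 1]; the text only says that P_i is obtained from Legendre polynomials, so this is one possible choice. The function name and the coefficient values are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def eval_field(coeffs, x1, x2):
    """Evaluate sum_{i,k} coeffs[i, k] * P_i(x1) * P_k(x2) on [0, 1] x [0, 1],
    assuming P_i(x) = L_i(2x - 1) (shifted Legendre polynomials)."""
    s1 = 2.0 * np.asarray(x1) - 1.0      # map [0, 1] to [-1, 1]
    s2 = 2.0 * np.asarray(x2) - 1.0
    return leg.legval2d(s1, s2, coeffs)

p3 = 2
z3 = np.zeros((p3 + 1, p3 + 1))          # coefficient array z_{3,ik}
z3[0, 0] = 1.0                           # constant part
z3[1, 0] = 0.5                           # linear in x1
z3[1, 1] = 0.25                          # bilinear part
n_coeffs = z3.size                       # (p3 + 1)^2 real coefficients per field

value = eval_field(z3, 1.0, 0.5)         # at x2 = 0.5 the factor P_1(x2) vanishes
```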

The discretization of hierarchical models (33) can also
be done through the h-version or the h-p version of the FEM.


6.1.3 Polynomial spaces for 3-D discretization.
Case of thin elements


In 3-D, on the reference element K̂ = T̂ × [0, 1], we can
consider any of the polynomial spaces P_{p,q}(K̂) = P_p(T̂) ⊗
P_q([0, 1]), p, q ∈ ℕ. For the discretization of (3), each
Cartesian component u_i of the displacement is sought
in the space of functions v ∈ H^1(Ω^ε) such that, for
any K in the mesh, v|_K ∘ ψ_K belongs to P_{p,q}(K̂). We
denote by V_{p,q}(Ω^ε) the corresponding space of admissible
displacements over Ω^ε.

In the situation where we have only one layer of elements
over Ω^ε in the thickness (thin element mesh) with a (p, q)
discretization, let us set q = (q, q, q) and p = (p, p, p).
Then it is easy to see that, in the framework of semidiscrete
spaces (85), we have the equality between discrete spaces:

$$V_{p,q}(\Omega^{\varepsilon}) = V^{\mathbf{q}}_{\mathbf{p}}(\Omega^{\varepsilon}) \tag{90}$$


In other words, thin elements are equivalent to the
discretization of underlying hierarchical models. Let us insist
on the following fact: for a true shell, the correspondence
between the Cartesian components u_i and the tangential
and transverse components (u_α, u_3) is nonaffine. As a
consequence, equality (90) holds only if the space V^q_p(Ω^ε)
corresponds to the discretization of a hierarchical model
in Cartesian coordinates.

Conversely, hierarchical models of the type q = (q, q, q)
with the ‘Cartesian’ unknowns z_j^n, n = 0, ..., q, j = 1, 2, 3,
can be discretized directly on S, or inherit a 3-D discretization;
see Chapelle, Ferent and Bathe (2004). Numerical
evidence that the p-version with anisotropic Ansatz spaces
allows the analysis of three-dimensional shells with high
accuracy was first presented in Düster, Bröker and Rank
(2001).


6.1.4 FEM variational formulations 


Let us fix the transverse degree q of the hierarchical model.
Its solution u^{ε,q} solves problem (33). For each ε > 0 and
each polynomial degree p, (33) is discretized by its finite
dimensional subspace V^q_p(Ω^ε). Let u^{ε,q}_p be the solution of

$$\text{Find } u^{\varepsilon,q}_p \in V^q_p(\Omega^{\varepsilon}) \text{ such that } a^{\varepsilon,q}(u^{\varepsilon,q}_p, u') = \int_{\Omega^{\varepsilon}} f \cdot u' \, dx, \quad \forall u' \in V^q_p(\Omega^{\varepsilon}) \tag{91}$$

We can say that (91) is a sort of 3-D discretization of (33).
But, indeed, the actual unknowns of (91) are the z_α^n, n =
0, ..., q_α, and z_3^n, n = 0, ..., q_3, or the z_j^n for n = 0, ..., q
and j = 1, 2, 3. Thus, (91) can be alternatively formulated
as a 2-D problem involving spaces Z^q_p(S) independent of
ε, and a coercive bilinear form a^q_S(ε) polynomial in ε.
Examples are provided by the Reissner-Mindlin model,
cf. (47), the Koiter model (84), and the Naghdi model, cf.
(89). The variational formulation now takes the form

$$\text{Find } Z =: (z^n)_{0 \le n \le q} \in Z^q_p(S) \text{ such that } a^q_S(\varepsilon)(Z, Z') = F(\varepsilon)(f, Z'), \quad \forall Z' \in Z^q_p(S) \tag{92}$$

where F(ε)(f, Z') is the suitable bilinear form coupling
loadings and test functions. Let us denote by Z^{ε,q}_p the
solution of (92).


6.2 Locking issues 


In the framework of the family of discretizations considered
above, the locking effect is said to appear when a
deterioration in the resulting approximation of u^{ε,q} by u^{ε,q}_p,
as p tends to ∞, occurs as ε → 0. Of course, a similar
effect is reported in the h-version of the FEM: the deterioration
of the h-approximation also occurs when the thickness
ε approaches zero.

A precise definition of locking may be found in Babuška
and Suri (1992): it involves the locking parameter (the
thickness ε in the case of plates), the sequence of finite
element spaces V^q_p that comprise the extension procedure
(the p-version in our case, but h and h-p versions can
also be considered), and the norm in which the error is to be
measured. Of course, different locking phenomena are expected
in different error measures.


6.2.1 Introduction to membrane locking 


A locking-free approximation scheme is said to be robust.
For a bilinear form a_S(ε) of the form a_0 + ε² a_1, like Koiter’s,
a necessary condition for the robustness of the approximation
is that the intersections of the discrete spaces with
the kernel of a_0 are a sequence of dense subspaces for
the whole kernel of a_0; see Sanchez-Hubert and Sanchez-Palencia
(1997), Ch. XI. In the case of the Koiter model,
this means that the whole inextensional space V_F(S) (69)
can be approximated by the subspaces of the inextensional
elements belonging to FE spaces. For hyperbolic shells,
the only inextensional elements belonging to FE spaces are
zero; see Sanchez-Hubert and Sanchez-Palencia (1997) and
Chapelle and Bathe (2003), Section 7.3, which prevents all
approximation property of V_F(S) if it is not reduced to {0}.

This fact is an extreme and general manifestation of the
membrane locking of shells, also addressed in Pitkäranta
(1992) and Gerdes, Matache and Schwab (1998) for cylindrical
shells, which are a prototype of shells having a
nonzero inextensional space. Plates do not present membrane
locking since all elements z = (0, z_3) are inextensional,
thus can be approximated easily by finite element
subspaces. Nevertheless, as soon as the RM model is used,
as can be seen from the structure of the energy (47), shear
locking may appear.


6.2.2 Shear locking of the RM and hierarchical 
plate models 


Shear locking occurs because the FE approximation using
C^0 polynomials for the RM family of plates has to converge,
in the limit ε → 0, to the KL model in energy
norm (Suri, 2001), which requires C^1 continuity. Let us consider
the three-field RM model on the subspace of V^{RM}(Ω^ε),
cf. Section 3.3, of displacements with bending parity: {u ∈
V(Ω^ε), u = (−x_3 θ_⊤, z_3)}. According to Suri, Babuška and
Schwab (1995) we have the following:


Theorem 10. The p-version of the FEM for the RM plate
model without boundary layers, on a mesh of triangles
and parallelograms, with polynomial degrees p_⊤ ≥ 1 for
rotations θ_⊤ and p_3 ≥ p_⊤ for z_3, is free of locking in the
energy norm.


For the h-version over a uniform mesh consisting either
of triangles or rectangles, to avoid locking the tangential
degree p_⊤ has to be taken equal to four or larger, with
the transverse degree p_3 being chosen equal to p_⊤ + 1. A
similar phenomenon was found earlier in connection with
‘Poisson ratio’ locking for the equations of elasticity (i.e.
conforming elements of degree four or higher encounter no
locking); see Scott and Vogelius (1985). In Suri, Babuška
and Schwab (1995), it is proven that locking effects (and
results) for the (1, 1, 2) plate model are similar to those for the
RM model because no additional constraints arise as the
thickness ε → 0. Furthermore, it is stated that locking
effects carry over to all hierarchical plate models.
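
Shear locking is easiest to observe on a one-dimensional analog. The sketch below is not one of the chapter's computations: it assumes a Timoshenko cantilever beam (unit length and width, tip load, E = 1, ν = 0.3, shear factor 5/6) discretized with the h-version and linear elements, and it compares exact integration of the shear term with one-point (reduced) integration, a classical remedy that the chapter does not discuss. All names and parameter values are illustrative.

```python
import numpy as np

def tip_deflection_ratio(t, n_el=10, reduced=False, E=1.0, nu=0.3, kappa=5.0 / 6.0):
    """FE tip deflection divided by the exact Timoshenko value, thickness t.

    Unknowns per node: deflection w and rotation theta (linear shape functions).
    Energy: 1/2 int EI theta'^2 + 1/2 int kappa*G*A (w' - theta)^2.
    """
    L, P = 1.0, 1.0
    EI = E * t**3 / 12.0
    GA = kappa * E / (2.0 * (1.0 + nu)) * t
    h = L / n_el
    # theta-theta entries of the shear matrix: exact vs one-point quadrature
    m, c = (h / 4.0, h / 4.0) if reduced else (h / 3.0, h / 6.0)
    Kb = EI / h * np.array([[0, 0, 0, 0], [0, 1, 0, -1],
                            [0, 0, 0, 0], [0, -1, 0, 1]], float)
    Ks = GA * np.array([[1 / h, 0.5, -1 / h, 0.5],
                        [0.5, m, -0.5, c],
                        [-1 / h, -0.5, 1 / h, -0.5],
                        [0.5, c, -0.5, m]])
    ndof = 2 * (n_el + 1)
    K = np.zeros((ndof, ndof))
    for e in range(n_el):                      # assemble element matrices
        idx = np.arange(2 * e, 2 * e + 4)
        K[np.ix_(idx, idx)] += Kb + Ks
    F = np.zeros(ndof)
    F[-2] = P                                  # load on the tip deflection dof
    free = np.arange(2, ndof)                  # clamp w and theta at the first node
    u = np.linalg.solve(K[np.ix_(free, free)], F[free])
    w_exact = P * L**3 / (3.0 * EI) + P * L / GA
    return u[-2] / w_exact

for t in (0.1, 0.01, 0.001):
    print(t, tip_deflection_ratio(t), tip_deflection_ratio(t, reduced=True))
```

On the fixed ten-element mesh, the ratio for exact integration collapses as t decreases, while the reduced-integration variant stays close to one. This is only meant to illustrate what "deterioration as the thickness goes to zero" looks like; the results quoted above concern the RM plate model and higher degrees.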

Here we have discussed locking in energy norm. However,
if shear stresses are of interest, then locking is significantly
worse because these involve an extra power ε^{-1}.

For illustration purposes, consider a clamped plate with
elliptical midsurface of radii 10 and 5, Young modulus
E = 1 and Poisson ratio ν = 0.3; see Figure 4. (We recall
that the Young modulus is given by E = μ(3λ + 2μ)/(λ + μ)
and the Poisson ratio by ν = λ/(2(λ + μ)).)
The plate is loaded by a constant pressure of value (2ε)².
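
The relations between the Lamé constants (λ, μ) and the engineering constants (E, ν) are standard and easy to check numerically. The helper names below are hypothetical; the values E = 1 and ν = 0.3 are those of the example.

```python
def lame_to_engineering(lam, mu):
    """Young modulus and Poisson ratio from the Lame constants."""
    E = mu * (3.0 * lam + 2.0 * mu) / (lam + mu)
    nu = lam / (2.0 * (lam + mu))
    return E, nu

def engineering_to_lame(E, nu):
    """Lame constants from Young modulus and Poisson ratio (inverse relations)."""
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    mu = E / (2.0 * (1.0 + nu))
    return lam, mu

lam, mu = engineering_to_lame(E=1.0, nu=0.3)
E_back, nu_back = lame_to_engineering(lam, mu)   # round trip recovers (1.0, 0.3)
```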

The discretization is done over a 32 p-element mesh (see
Figure 4(a) and (b) for 2ε = 1 and 0.1) using two layers,
each of dimension ε, in the vicinity of the boundary. The
FE space is defined with p_3 = p_⊤ ranging from 1 to 8. We
show in Figure 5 the locking effects for the RM model with
κ_Energy.

The error plotted in ordinates is the estimated relative
discretization error in energy norm between the numerical
and exact solution of the RM plate model for each fixed
thickness ε (it is not the error between the RM numerical
solution and the exact 3-D plate model). A similar behavior
can be observed with the model q = (1, 1, 2).


Figure 4. p-FE mesh for 2ε = 1, 0.1 for RM model and 2ε = 1
for 3-D model. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm

Figure 5. Discretization error versus polynomial degree p for
RM plates of various thicknesses ε. A color version of this image
is available at http://www.mrw.interscience.wiley.com/ecm

To illustrate both the locking effects for the hierarchical
family of plates and the modeling errors between the plate
models and their 3-D counterpart, we have computed for
two thicknesses of plates (2ε = 1 or 2ε = 0.01), the solution
for the first four plate models (see Table 1 [6]), and for the
fully 3-D plate with the degrees p_⊤ = p_3 = 1, 2, ..., 8 with
the model represented in Figure 4(c) for 2ε = 1.

The relative errors between the energy norms of the hierarchical
models and of the 3-D plate model versus the polynomial
degree p are shown in Figure 6. As predicted, increasing
the order of the plate model does not improve the
locking ratio, and as the hierarchical model number is
increased the relative error decreases. We note that when
2ε = 1 the relative error of the four models converges to
the modeling error, which is still quite big since ε is not
small, whereas when 2ε = 0.01, the error stays larger than
15% for all models when p ≤ 4, and starts converging
for p ≥ 5.


Table 1. Hierarchical plate-model definitions for bending symmetry.

Model #                                    1 (RM)     2          3          4
Degrees q = (q_1, q_2, q_3)                (1, 1, 0)  (1, 1, 2)  (3, 3, 2)  (3, 3, 4)
# independent fields d = (d_1, d_2, d_3)   (1, 1, 1)  (1, 1, 2)  (2, 2, 2)  (2, 2, 3)


Figure 6. Relative error versus polynomial degree for 2ε = 1 and
0.01 for the first 4 hierarchical models. A color version of this
image is available at http://www.mrw.interscience.wiley.com/ecm


6.3 Optimal mesh layout for hierarchical models 
with boundary layers 


All hierarchical plate models (besides the KL model) exhibit
boundary layers. These are rapidly varying components,
which decay exponentially with respect to the stretched
distance R = r/ε from the edge, so that at a distance
O(2ε) they are negligible. Finite element solutions should
be able to capture these rapid changes. Using the p-version
of the finite element method, one may realize
exponential convergence rates if a proper design of meshes
and selection of polynomial degrees is applied in the
presence of boundary layers.

In a 1-D problem with boundary layers, it has been
proven in Schwab and Suri (1996) that the p-version over
a refined mesh can achieve exponential convergence for
the boundary layers, uniformly in ε. The mesh has to be
designed so as to consist of one O(p(2ε)) boundary layer
element at each boundary point. More precisely, the optimal
size of the element is αp(2ε), where 0 < α < 4/e, with e the
base of natural logarithms (see Figure 7).
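
The mesh rule can be written down as a small helper. This is only a sketch of the 1-D layout described above: the function name, the default α = 1, and the fallback to a single element when the two layer elements would overlap are assumptions made for the illustration.

```python
import math

def boundary_layer_mesh(a, b, p, eps, alpha=1.0):
    """Nodes of a 1-D mesh on (a, b): one boundary layer element of size
    alpha * p * (2 * eps) at each end point and one element in between."""
    if not 0.0 < alpha < 4.0 / math.e:
        raise ValueError("alpha must lie in (0, 4/e)")
    d = alpha * p * (2.0 * eps)
    if 2.0 * d >= b - a:
        # Assumed fallback: the layer elements would overlap, keep one element.
        return [a, b]
    return [a, a + d, b - d, b]

nodes = boundary_layer_mesh(0.0, 1.0, p=4, eps=0.005)   # layer size 4 * 0.01 = 0.04
```

Note that the layer element grows with p: raising the degree lets a wider element resolve the same exponential decay.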

This result carries over to the heat transfer problem on
2-D domains as shown in Schwab, Suri and Xenophontos
(1998), and to the RM plate model, as demonstrated by
numerical examples. Typical boundary layer meshes are
shown in Figure 4 for 2ε = 1 and 0.1: in practice, for ease
of computations, two elements in the boundary layer zone
are used, each having the size ε in the normal direction,
independent of the polynomial degree used. This,
although not optimal, still captures well the rapid changes
in the boundary layer.


Figure 7. A typical design of the mesh near the boundary for the
p-version of the FEM.

In order to realize the influence of the mesh design
on the capture of boundary layer effects, we have again
solved numerically the RM plate model for a thickness of
2ε = 0.01 (with κ_Deflection as shear correction factor). Three
different mesh layouts have been considered, with two
layers of elements in the vicinity of the edge of dimension
0.5, 0.05, and 0.005 (the first two are represented
in Figure 4). For comparison purposes, we have computed
the 3-D solution over a domain having two layers in the
thickness direction and two elements in the boundary layer
zone of dimension 0.005. We have extracted the vertical
displacement u_3 and the shear strain e_13 along the line
starting at (x_1, x_2) = (9.95, 0) and ending at the boundary
(x_1, x_2) = (10, 0), that is, in the boundary layer region.
Computations use the degrees p_⊤ = p_3 = 8. It turns out
that the vertical displacement u_3 is rather insensitive to
the mesh, whereas the shear strain e_13 is inadequately
computed if the mesh is not properly designed: with the
mesh containing fine layers of thickness 0.005, the average
relative error is 10%, but this error reaches 100% with
mesh layer thickness 0.05 and 400% for the mesh layer
thickness 0.5.

Concerning shells, we have seen in Section 4.2 that the
Koiter model for clamped elliptic shells admits boundary
layers of length scale √ε, and in Section 4.4 that other
length scales may appear for different geometries (ε^{1/3}
and ε^{1/4}). Moreover, for the Naghdi model, the short length
scale ε is also present; see Pitkäranta, Matache and Schwab
(2001). Nevertheless, the ‘long’ length scales ε^{1/3} and
ε^{1/4} appear to be less frequent. We may expect a similar
situation for other hierarchical models. As a conclusion, the
mesh design for shells of small thickness should (at least)
take into account both length scales ε and √ε. Another
phenomenon should also be considered: hyperbolic and
parabolic shells submitted to a concentrated load or to
singular data are expected to propagate singularities along
their zero curvature lines, with the scale width ε^{1/3}; see
Pitkäranta, Matache and Schwab (2001).


6.4 Eigen-frequency computations 


Eigen-frequency computations are, in our opinion, a very
good indicator of (i) the quality of computations, (ii) the
nature of the shell (or plate) response. In particular, the
bottom of the spectrum indicates the maximal possible
stress-strain energy to be expected under a load of given
potential energy. From Theorem 7, we may expect that,
except in the case of clamped elliptic shells, the ratio
between the energy of the response and the energy of the
excitation will behave almost as O(ε^{-2}).


6.4.1 Eigen-frequency of RM versus 3-D for plates 


Eigen-frequencies obtained by the p-version finite element
method for clamped RM plates and their counterpart
3-D eigen-frequencies have been compared in Dauge and
Yosibash (2002), where rectangular plates of dimensions
1 × 2 × 2ε have been considered. For isotropic materials
with Poisson coefficient ν = 0.3, the relative error for the
first three eigen-frequencies was found negligible (less than
0.12%) for thin plates with slenderness ratio of less than 1%,
and still small (0.2%) for moderately thick plates (slenderness
ratio about 5%).

For some orthotropic materials, much larger relative
errors between the RM eigen-frequencies and their 3-D
counterparts have been observed, even for relatively thin
plates. In one of the orthotropic rectangular plate examples
in Dauge and Yosibash (2002), for which the boundary
layer effect on the eigen-frequencies should be the most
pronounced, a very large relative error of 25% has been
reported for the first eigen-frequency at ε = 0.1. This is
a significant deviation: the RM model underestimates
the ‘true’ 3-D eigen-frequency by 25%, which is attributed
to the boundary layer effect.


6.4.2 3-D eigen-frequency computations for shells 


We present computations on three families of shells, see
Figure 8: (a) clamped spherical shells, (b) sensitive spherical
shells, (c) flexural cylindrical shells, all with material
parameters ν = 0.3 and E = 1. These three families illustrate
the three cases (i), (ii) and (iii) in Theorem 7: the
shells (a) are elliptic and clamped on their whole boundary, (b)
are elliptic, but clamped only on a part of their boundaries,
and (c) are parabolic. Note that Theorem 7 states results
relating to Koiter eigenvalues and not to 3-D eigenvalues.
Nevertheless, a similar behavior can be expected for 3-D
eigenvalues.


Figure 8. Shell models (a), (b) and (c) for ε = 0.04. A color version
of this image is available at http://www.mrw.interscience.wiley.com/ecm

Figure 9. Model (a). Vertical components of eigen-modes 1, 2 and 4
for ε = 0.08. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


Family (a). The midsurface S is the portion of the unit
sphere described in spherical coordinates by φ ∈ [0, 2π)
and θ ∈ (π/4, π/2]. Thus S is a spherical cap containing the
north pole. The family of shells Ω^ε has its upper and lower
surfaces characterized by the same angular conditions, and
the radii ρ = 1 + ε and ρ = 1 − ε, respectively. We clamp
Ω^ε along its lateral boundary θ = π/4.

We have computed the first five eigen-frequencies of the
3-D operator (4) by a FE p-discretization based on two
layers of elements in the transverse direction and 8 × 5
elements in the midsurface, including one thin layer of
elements in the boundary layer. The vertical (i.e. normal to the
tangent plane at the north pole, not transverse to the
midsurface!) component u_3 for three modes is represented
in Figure 9 for the (half-)thickness ε = 0.08. Mode 3 is
rotated from mode 2, and mode 5 from mode 4 (double
eigen-frequencies). The shapes of the eigen-modes for
smaller values of the thickness are similar. Figure 10
provides the first three distinct eigen-frequencies as a function
of the thickness in natural scales. In accordance with
Theorem 7 (i), the smallest eigen-frequencies all tend to
the same nonzero limit, which should be the (square root
of the) bottom of the membrane spectrum.


Figure 10. Model (a). Eigen-frequencies versus thickness (2ε).
A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


Family (b). The midsurface S is the portion of the unit
sphere described in spherical coordinates by φ ∈ [0, 2π)
and θ ∈ (π/4, 5π/12]. The family of shells Ω^ε has its upper
and lower surfaces characterized by the same angular
conditions, and the radii ρ = 1 + ε and ρ = 1 − ε, respectively.
We clamp Ω^ε along its lateral boundary θ = 5π/12 and leave
it free along the other lateral boundary θ = π/4. This shell
is a sensitive one in the sense of Pitkäranta and Sanchez-Palencia
(1997), which means that it is sensitive to the
thickness and answers differently according to the value
of ε.


Figure 11. Model (b). Vertical components of modes 1, 3, 5, 7, 8, 9
for ε = 0.04. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm

Figure 12. Model (b). Vertical components of modes 1, 3, 5 for
ε = 0.00125. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm

We have computed the first five (or first ten) eigen-frequencies
of the 3-D operator (4) by a FE p-discretization
similar to that of (a): two layers in the transverse direction
and 8 × 4 elements in the surface direction (for the ‘small’
thickness, a globally refined mesh of 16 × 6 elements has
been used). In Figure 11, we plot the vertical components
of modes number 1, 3, 5, 7, 8, and 9 for ε = 0.04 and
in Figure 12, modes number 1, 3, 5 for ε = 0.00125. In
both cases, modes 2, 4, and 6 are similar to modes 1, 3,
and 5 respectively and associated with the same (double)
eigen-frequencies.

For ε = 0.04, we notice the axisymmetric mode at position 7
(it is at position 5 when ε = 0.08, and 9 for ε = 0.02).
Mode 8 looks odd. Indeed, its vertical component is very small
(less than 10^{-4}) for normalized eigenvectors in O(1). This
means that this mode is mainly supported in its tangential
components (we have checked they have a reasonable size).
Mode 8 is in fact a torsion mode, which means a dominant
stretching effect, whereas the other ones have a more
pronounced bending character.

Figure 13 provides the first distinct eigen-frequencies
classified by the nature of the eigenvector (namely the number
of nodal regions of u_3) as a function of the thickness in
natural scales. The organization of these eigen-frequencies
along affine lines converging to positive limits as ε → 0
is remarkable. We may expect a convergence as ε → 0 of
the solution u^ε of problem (3) provided the loading has
a finite number of angular frequencies in φ (the displacement
will converge to the highest angular frequency of
the load). Nevertheless, such a phenomenon is specific to
the axisymmetric nature of the shell (b) and could not be
generalized to other sensitive shells. Computations with a
concentrated load (which, of course, has an infinite number
of angular frequencies) display a clearly nonconverging
behavior (Chapelle and Bathe (2003), Section 4.5.3).


Figure 13. Model (b). Eigen-frequencies versus thickness (2ε). A
color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


Family (c). The midsurface S is a half-cylinder described
in cylindrical coordinates (r, θ, y) by θ ∈ (0, π), r = 1
and y ∈ (−1, 1). The family of shells Ω^ε has its upper
and lower surfaces characterized by the same angular and
axial conditions, and the radii r = 1 + ε and r = 1 − ε,
respectively. We clamp Ω^ε along its lateral boundaries
θ = 0 and θ = π and leave it free everywhere else. This
is a well-known example of a flexural shell, where the space
of inextensional displacements contains the space, cf. (80)
(note that, below, z_r = z_3)


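$$V_{F,0} := \{ z = (z_r, z_\theta, z_y);\ z_y = 0,\ z_r = z_r(\theta),\ z_\theta = z_\theta(\theta) \ \text{with } \partial_\theta z_\theta = z_r \text{ and } z_\theta = z_r = \partial_\theta z_r = 0 \text{ in } \theta = 0, \pi \} \tag{93}$$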
Besides these patterns independent of the axial variable y,
there is another subspace V_{F,1} of inextensional displacements,
where z_y is independent of y and z_r, z_θ are linear
in y:


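$$V_{F,1} := \{ z = (z_r, z_\theta, z_y);\ z_y = z_y(\theta),\ z_\theta = -y\, \partial_\theta z_y(\theta),\ z_r = -y\, \partial_\theta^2 z_y(\theta) \ \text{with } z_y = z_\theta = z_r = \partial_\theta z_r = 0 \text{ in } \theta = 0, \pi \} \tag{94}$$

and V_F = V_{F,0} ⊕ V_{F,1}. We agree to call ‘constant’ the
displacements associated with V_{F,0} and ‘linear’ those
associated with V_{F,1}.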

We have computed the first ten eigen-frequencies (4) by
a FE p-discretization based on two layers of elements in
the transverse direction and a midsurface mesh of 8 × 6
curved quadrangles. For the half-thickness ε = 0.0025, we
plot the vertical component u_3 = u_r sin θ + u_θ cos θ of the
eigenmodes u: in Figure 14, the first six constant flexural
eigenmodes and in Figure 15, the first three linear flexural
eigen-modes (their components u_y clearly display a nonzero
constant behavior in y). The shapes of the eigen-modes for
larger values of the thickness are similar. In Figure 16, we
have plotted in logarithmic scale these eigen-frequencies,
classified according to the behavior of the flexural eigen-modes
(‘constant’ and ‘linear’). The black line has the
equation ε ↦ ε/4: thus we can see that the slopes of
the eigen-frequency lines are close to 1, as expected by
the theory (at least for the Koiter model). In Figure 17, we
represent the first nonflexural modes (with rank 10 for
ε = 0.01 and rank 8, 9 for ε = 0.04).


Figure 14. Model (c). Vertical components of modes 1, 2, 5, 6, 9 and 10
for ε = 0.0025. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm

Figure 15. Model (c). Vertical components of modes 3, 4 and 7 for
ε = 0.0025. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm

Figure 16. Model (c). Eigen-frequencies versus ε in log-log
scale. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


6.4.3 Thin element eigen-frequency computations 


We present in Tables 2-4 the computation of the first
eigen-frequency of the shell Ω^ε in families (a) and (b)
for a moderate thickness (ε = 0.02) and a small thickness
(ε = 0.00125), and in family (c) for a moderate thickness
(ε = 0.04) and a small thickness (ε = 0.0025). The degree q
is the degree in the transverse direction (according to
Section 6.1.3 there is one layer of elements). We notice that,
for an accuracy of 0.01% and ε = 0.02, the quadratic
kinematics is not sufficient, whereas it is for ε = 0.00125.
No locking is visible there. In fact, the convergence of the
q-models to their own limits is more rapid for ε = 0.02.
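
The tabulated frequencies also allow a crude check of how the first eigen-frequency scales with ε. The Python sketch below takes, for each family, the q = 2 values at the highest degree p listed in Tables 2-4 for the two thicknesses, and computes the slope between them in log-log scale. With only two thicknesses per family this is a rough indication, not a fit; the dictionary keys and the function name are illustrative.

```python
import math

# (eps_moderate, freq_moderate, eps_small, freq_small), q = 2, highest p listed
table_values = {
    "a": (0.02, 0.1646849, 0.00125, 0.1506447),   # clamped elliptic shell, Table 2
    "b": (0.02, 0.0315998, 0.00125, 0.0044779),   # sensitive shell, Table 3
    "c": (0.04, 0.0166289, 0.0025, 0.0010541),    # flexural cylinder, Table 4
}

def loglog_slope(e1, f1, e2, f2):
    """Slope of the line through (log e1, log f1) and (log e2, log f2)."""
    return math.log(f1 / f2) / math.log(e1 / e2)

slopes = {name: loglog_slope(*v) for name, v in table_values.items()}
```

The slope is close to 1 for the cylinder (c), in line with the eigen-frequency lines of Figure 16, close to 0 for the clamped cap (a), whose frequencies tend to a nonzero limit, and in between for the sensitive shell (b).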


6.5 Conclusion 


It is worthwhile to point out that the most serious difficulties
we have encountered in computing all these models
occurred for ε = 0.00125 and model (b), the sensitive
shell: indeed, in that case, when ε → 0, the first eigen-mode
is more and more oscillating, and the difficulties of
approximation are those of a high-frequency analysis. It is
also visible from Tables 3 and 4 that the computational
effort is lower for the cylinder than for the sensitive shell,
for an even better quality of approximation.

It seems that, considering the high performance of
the p-version approximation on a smooth midsurface (for
each fixed ε and fixed degree q we have an exponential
convergence in p), the locking effects can be
equilibrated by slightly increasing the degree p as ε
decreases.


Figure 17. Model (c). First nonflexural modes for ε = 0.01 and
ε = 0.04. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


Table 2. Thin element computations for the first eigen-frequency of model (a).

     ε = 0.02 and q = 2           ε = 0.02 and q = 3           ε = 0.00125 and q = 2
p    DOF    e-freq.    % err.     DOF    e-freq.    % err.     DOF    e-freq.    % err.
1    297    0.2271659  37.967     396    0.2264908  37.557     297    0.2055351  36.43
2    729    0.1694894   2.938     828    0.1694269   2.900     729    0.1560694   3.601
3    1209   0.1652870   0.386     1308   0.1652544   0.366     1209   0.1537315   2.049
4    2145   0.1648290   0.108     2244   0.1648001   0.090     2145   0.1517604   0.741
5    3321   0.1646992   0.029     3636   0.1646693   0.011     3321   0.1508741   0.152
6    4737   0.1646859   0.021     5268   0.1646555   0.002     4737   0.1506988   0.036
7    6393   0.1646849   0.020     7140   0.1646544   0.002     6393   0.1506544   0.007
8    8289   0.1646849   0.020     9252   0.1646543   0.002     8289   0.1506447   0.000
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Table 3. Thin element computations for the first eigen-frequency of model (b). 


P e=0.02 and g=2 e=0.02 and g=3 e = 0.00125 and q=2 
DOF e-freq, % err. DOF e-freq. % err. DOF e-freq. % er. 

1 864 0.0597700 89.68 1152 0.0595287 88.91 864 0.0462144" 932.2 

2 2016 0.0326855 373 2304 0.0326036 3.46 2016 0.0129819 189.9 

3 3168 0.0318094 0.95 3456 0.0317325 0.70 3168 0.0064504 44.06 
4 5472 0.0316330 0.39 5760 0.0315684 0.18 5472 0.0047030 5.04 
5 8352 0.0316071 0.30 9216 0.0315319 0.06 8352 0.0045085 0.69 
6 11808 0.0316011 0.28 13 248 0.0315223 0.03 11 808 0.0044800 0.06 
7 15 840 0.03 16000 0.28 17856 0.0315200 0.03 15840 0.0044780 0.01 
8 20448 0.0315998 0.28 23.040 0,0315195 0.03 20448 0.0044779 0.01 


Table 4. Thin element computations for the first eigen-frequency of model (c).

      ε = 0.04 and q = 2            ε = 0.04 and q = 3            ε = 0.0025 and q = 2
p     DOF    e-freq.    % err.      DOF    e-freq.    % err.      DOF    e-freq.    % err.
1     567    0.0514951  210.2       756    0.0510683  208.7       567    0.0397025  3666
2    1311    0.0207290   24.9      1500    0.0206911   24.7      1311    0.0079356   653.1
3    2055    0.0167879    1.2      2244    0.0167596    0.98     2055    0.0011505     9.188
4    3531    0.0166354    0.02     3720    0.0166091    0.08     3531    0.0010578     0.395
5    5367    0.0166293    0.02     5928    0.0166011    0.03     5367    0.0010548     0.108
6    7563    0.0166289    0.02     8496    0.0166004    0.02     7563    0.0010541     0.045
7   10119    0.0166288    0.02    11424    0.0166003    0.02    10119    0.0010538     0.012
8   13035    0.0166288    0.02    14712    0.0166002    0.02    13035    0.0010537     0.002


Of course, there exist many strategies to overcome locking
in different situations. Let us quote here (Bathe and
Brezzi, 1985; Brezzi, Bathe and Fortin, 1989; Arnold and
Brezzi, 1997) as 'early references' on mixed methods,
which result in a relaxation of the zero-membrane-energy
constraint. These methods are addressed in other chapters
of the Encyclopedia.

NOTES


[1] We have a similar situation with plates, where the 
solution u®*+ of the Kirchhoff—Love model gives back 
the first generating terms on the asymptotics of uf, cf. 
Theorem 2. 


[2] The actual Kirchhoff–Love displacement (satisfying
$e_{i3} = 0$) is slightly different, containing an extra
quadratic surface term.

[3] The complementing operator © defined in (45) for 
plates satisfies cuki = ue. 

[4] These norms are those of the domains of the fractional
powers $A^s$ of the Sturm–Liouville operator
$A : \zeta \mapsto \partial_x\big((1 - x^2)\,\partial_x \zeta\big)$ on the interval $(-1, 1)$.
Such an approach is now a standard tool in the p-version
analysis.

[5] Of course, different mesh designs are possible on
thin domains. If one wants to capture boundary layer
terms with an exponential rate of convergence, an h-p
refinement should be implemented near the edges of
$\Omega^\varepsilon$, Dauge and Schwab (2002).

[6] Here, for ease of presentation, we use the numbering
system for plate models displayed in Table 1, where we
also provide the number $d_j$ of fields in each direction
for bending models, that is, for which the surface
components are odd and the normal component even in $x_3$.


1 INTRODUCTION


The finite element method is a well-known and highly effective
technique for the computation of approximate solutions
of complex boundary value problems. Started in the
fifties with milestone papers in a structural engineering
context (see e.g. references in Chapter 1 of Zienkiewicz
and Taylor (2000a) as well as classical references such
as Turner et al. (1956) and Clough (1965)), the method
has been extensively developed and studied in the last
50 years (Bathe, 1996; Brezzi and Fortin, 1991; Becker,
Carey and Oden, 1981; Brenner and Scott, 1994; Crisfield,
1986; Hughes, 1987; Johnson, 1992; Ottosen and Petersson,
1992; Quarteroni and Valli, 1994; Reddy, 1993; Wait and
Mitchell, 1985) and it is currently also used for the solution
of complex nonlinear problems (Bathe, 1996; Bonet and
Wood, 1997; Belytschko, Liu and Moran, 2000; Crisfield,
1991; Crisfield, 1997; Simo and Hughes, 1998; Simo,
1999; Zienkiewicz and Taylor, 2000b; Zienkiewicz and
Taylor, 2000c).

Encyclopedia of Computational Mechanics, Edited by Erwin
Stein, René de Borst and Thomas J.R. Hughes. Volume 1: Fundamentals.
© 2004 John Wiley & Sons, Ltd. ISBN: 0-470-84699-2.

Within such a broad approximation method, we focus on 
the often-called mixed finite element methods, where in our 
terminology the word ‘mixed’ indicates the fact that the 
problem discretization typically results in a linear algebraic 
system of the general form 


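$$
\begin{bmatrix} \mathbf{A} & \mathbf{B}^T \\ \mathbf{B} & \mathbf{0} \end{bmatrix}
\begin{Bmatrix} \mathbf{x} \\ \mathbf{y} \end{Bmatrix}
=
\begin{Bmatrix} \mathbf{f} \\ \mathbf{g} \end{Bmatrix}
\tag{1}
$$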

with A and B matrices and with x, y, f, and g vectors. Also, 
on mixed finite elements, the bibliography is quite large, 
ranging from classical contributions (Atluri, Gallagher and 
Zienkiewicz, 1983; Carey and Oden, 1983; Strang and Fix, 
1973; Zienkiewicz et al., 1983) to more recent references 
(Bathe, 1996; Belytschko, Liu and Moran, 2000; Bonet 
and Wood, 1997; Brezzi and Fortin, 1991; Hughes, 1987; 
Zienkiewicz and Taylor, 2000a; Zienkiewicz and Taylor, 
2000c). An impressive amount of work has been devoted to 
a number of different stabilization techniques, virtually for 
all applications in which mixed formulations are involved. 
Their treatment is, however, beyond the scope of this 
chapter, and we will just say a few words on the general 
idea in Section 4.2.5. 
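To make the block structure of system (1) concrete, the following minimal Python sketch assembles and solves a very small system of that form. The numerical values of the blocks and right-hand sides, as well as the helper names `solve` and `assemble_saddle`, are illustrative assumptions and are not taken from the chapter; the dense Gaussian elimination is only meant for tiny examples.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in M[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def assemble_saddle(A, B):
    """Return the (n+m) x (n+m) block matrix [[A, B^T], [B, 0]]."""
    n, m = len(A), len(B)
    K = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            K[i][j] = A[i][j]
    for r in range(m):
        for j in range(n):
            K[n + r][j] = B[r][j]  # block B
            K[j][n + r] = B[r][j]  # block B^T
    return K


# Hypothetical data: n = 2 primary unknowns, m = 1 constraint.
A = [[2.0, 0.0], [0.0, 2.0]]
B = [[1.0, 1.0]]
f = [1.0, 3.0]
g = [2.0]

z = solve(assemble_saddle(A, B), f + g)
x, y = z[:2], z[2:]
print("x =", x, "y =", y)
```

Here `y` plays the role of a Lagrange multiplier enforcing the constraint `B x = g`; the zero diagonal block is why partial pivoting is needed even for this small case.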

In particular, the chapter is organized as follows. Sec- 
tion 2 sketches out the fact that several physical problem 
formulations share the same algebraic structure (1), once a 
discretization is introduced. Section 3 presents a simple, 
algebraic version of the abstract theory that rules most 
applications of mixed finite element methods. Section 4 
gives several examples of efficient mixed finite element 
methods. Finally, in Section 5 we give some hints on how
to perform a stability and error analysis, focusing on a
representative problem (i.e. the Stokes equations).

2 FORMULATIONS


The goal of the present section is to point out that a quite 
large set of physical problem formulations shares the same 
algebraic structure (1), once a discretization is introduced. 

To limit the discussion, we focus on steady state field
problems defined in a domain $\Omega \subset \mathbb{R}^d$, with $d$ the Euclidean
space dimension. Moreover, we start from the simplest class
of physical problems, that is, the one associated with diffusion
mechanisms. Classical problems falling in this frame and
frequently encountered in engineering are heat conduction,
distribution of electrical or magnetic potentials, irrotational
flow of ideal fluids, torsion or bending of cylindrical beams.

After addressing the thermal diffusion, as representative 
of the whole class, we move to more complex problems, 
such as the steady state flow of an incompressible New- 
tonian fluid and the mechanics of elastic bodies. For each 
problem, we briefly describe the local differential equations 
and possible variational formulations. 

Before proceeding, we need to comment on the adopted
notation. In general, we indicate scalar fields with nonbold
lower-case roman or nonbold lower-case greek letters (such
as $a$, $\alpha$, $b$, $\beta$), vector fields with bold lower-case roman
letters (such as $\mathbf{a}$, $\mathbf{b}$), second-order tensors with bold
lower-case greek letters or bold upper-case roman letters (such as
$\boldsymbol{\alpha}$, $\boldsymbol{\beta}$, $\mathbf{A}$, $\mathbf{B}$), fourth-order tensors with upper-case
blackboard roman letters (such as $\mathbb{D}$). We however reserve the
letters $\mathcal{A}$ and $\mathcal{B}$ for 'composite' matrices (see e.g. equation (31)).
Moreover, we indicate with $\mathbf{0}$ the null vector, with $\mathbf{I}$ the
identity second-order tensor and with $\mathbb{I}$ the identity
fourth-order tensor.

Whenever necessary or useful, we may use the standard
indicial notation to represent vectors or tensors. Accordingly,
in a Euclidean space with base vectors $\mathbf{e}_i$, a vector $\mathbf{a}$,
a second-order tensor $\boldsymbol{\alpha}$, and a fourth-order tensor $\mathbb{D}$ have
the following components

$$
\mathbf{a}|_i = a_i = \mathbf{a} \cdot \mathbf{e}_i, \qquad
\boldsymbol{\alpha}|_{ij} = \alpha_{ij} = \mathbf{e}_i \cdot (\boldsymbol{\alpha}\,\mathbf{e}_j), \qquad
\mathbb{D}|_{ijkl} = D_{ijkl} = (\mathbf{e}_i \otimes \mathbf{e}_j) : [\mathbb{D}\,(\mathbf{e}_k \otimes \mathbf{e}_l)]
\tag{2}
$$

where $\cdot$, $\otimes$, and $:$ indicate respectively the scalar vector
product, the (second-order) tensorial vector product, and
the scalar (second-order) tensor product. Sometimes, the
scalar vector product will be also indicated as $\mathbf{a}^T \mathbf{b}$, where
the superscript $T$ indicates transposition.

During the discussion, we also introduce standard differential
operators such as gradient and divergence, indicated
respectively as '$\nabla$' and 'div', and acting either on scalar,
vector, or tensor fields. In particular, we have

$$
\nabla a|_i = a_{,i}, \qquad \nabla \mathbf{a}|_{ij} = a_{i,j}, \qquad
\operatorname{div} \mathbf{a} = a_{i,i}, \qquad \operatorname{div} \boldsymbol{\alpha}|_i = \alpha_{ij,j}
\tag{3}
$$

where repeated subscript indices imply summation and
where the subscript comma indicates derivation, that is,
$a_{,i} = \partial a / \partial x_i$.

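Finally, given for example, a scalar field $a$, a vector
field $\mathbf{a}$, and a tensor field $\boldsymbol{\alpha}$, we indicate with $\delta a$, $\delta\mathbf{a}$, $\delta\boldsymbol{\alpha}$
the corresponding variation fields and with $a^h$, $\mathbf{a}^h$, $\boldsymbol{\alpha}^h$ the
corresponding interpolations, expressed in general as

$$
a^h = N_k^{a}\,\hat{a}_k, \qquad \mathbf{a}^h = \mathbf{N}_k^{\mathbf{a}}\,\hat{a}_k, \qquad
\boldsymbol{\alpha}^h = \mathbf{N}_k^{\boldsymbol{\alpha}}\,\hat{\alpha}_k
\tag{4}
$$

where $N_k^{a}$, $\mathbf{N}_k^{\mathbf{a}}$, and $\mathbf{N}_k^{\boldsymbol{\alpha}}$ are a set of interpolation
functions (i.e. the so-called shape functions), while $\hat{a}_k$ and
$\hat{\alpha}_k$ are a set of interpolation parameters (i.e. the so-called
degrees of freedom); clearly, $N_k^{a}$, $\mathbf{N}_k^{\mathbf{a}}$, and $\mathbf{N}_k^{\boldsymbol{\alpha}}$ are
respectively scalar, vector, and tensor predefined (assigned) fields,
while $\hat{a}_k$ and $\hat{\alpha}_k$ are scalar quantities, representing the
effective unknowns of the approximated problems. With
the adopted notation, it is now simple to evaluate the
differential operators (3) on the interpolated fields, that is,

$$
\nabla a^h = (\nabla N_k^{a})\,\hat{a}_k, \qquad \nabla \mathbf{a}^h = (\nabla \mathbf{N}_k^{\mathbf{a}})\,\hat{a}_k, \qquad
\operatorname{div} \mathbf{a}^h = (\operatorname{div} \mathbf{N}_k^{\mathbf{a}})\,\hat{a}_k, \qquad
\operatorname{div} \boldsymbol{\alpha}^h = (\operatorname{div} \mathbf{N}_k^{\boldsymbol{\alpha}})\,\hat{\alpha}_k
\tag{5}
$$

or in indicial notation

$$
\nabla a^h|_i = N_{k,i}^{a}\,\hat{a}_k, \qquad
\nabla \mathbf{a}^h|_{ij} = \mathbf{N}_k^{\mathbf{a}}|_{i,j}\,\hat{a}_k, \qquad
\operatorname{div} \mathbf{a}^h = \mathbf{N}_k^{\mathbf{a}}|_{i,i}\,\hat{a}_k, \qquad
\operatorname{div} \boldsymbol{\alpha}^h|_i = \mathbf{N}_k^{\boldsymbol{\alpha}}|_{ij,j}\,\hat{\alpha}_k
\tag{6}
$$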


2.1 Thermal diffusion 


The physical problem 

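Indicating with $\theta$ the body temperature, $\mathbf{e}$ the temperature
gradient, $\mathbf{q}$ the heat flux, and with $b$ the assigned heat source
per unit volume, a steady state thermal problem in a domain
$\Omega$ can be formulated as a $(\theta, \mathbf{e}, \mathbf{q})$ three field problem as
follows:

$$
\begin{cases}
\operatorname{div} \mathbf{q} + b = 0 & \text{in } \Omega \\
\mathbf{q} = -\mathbf{D}\mathbf{e} & \text{in } \Omega \\
\mathbf{e} = \nabla\theta & \text{in } \Omega
\end{cases}
\tag{7}
$$

which are respectively the balance equation, the constitutive
equation, and the compatibility equation.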

In particular, we assume a linear constitutive equation
(known as Fourier law), where $\mathbf{D}$ is the conductivity
material-dependent second-order tensor; in the simple case
of thermally isotropic material, $\mathbf{D} = k\mathbf{I}$ with $k$ the isotropic
thermal conductivity.

Equation (7) is completed by proper boundary conditions.
For simplicity, we consider only the case of trivial
essential conditions on the whole domain boundary, that is,

$$
\theta = 0 \quad \text{on } \partial\Omega
\tag{8}
$$


This position is clearly very restrictive from a physical 
point of view but it is still adopted since it simplifies the 
forthcoming discussion, at the same time without limiting 
our numerical considerations. 

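As classically done, the three field problem (7) can be
simplified eliminating the temperature gradient $\mathbf{e}$, obtaining
a $(\theta, \mathbf{q})$ two field problem

$$
\begin{cases}
\operatorname{div} \mathbf{q} + b = 0 & \text{in } \Omega \\
\mathbf{q} = -\mathbf{D}\nabla\theta & \text{in } \Omega
\end{cases}
\tag{9}
$$

and the two field problem (9) can be further simplified
eliminating the thermal flux $\mathbf{q}$ (or eliminating the fields $\mathbf{e}$
and $\mathbf{q}$ directly from equation (7)), obtaining a $\theta$ single field
problem

$$
-\operatorname{div}(\mathbf{D}\nabla\theta) + b = 0 \quad \text{in } \Omega
\tag{10}
$$

For the case of an isotropic and homogeneous body, this
last equation specializes as follows

$$
-k\,\Delta\theta + b = 0 \quad \text{in } \Omega
\tag{11}
$$

where $\Delta$ is the standard Laplace operator.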


Variational principles 
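The single field equation (10) can be easily derived starting
from the potential energy functional

$$
\Pi(\theta) = \frac{1}{2}\int_\Omega [\nabla\theta \cdot \mathbf{D}\nabla\theta]\, d\Omega + \int_\Omega \theta b \, d\Omega
\tag{12}
$$

Requiring the stationarity of potential (12), we obtain

$$
d\Pi(\theta)[\delta\theta] = \int_\Omega [(\nabla\delta\theta) \cdot \mathbf{D}\nabla\theta]\, d\Omega + \int_\Omega [\delta\theta\, b]\, d\Omega = 0
\tag{13}
$$

where $\delta\theta$ indicates a possible variation of the temperature
field $\theta$ and $d\Pi(\theta)[\delta\theta]$ indicates the potential variation
evaluated at $\theta$ in the direction $\delta\theta$. Since functional (12) is
convex, we may note that the stationarity requirement is
equivalent to a minimization.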

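Recalling equation (4), we may now introduce an interpolation
for the temperature field in the form

$$
\theta \approx \theta^h = N_k^{\theta}\,\hat{\theta}_k
\tag{14}
$$

as well as a similar approximation for the corresponding
variation field, such that equation (13) can be rewritten in
matricial form as follows

$$
\mathbf{A}\hat{\boldsymbol{\theta}} = \mathbf{f}
\tag{15}
$$

with

$$
\mathbf{A}|_{ij} = \int_\Omega [\nabla N_i^{\theta} \cdot \mathbf{D}\nabla N_j^{\theta}]\, d\Omega, \qquad
\hat{\boldsymbol{\theta}}|_j = \hat{\theta}_j, \qquad
\mathbf{f}|_i = -\int_\Omega [N_i^{\theta}\, b]\, d\Omega
\tag{16}
$$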

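Besides the integral form (13) associated with the single field
equation (10), it is also possible to associate an integral
form with the two field equation (9) starting now from the
more general Hellinger–Reissner functional

$$
\Pi^{HR}(\theta, \mathbf{q}) = -\frac{1}{2}\int_\Omega [\mathbf{q} \cdot \mathbf{D}^{-1}\mathbf{q}]\, d\Omega
- \int_\Omega [\mathbf{q} \cdot \nabla\theta]\, d\Omega
+ \int_\Omega \theta b \, d\Omega
\tag{17}
$$

Requiring the stationarity of functional (17), we obtain

$$
\begin{cases}
d\Pi^{HR}(\theta, \mathbf{q})[\delta\mathbf{q}] = -\displaystyle\int_\Omega [\delta\mathbf{q} \cdot \mathbf{D}^{-1}\mathbf{q}]\, d\Omega - \int_\Omega [\delta\mathbf{q} \cdot \nabla\theta]\, d\Omega = 0 \\[2mm]
d\Pi^{HR}(\theta, \mathbf{q})[\delta\theta] = -\displaystyle\int_\Omega [(\nabla\delta\theta) \cdot \mathbf{q}]\, d\Omega + \int_\Omega [\delta\theta\, b]\, d\Omega = 0
\end{cases}
\tag{18}
$$

which is now equivalent to the search of a saddle point.
Changing sign to both equations and introducing the
approximation

$$
\begin{cases}
\theta \approx \theta^h = N_k^{\theta}\,\hat{\theta}_k \\
\mathbf{q} \approx \mathbf{q}^h = \mathbf{N}_k^{q}\,\hat{q}_k
\end{cases}
\tag{19}
$$

as well as a similar approximation for the corresponding
variation fields, equation (18) can be rewritten in matricial
form as follows

$$
\begin{bmatrix} \mathbf{A} & \mathbf{B}^T \\ \mathbf{B} & \mathbf{0} \end{bmatrix}
\begin{Bmatrix} \hat{\mathbf{q}} \\ \hat{\boldsymbol{\theta}} \end{Bmatrix}
=
\begin{Bmatrix} \mathbf{0} \\ \mathbf{g} \end{Bmatrix}
\tag{20}
$$

where

$$
\begin{aligned}
\mathbf{A}|_{ij} &= \int_\Omega [\mathbf{N}_i^{q} \cdot \mathbf{D}^{-1}\mathbf{N}_j^{q}]\, d\Omega, & \hat{\mathbf{q}}|_j &= \hat{q}_j \\
\mathbf{B}|_{rj} &= \int_\Omega [\nabla N_r^{\theta} \cdot \mathbf{N}_j^{q}]\, d\Omega, & \hat{\boldsymbol{\theta}}|_r &= \hat{\theta}_r \\
\mathbf{g}|_r &= \int_\Omega [N_r^{\theta}\, b]\, d\Omega
\end{aligned}
\tag{21}
$$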
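The two field system (20) and (21) can be sketched in the same one-dimensional setting used for many textbook illustrations. In the following Python example the flux is taken piecewise constant on each element and the temperature piecewise linear and continuous; this choice of spaces, the uniform mesh, the constant data $k$ and $b$, and all function names are assumptions made for illustration and are not prescribed by the text.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in M[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def thermal_mixed_1d(n_el, length=1.0, k=1.0, b=1.0):
    """Solve [[A, B^T], [B, 0]] {q, theta} = {0, g} on a uniform 1D mesh."""
    h = length / n_el
    n, m = n_el, n_el - 1             # one flux per element, interior nodes
    K = [[0.0] * (n + m) for _ in range(n + m)]
    rhs = [0.0] * (n + m)
    for e in range(n):
        K[e][e] = h / k               # A_ee = int N_e^q (1/k) N_e^q
    for r in range(1, n_el):          # interior node r, unknown index r-1
        row = n + r - 1
        # B_re = int (N_r^theta)' N_e^q: +1 on the left element, -1 on the right
        for e, val in ((r - 1, 1.0), (r, -1.0)):
            K[row][e] = val           # block B
            K[e][row] = val           # block B^T
        rhs[row] = b * h              # g_r = int N_r^theta b
    z = solve(K, rhs)
    return z[:n], [0.0] + z[n:] + [0.0]


q, theta = thermal_mixed_1d(4)
print(q, theta)
```

With these spaces the flux unknowns satisfy $\hat{q}_e = -k(\hat{\theta}_{e+1} - \hat{\theta}_e)/h$, so the temperatures coincide with those of the single field discretization on the same mesh.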
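Starting from the Hellinger–Reissner functional (17) previously
addressed, the following modified Hellinger–Reissner
functional can be also generated

$$
\Pi^{HR,m}(\theta, \mathbf{q}) = -\frac{1}{2}\int_\Omega [\mathbf{q} \cdot \mathbf{D}^{-1}\mathbf{q}]\, d\Omega
+ \int_\Omega [\theta\, \operatorname{div}\mathbf{q}]\, d\Omega
+ \int_\Omega \theta b \, d\Omega
\tag{22}
$$

and, requiring its stationarity, we obtain

$$
\begin{cases}
d\Pi^{HR,m}(\theta, \mathbf{q})[\delta\mathbf{q}] = -\displaystyle\int_\Omega [\delta\mathbf{q} \cdot \mathbf{D}^{-1}\mathbf{q}]\, d\Omega + \int_\Omega [\operatorname{div}(\delta\mathbf{q})\, \theta]\, d\Omega = 0 \\[2mm]
d\Pi^{HR,m}(\theta, \mathbf{q})[\delta\theta] = \displaystyle\int_\Omega [\delta\theta\, \operatorname{div}\mathbf{q}]\, d\Omega + \int_\Omega [\delta\theta\, b]\, d\Omega = 0
\end{cases}
\tag{23}
$$

which is again equivalent to the search of a saddle point.
Changing sign to both equations and introducing again
field approximation (19), equation (23) can be rewritten in
matricial form as equation (20), with the difference that
now

$$
\mathbf{B}|_{rj} = -\int_\Omega [N_r^{\theta}\, \operatorname{div}(\mathbf{N}_j^{q})]\, d\Omega
\tag{24}
$$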

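Similarly, we may also associate an integral form with the
three field equation (7) starting from the even more general
Hu–Washizu functional

$$
\Pi^{HW}(\theta, \mathbf{e}, \mathbf{q}) = \frac{1}{2}\int_\Omega [\mathbf{e} \cdot \mathbf{D}\mathbf{e}]\, d\Omega
+ \int_\Omega [\mathbf{q} \cdot (\mathbf{e} - \nabla\theta)]\, d\Omega
+ \int_\Omega \theta b \, d\Omega
\tag{25}
$$

Requiring the stationarity of functional (25), we obtain

$$
\begin{cases}
d\Pi^{HW}(\theta, \mathbf{e}, \mathbf{q})[\delta\mathbf{e}] = \displaystyle\int_\Omega [\delta\mathbf{e} \cdot \mathbf{D}\mathbf{e}]\, d\Omega + \int_\Omega [\delta\mathbf{e} \cdot \mathbf{q}]\, d\Omega = 0 \\[2mm]
d\Pi^{HW}(\theta, \mathbf{e}, \mathbf{q})[\delta\mathbf{q}] = \displaystyle\int_\Omega [\delta\mathbf{q} \cdot (\mathbf{e} - \nabla\theta)]\, d\Omega = 0 \\[2mm]
d\Pi^{HW}(\theta, \mathbf{e}, \mathbf{q})[\delta\theta] = -\displaystyle\int_\Omega [(\nabla\delta\theta) \cdot \mathbf{q}]\, d\Omega + \int_\Omega [\delta\theta\, b]\, d\Omega = 0
\end{cases}
\tag{26}
$$

which is equivalent to searching a saddle point. Introducing
the following approximation

$$
\begin{cases}
\theta \approx \theta^h = N_k^{\theta}\,\hat{\theta}_k \\
\mathbf{e} \approx \mathbf{e}^h = \mathbf{N}_k^{e}\,\hat{e}_k \\
\mathbf{q} \approx \mathbf{q}^h = \mathbf{N}_k^{q}\,\hat{q}_k
\end{cases}
\tag{27}
$$

as well as a similar approximation for the corresponding
variation fields, equation (26) can be rewritten in matricial
form as follows:

$$
\begin{bmatrix} \mathbf{A} & \mathbf{B}^T & \mathbf{0} \\ \mathbf{B} & \mathbf{0} & \mathbf{C}^T \\ \mathbf{0} & \mathbf{C} & \mathbf{0} \end{bmatrix}
\begin{Bmatrix} \hat{\mathbf{e}} \\ \hat{\mathbf{q}} \\ \hat{\boldsymbol{\theta}} \end{Bmatrix}
=
\begin{Bmatrix} \mathbf{0} \\ \mathbf{0} \\ \mathbf{h} \end{Bmatrix}
\tag{28}
$$

where

$$
\begin{aligned}
\mathbf{A}|_{ij} &= \int_\Omega [\mathbf{N}_i^{e} \cdot \mathbf{D}\mathbf{N}_j^{e}]\, d\Omega, & \hat{\mathbf{e}}|_j &= \hat{e}_j \\
\mathbf{B}|_{rj} &= \int_\Omega [\mathbf{N}_r^{q} \cdot \mathbf{N}_j^{e}]\, d\Omega, & \hat{\mathbf{q}}|_r &= \hat{q}_r \\
\mathbf{C}|_{sr} &= -\int_\Omega [\nabla N_s^{\theta} \cdot \mathbf{N}_r^{q}]\, d\Omega, & \hat{\boldsymbol{\theta}}|_s &= \hat{\theta}_s \\
\mathbf{h}|_s &= -\int_\Omega [N_s^{\theta}\, b]\, d\Omega
\end{aligned}
\tag{29}
$$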


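For later considerations, we note that equation (28) can be
also rewritten as

$$
\begin{bmatrix} \mathcal{A} & \mathcal{B}^T \\ \mathcal{B} & \mathbf{0} \end{bmatrix}
\begin{Bmatrix} \mathbf{x} \\ \mathbf{y} \end{Bmatrix}
=
\begin{Bmatrix} \mathbf{0} \\ \mathbf{h} \end{Bmatrix}
\tag{30}
$$

where we made the following simple identifications

$$
\mathcal{A} = \begin{bmatrix} \mathbf{A} & \mathbf{B}^T \\ \mathbf{B} & \mathbf{0} \end{bmatrix}, \qquad
\mathcal{B} = \{\mathbf{0}, \mathbf{C}\}, \qquad
\mathbf{x} = \begin{Bmatrix} \hat{\mathbf{e}} \\ \hat{\mathbf{q}} \end{Bmatrix}, \qquad
\mathbf{y} = \hat{\boldsymbol{\theta}}
\tag{31}
$$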


Examples of specific choices for the interpolating func- 
tions (14), (19), or (27) respectively within the single field, 
two field, and three field formulations can be found in 
standard textbooks (Bathe, 1996; Ottosen and Petersson, 
1992; Brezzi and Fortin, 1991; Hughes, 1987; Zienkiewicz 
and Taylor, 2000a) or in the literature. 


2.2 Stokes equations 


The physical problem 

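Indicating with $\mathbf{u}$ the fluid velocity, $\boldsymbol{\varepsilon}$ the symmetric part
of the velocity gradient, $\boldsymbol{\sigma}$ the stress, $p$ a pressure-like
quantity, and with $\mathbf{b}$ the assigned body load per unit volume,
the steady state flow of an incompressible Newtonian fluid
can be formulated as a $(\mathbf{u}, \boldsymbol{\varepsilon}, \boldsymbol{\sigma}, p)$ four field problem as
follows:

$$
\begin{cases}
\operatorname{div}\boldsymbol{\sigma} + \mathbf{b} = \mathbf{0} & \text{in } \Omega \\
\boldsymbol{\sigma} = 2\mu\boldsymbol{\varepsilon} - p\,\mathbf{I} & \text{in } \Omega \\
\boldsymbol{\varepsilon} = \nabla^s\mathbf{u} & \text{in } \Omega \\
\operatorname{div}\mathbf{u} = 0 & \text{in } \Omega
\end{cases}
\tag{32}
$$

which are respectively the balance, the constitutive, the
compatibility, and the incompressibility constraint equations.
In particular, $\nabla^s$ indicates the symmetric part of the
gradient, that is, in a more explicit form,

$$
\boldsymbol{\varepsilon} = \nabla^s\mathbf{u} = \frac{1}{2}\left[\nabla\mathbf{u} + (\nabla\mathbf{u})^T\right]
\tag{33}
$$

while the constitutive equation relates the stress $\boldsymbol{\sigma}$ to the
symmetric part of the velocity gradient $\boldsymbol{\varepsilon}$ through a material
constant $\mu$ known as viscosity, and a volumetric pressure-like
scalar contribution $p$.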

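This set of equations is completed by proper boundary
conditions. As for the thermal problem, we prescribe trivial
essential conditions on the whole domain boundary, that is,

$$
\mathbf{u} = \mathbf{0} \quad \text{on } \partial\Omega
\tag{34}
$$

As classically done, equation (32) can be simplified eliminating
$\boldsymbol{\varepsilon}$ and $\boldsymbol{\sigma}$, obtaining a $(\mathbf{u}, p)$ two field problem

$$
\begin{cases}
\mu\,\Delta\mathbf{u} - \nabla p + \mathbf{b} = \mathbf{0} & \text{in } \Omega \\
\operatorname{div}\mathbf{u} = 0 & \text{in } \Omega
\end{cases}
\tag{35}
$$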

Variational principles 
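Equation (35) can be derived starting from the potential
energy functional

$$
\Pi(\mathbf{u}) = \frac{1}{2}\mu\int_\Omega [\nabla\mathbf{u} : \nabla\mathbf{u}]\, d\Omega - \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{36}
$$

where now $\mathbf{u}$ is a function satisfying the constraint, that is,
such that $\operatorname{div}\mathbf{u} = 0$.

To remove the constraint on $\mathbf{u}$, we can modify the
variational principle introducing the functional

$$
L(\mathbf{u}, p) = \frac{1}{2}\mu\int_\Omega [\nabla\mathbf{u} : \nabla\mathbf{u}]\, d\Omega
- \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
- \int_\Omega [p\, \operatorname{div}\mathbf{u}]\, d\Omega
\tag{37}
$$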


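where $p$ now plays the role of Lagrange multiplier.
Requiring the stationarity of functional (37), we obtain

$$
\begin{cases}
dL(\mathbf{u}, p)[\delta\mathbf{u}] = \mu\displaystyle\int_\Omega [(\nabla\delta\mathbf{u}) : \nabla\mathbf{u}]\, d\Omega - \int_\Omega [\delta\mathbf{u} \cdot \mathbf{b}]\, d\Omega - \int_\Omega [\operatorname{div}(\delta\mathbf{u})\, p]\, d\Omega = 0 \\[2mm]
dL(\mathbf{u}, p)[\delta p] = -\displaystyle\int_\Omega [\delta p\, \operatorname{div}\mathbf{u}]\, d\Omega = 0
\end{cases}
\tag{38}
$$

which is equivalent to the search of a saddle point. Introducing
the following approximation

$$
\begin{cases}
\mathbf{u} \approx \mathbf{u}^h = \mathbf{N}_k^{u}\,\hat{u}_k \\
p \approx p^h = N_k^{p}\,\hat{p}_k
\end{cases}
\tag{39}
$$

as well as a similar approximation for the corresponding
variation fields, equation (38) can be rewritten as follows

$$
\begin{bmatrix} \mathbf{A} & \mathbf{B}^T \\ \mathbf{B} & \mathbf{0} \end{bmatrix}
\begin{Bmatrix} \hat{\mathbf{u}} \\ \hat{\mathbf{p}} \end{Bmatrix}
=
\begin{Bmatrix} \mathbf{f} \\ \mathbf{0} \end{Bmatrix}
\tag{40}
$$

where

$$
\begin{aligned}
\mathbf{A}|_{ij} &= \mu\int_\Omega [\nabla\mathbf{N}_i^{u} : \nabla\mathbf{N}_j^{u}]\, d\Omega, & \hat{\mathbf{u}}|_j &= \hat{u}_j \\
\mathbf{B}|_{rj} &= -\int_\Omega [N_r^{p}\, \operatorname{div}(\mathbf{N}_j^{u})]\, d\Omega, & \hat{\mathbf{p}}|_r &= \hat{p}_r \\
\mathbf{f}|_i &= \int_\Omega [\mathbf{N}_i^{u} \cdot \mathbf{b}]\, d\Omega
\end{aligned}
\tag{41}
$$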

Examples of specific choices for the interpolating func- 
tions (39) can be found in standard textbooks (Bathe, 
1996; Brezzi and Fortin, 1991; Hughes, 1987; Quarteroni 
and Valli, 1994; Zienkiewicz and Taylor, 2000a) or in the 
literature. 


2.3 Elasticity 


The physical problem 

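Indicating with $\mathbf{u}$ the body displacement, $\boldsymbol{\varepsilon}$ the strain, $\boldsymbol{\sigma}$
the stress, and with $\mathbf{b}$ the assigned body load per unit
volume, the steady state equations for a deformable solid
under the assumption of small displacement gradients can
be formulated as a $(\mathbf{u}, \boldsymbol{\varepsilon}, \boldsymbol{\sigma})$ three field problem as follows

$$
\begin{cases}
\operatorname{div}\boldsymbol{\sigma} + \mathbf{b} = \mathbf{0} & \text{in } \Omega \\
\boldsymbol{\sigma} = \mathbb{D}\boldsymbol{\varepsilon} & \text{in } \Omega \\
\boldsymbol{\varepsilon} = \nabla^s\mathbf{u} & \text{in } \Omega
\end{cases}
\tag{42}
$$

which are respectively the balance, the constitutive, and the
compatibility equations.
In particular, we assume a linear constitutive equation,
where $\mathbb{D}$ is the elastic material-dependent fourth-order tensor;
in the simple case of a mechanically isotropic material,
$\mathbb{D}$ specializes as

$$
\mathbb{D} = 2\mu\,\mathbb{I} + \lambda\,\mathbf{I} \otimes \mathbf{I}
\tag{43}
$$

and the constitutive equation can be rewritten as

$$
\boldsymbol{\sigma} = 2\mu\,\boldsymbol{\varepsilon} + \lambda\operatorname{tr}(\boldsymbol{\varepsilon})\,\mathbf{I}
\tag{44}
$$

where $\operatorname{tr}(\boldsymbol{\varepsilon}) = \mathbf{I} : \boldsymbol{\varepsilon}$. This set of equations is completed
by proper boundary conditions. As previously done, we
prescribe trivial essential conditions on the whole domain
boundary, that is,

$$
\mathbf{u} = \mathbf{0} \quad \text{on } \partial\Omega
\tag{45}
$$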


This position is once more very restrictive from a physical 
point of view but it is still adopted since it simplifies the 
forthcoming discussion, at the same time without limiting 
our numerical considerations. 

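The three field problem (42) can be simplified eliminating
the strain $\boldsymbol{\varepsilon}$, obtaining a $(\mathbf{u}, \boldsymbol{\sigma})$ two field problem

$$
\begin{cases}
\operatorname{div}\boldsymbol{\sigma} + \mathbf{b} = \mathbf{0} & \text{in } \Omega \\
\boldsymbol{\sigma} = \mathbb{D}\nabla^s\mathbf{u} & \text{in } \Omega
\end{cases}
\tag{46}
$$

and the two field problem (46) can be simplified eliminating
the stress $\boldsymbol{\sigma}$ (or eliminating $\boldsymbol{\varepsilon}$ and $\boldsymbol{\sigma}$ directly from
equation (42)), obtaining a $\mathbf{u}$ single field problem

$$
\operatorname{div}(\mathbb{D}\nabla^s\mathbf{u}) + \mathbf{b} = \mathbf{0} \quad \text{in } \Omega
\tag{47}
$$

In the case of an isotropic and homogeneous body, this last
equation specializes as follows:

$$
2\mu\operatorname{div}(\nabla^s\mathbf{u}) + \lambda\nabla(\operatorname{div}\mathbf{u}) + \mathbf{b} = \mathbf{0} \quad \text{in } \Omega
\tag{48}
$$
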
Variational principles 


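The single field equation (47) can be easily derived starting
from the potential energy functional

$$
\Pi(\mathbf{u}) = \frac{1}{2}\int_\Omega [\nabla^s\mathbf{u} : \mathbb{D}\nabla^s\mathbf{u}]\, d\Omega - \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{49}
$$

Requiring the stationarity of potential (49), we obtain

$$
d\Pi(\mathbf{u})[\delta\mathbf{u}] = \int_\Omega [(\nabla^s\delta\mathbf{u}) : \mathbb{D}\nabla^s\mathbf{u}]\, d\Omega - \int_\Omega [\delta\mathbf{u} \cdot \mathbf{b}]\, d\Omega = 0
\tag{50}
$$

where $\delta\mathbf{u}$ indicates a possible variation of the displacement
field $\mathbf{u}$. Since functional (49) is convex, we may note that
the stationarity requirement is equivalent to a minimization.
Recalling the notation introduced in equation (4), we may
now introduce an interpolation for the displacement field in
the form

$$
\mathbf{u} \approx \mathbf{u}^h = \mathbf{N}_k^{u}\,\hat{u}_k
\tag{51}
$$

as well as a similar approximation for the variation field,
such that equation (50) can be rewritten as follows:

$$
\mathbf{A}\hat{\mathbf{u}} = \mathbf{f}
\tag{52}
$$

where

$$
\mathbf{A}|_{ij} = \int_\Omega [\nabla^s\mathbf{N}_i^{u} : \mathbb{D}\nabla^s\mathbf{N}_j^{u}]\, d\Omega, \qquad
\hat{\mathbf{u}}|_j = \hat{u}_j, \qquad
\mathbf{f}|_i = \int_\Omega [\mathbf{N}_i^{u} \cdot \mathbf{b}]\, d\Omega
\tag{53}
$$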

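Besides the integral form (50) associated with the single field
equation (47), it is also possible to associate an integral
form with the two field equation (46) starting now from the
more general Hellinger–Reissner functional

$$
\Pi^{HR}(\mathbf{u}, \boldsymbol{\sigma}) = -\frac{1}{2}\int_\Omega [\boldsymbol{\sigma} : \mathbb{D}^{-1}\boldsymbol{\sigma}]\, d\Omega
+ \int_\Omega [\boldsymbol{\sigma} : \nabla^s\mathbf{u}]\, d\Omega
- \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{54}
$$

Requiring the stationarity of functional (54), we obtain

$$
\begin{cases}
d\Pi^{HR}(\mathbf{u}, \boldsymbol{\sigma})[\delta\boldsymbol{\sigma}] = -\displaystyle\int_\Omega [\delta\boldsymbol{\sigma} : \mathbb{D}^{-1}\boldsymbol{\sigma}]\, d\Omega + \int_\Omega [\delta\boldsymbol{\sigma} : \nabla^s\mathbf{u}]\, d\Omega = 0 \\[2mm]
d\Pi^{HR}(\mathbf{u}, \boldsymbol{\sigma})[\delta\mathbf{u}] = \displaystyle\int_\Omega [(\nabla^s\delta\mathbf{u}) : \boldsymbol{\sigma}]\, d\Omega - \int_\Omega [\delta\mathbf{u} \cdot \mathbf{b}]\, d\Omega = 0
\end{cases}
\tag{55}
$$

which is now equivalent to the search of a saddle point.
Changing sign to both equations and introducing the
approximation

$$
\begin{cases}
\mathbf{u} \approx \mathbf{u}^h = \mathbf{N}_k^{u}\,\hat{u}_k \\
\boldsymbol{\sigma} \approx \boldsymbol{\sigma}^h = \mathbf{N}_k^{\sigma}\,\hat{\sigma}_k
\end{cases}
\tag{56}
$$

as well as a similar approximation for the corresponding
variation fields, equation (55) can be rewritten in matricial
form as follows

$$
\begin{bmatrix} \mathbf{A} & \mathbf{B}^T \\ \mathbf{B} & \mathbf{0} \end{bmatrix}
\begin{Bmatrix} \hat{\boldsymbol{\sigma}} \\ \hat{\mathbf{u}} \end{Bmatrix}
=
\begin{Bmatrix} \mathbf{0} \\ \mathbf{g} \end{Bmatrix}
\tag{57}
$$

where

$$
\begin{aligned}
\mathbf{A}|_{ij} &= \int_\Omega [\mathbf{N}_i^{\sigma} : \mathbb{D}^{-1}\mathbf{N}_j^{\sigma}]\, d\Omega, & \hat{\boldsymbol{\sigma}}|_j &= \hat{\sigma}_j \\
\mathbf{B}|_{rj} &= -\int_\Omega [\nabla^s\mathbf{N}_r^{u} : \mathbf{N}_j^{\sigma}]\, d\Omega, & \hat{\mathbf{u}}|_r &= \hat{u}_r \\
\mathbf{g}|_r &= -\int_\Omega [\mathbf{N}_r^{u} \cdot \mathbf{b}]\, d\Omega
\end{aligned}
\tag{58}
$$
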
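Starting from equation (54) the following modified
Hellinger–Reissner functional can be also generated

$$
\Pi^{HR,m}(\mathbf{u}, \boldsymbol{\sigma}) = -\frac{1}{2}\int_\Omega [\boldsymbol{\sigma} : \mathbb{D}^{-1}\boldsymbol{\sigma}]\, d\Omega
- \int_\Omega [\mathbf{u} \cdot \operatorname{div}\boldsymbol{\sigma}]\, d\Omega
- \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{59}
$$

and, requiring its stationarity, we obtain

$$
\begin{cases}
d\Pi^{HR,m}(\mathbf{u}, \boldsymbol{\sigma})[\delta\boldsymbol{\sigma}] = -\displaystyle\int_\Omega [\delta\boldsymbol{\sigma} : \mathbb{D}^{-1}\boldsymbol{\sigma}]\, d\Omega - \int_\Omega [\operatorname{div}(\delta\boldsymbol{\sigma}) \cdot \mathbf{u}]\, d\Omega = 0 \\[2mm]
d\Pi^{HR,m}(\mathbf{u}, \boldsymbol{\sigma})[\delta\mathbf{u}] = -\displaystyle\int_\Omega [\delta\mathbf{u} \cdot \operatorname{div}\boldsymbol{\sigma}]\, d\Omega - \int_\Omega [\delta\mathbf{u} \cdot \mathbf{b}]\, d\Omega = 0
\end{cases}
\tag{60}
$$

which is again equivalent to the search of a saddle point.
Changing sign to both equations and introducing again field
approximation (56), equation (60) can be rewritten in matricial
form as equation (57), with the difference that now

$$
\mathbf{B}|_{rj} = \int_\Omega [\mathbf{N}_r^{u} \cdot \operatorname{div}(\mathbf{N}_j^{\sigma})]\, d\Omega
\tag{61}
$$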


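Similarly, we may also associate an integral form with the three
field equation (42) starting from the even more general
Hu–Washizu functional

$$
\Pi^{HW}(\mathbf{u}, \boldsymbol{\varepsilon}, \boldsymbol{\sigma}) = \frac{1}{2}\int_\Omega [\boldsymbol{\varepsilon} : \mathbb{D}\boldsymbol{\varepsilon}]\, d\Omega
- \int_\Omega [\boldsymbol{\sigma} : (\boldsymbol{\varepsilon} - \nabla^s\mathbf{u})]\, d\Omega
- \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{62}
$$

Requiring the stationarity of functional (62), we obtain

$$
\begin{cases}
d\Pi^{HW}(\mathbf{u}, \boldsymbol{\varepsilon}, \boldsymbol{\sigma})[\delta\boldsymbol{\varepsilon}] = \displaystyle\int_\Omega [\delta\boldsymbol{\varepsilon} : \mathbb{D}\boldsymbol{\varepsilon}]\, d\Omega - \int_\Omega [\delta\boldsymbol{\varepsilon} : \boldsymbol{\sigma}]\, d\Omega = 0 \\[2mm]
d\Pi^{HW}(\mathbf{u}, \boldsymbol{\varepsilon}, \boldsymbol{\sigma})[\delta\boldsymbol{\sigma}] = -\displaystyle\int_\Omega [\delta\boldsymbol{\sigma} : (\boldsymbol{\varepsilon} - \nabla^s\mathbf{u})]\, d\Omega = 0 \\[2mm]
d\Pi^{HW}(\mathbf{u}, \boldsymbol{\varepsilon}, \boldsymbol{\sigma})[\delta\mathbf{u}] = \displaystyle\int_\Omega [(\nabla^s\delta\mathbf{u}) : \boldsymbol{\sigma}]\, d\Omega - \int_\Omega [\delta\mathbf{u} \cdot \mathbf{b}]\, d\Omega = 0
\end{cases}
\tag{63}
$$

which is again equivalent to the search of a saddle point.
Introducing the following approximation

$$
\begin{cases}
\mathbf{u} \approx \mathbf{u}^h = \mathbf{N}_k^{u}\,\hat{u}_k \\
\boldsymbol{\varepsilon} \approx \boldsymbol{\varepsilon}^h = \mathbf{N}_k^{\varepsilon}\,\hat{\varepsilon}_k \\
\boldsymbol{\sigma} \approx \boldsymbol{\sigma}^h = \mathbf{N}_k^{\sigma}\,\hat{\sigma}_k
\end{cases}
\tag{64}
$$

as well as a similar approximation for the variation fields,
equation (63) can be rewritten as follows

$$
\begin{bmatrix} \mathbf{A} & \mathbf{B}^T & \mathbf{0} \\ \mathbf{B} & \mathbf{0} & \mathbf{C}^T \\ \mathbf{0} & \mathbf{C} & \mathbf{0} \end{bmatrix}
\begin{Bmatrix} \hat{\boldsymbol{\varepsilon}} \\ \hat{\boldsymbol{\sigma}} \\ \hat{\mathbf{u}} \end{Bmatrix}
=
\begin{Bmatrix} \mathbf{0} \\ \mathbf{0} \\ \mathbf{h} \end{Bmatrix}
\tag{65}
$$

where

$$
\begin{aligned}
\mathbf{A}|_{ij} &= \int_\Omega [\mathbf{N}_i^{\varepsilon} : \mathbb{D}\mathbf{N}_j^{\varepsilon}]\, d\Omega, & \hat{\boldsymbol{\varepsilon}}|_j &= \hat{\varepsilon}_j \\
\mathbf{B}|_{rj} &= -\int_\Omega [\mathbf{N}_r^{\sigma} : \mathbf{N}_j^{\varepsilon}]\, d\Omega, & \hat{\boldsymbol{\sigma}}|_r &= \hat{\sigma}_r \\
\mathbf{C}|_{sr} &= \int_\Omega [\nabla^s\mathbf{N}_s^{u} : \mathbf{N}_r^{\sigma}]\, d\Omega, & \hat{\mathbf{u}}|_s &= \hat{u}_s \\
\mathbf{h}|_s &= \int_\Omega [\mathbf{N}_s^{u} \cdot \mathbf{b}]\, d\Omega
\end{aligned}
\tag{66}
$$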

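For later consideration, we note that equation (65) can be
rewritten as

$$
\begin{bmatrix} \mathcal{A} & \mathcal{B}^T \\ \mathcal{B} & \mathbf{0} \end{bmatrix}
\begin{Bmatrix} \mathbf{x} \\ \mathbf{y} \end{Bmatrix}
=
\begin{Bmatrix} \mathbf{0} \\ \mathbf{h} \end{Bmatrix}
\tag{67}
$$

where we made the following simple identifications

$$
\mathcal{A} = \begin{bmatrix} \mathbf{A} & \mathbf{B}^T \\ \mathbf{B} & \mathbf{0} \end{bmatrix}, \qquad
\mathcal{B} = \{\mathbf{0}, \mathbf{C}\}, \qquad
\mathbf{x} = \begin{Bmatrix} \hat{\boldsymbol{\varepsilon}} \\ \hat{\boldsymbol{\sigma}} \end{Bmatrix}, \qquad
\mathbf{y} = \hat{\mathbf{u}}
\tag{68}
$$

Examples of specific choices for the interpolating functions
(51), (56), or (64) respectively within the single field,
the two field, and the three field formulations can be found
in standard textbooks (Bathe, 1996; Brezzi and Fortin,
1991; Hughes, 1987; Zienkiewicz and Taylor, 2000a) or
in the literature.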


Toward incompressible elasticity 

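It is interesting to observe that the strain $\boldsymbol{\varepsilon}$, the stress $\boldsymbol{\sigma}$ and
the symmetric gradient of the displacement $\nabla^s\mathbf{u}$ can be
easily decomposed respectively in a deviatoric (traceless)
part and a volumetric (trace-related) part. In particular,
recalling that we indicate with $d$ the Euclidean space
dimension, we may set

$$
\begin{cases}
\boldsymbol{\varepsilon} = \mathbf{e} + \dfrac{\theta}{d}\,\mathbf{I} & \text{with } \theta = \operatorname{tr}(\boldsymbol{\varepsilon}) \\[2mm]
\boldsymbol{\sigma} = \mathbf{s} + p\,\mathbf{I} & \text{with } p = \dfrac{\operatorname{tr}(\boldsymbol{\sigma})}{d} \\[2mm]
\nabla^s\mathbf{u} = \nabla^d\mathbf{u} + \dfrac{\operatorname{div}(\mathbf{u})}{d}\,\mathbf{I} & \text{with } \operatorname{div}(\mathbf{u}) = \operatorname{tr}(\nabla^s\mathbf{u})
\end{cases}
\tag{69}
$$

where $\theta$, $p$, and $\operatorname{div}(\mathbf{u})$ are the volumetric (trace-related)
quantities, while $\mathbf{e}$, $\mathbf{s}$, and $\nabla^d\mathbf{u}$ are the deviatoric (or
traceless) quantities, that is,

$$
\operatorname{tr}(\mathbf{e}) = \operatorname{tr}(\mathbf{s}) = \operatorname{tr}(\nabla^d\mathbf{u}) = 0
\tag{70}
$$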
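The decomposition (69) and the property (70) can be checked numerically. The following Python sketch splits a symmetric tensor, stored as a nested list, into its deviatoric part and its trace; the function names and the sample strain values are illustrative assumptions.

```python
def trace(t):
    """Trace of a square tensor stored as a list of rows."""
    return sum(t[i][i] for i in range(len(t)))


def split_dev_vol(t):
    """Return (deviatoric part, trace) so that t = dev + (trace/d) * I."""
    d = len(t)
    vol = trace(t)
    dev = [[t[i][j] - (vol / d if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    return dev, vol


# Hypothetical symmetric strain tensor in d = 3 dimensions.
eps = [[1.0, 2.0, 0.0],
       [2.0, 3.0, 0.0],
       [0.0, 0.0, 5.0]]

e_dev, theta = split_dev_vol(eps)
print("theta =", theta)
print("tr(e) =", trace(e_dev))
```

The off-diagonal entries are untouched by the split; only the diagonal is shifted by $\theta/d$.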


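Adopting these deviatoric-volumetric decompositions and
limiting the discussion for simplicity of notation to the
case of an isotropic material, the three field Hu–Washizu
functional (62) can be rewritten as

$$
\Pi^{HW}(\mathbf{u}, \mathbf{e}, \mathbf{s}, \theta, p) = \frac{1}{2}\int_\Omega [2\mu\,\mathbf{e} : \mathbf{e} + k\theta^2]\, d\Omega
- \int_\Omega [\mathbf{s} : (\mathbf{e} - \nabla^d\mathbf{u})]\, d\Omega
- \int_\Omega [p\,(\theta - \operatorname{div}\mathbf{u})]\, d\Omega
- \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{71}
$$

where we introduce the bulk modulus $k = \lambda + 2\mu/d$. If
we now require a strong (pointwise) satisfaction of the
deviatoric compatibility condition $\mathbf{e} = \nabla^d\mathbf{u}$ (obtained from
the stationarity of functional (71) with respect to $\mathbf{s}$) as
well as a strong (pointwise) satisfaction of the volumetric
constitutive equation $p = k\theta$ (obtained from the stationarity
of functional (71) with respect to $\theta$), we end up with the
following simpler modified Hellinger–Reissner functional

$$
\Pi^{HR,m}(\mathbf{u}, p) = \frac{1}{2}\int_\Omega [2\mu\,\nabla^d\mathbf{u} : \nabla^d\mathbf{u}]\, d\Omega
- \frac{1}{2}\int_\Omega \left[\frac{1}{k}\,p^2\right] d\Omega
+ \int_\Omega [p\, \operatorname{div}\mathbf{u}]\, d\Omega
- \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{72}
$$

It is interesting to observe that taking the variation of
functional (72) with respect to $p$, we obtain the correct relation
between the pressure $p$ and the volumetric component of
the displacement gradient, that is,

$$
p = k \operatorname{div}\mathbf{u} = \left(\lambda + \frac{2}{d}\,\mu\right)\operatorname{div}\mathbf{u}
\tag{73}
$$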
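Relation (73) can be verified directly from the isotropic constitutive law (44): taking the trace of the stress and dividing by $d$ should give $k$ times the trace of the strain, with $k = \lambda + 2\mu/d$. The Python sketch below does this for one sample strain; the material constants, the strain values, and the function names are assumptions chosen only for the check.

```python
def trace(t):
    """Trace of a square tensor stored as a list of rows."""
    return sum(t[i][i] for i in range(len(t)))


def isotropic_stress(eps, mu, lam):
    """sigma = 2*mu*eps + lam*tr(eps)*I for a symmetric strain eps."""
    d = len(eps)
    tr_eps = trace(eps)
    return [[2.0 * mu * eps[i][j] + (lam * tr_eps if i == j else 0.0)
             for j in range(d)] for i in range(d)]


# Hypothetical material constants and strain state (d = 3).
mu, lam = 1.0, 2.0
eps = [[1.0, 2.0, 0.0],
       [2.0, 3.0, 0.0],
       [0.0, 0.0, 5.0]]
d = len(eps)

sigma = isotropic_stress(eps, mu, lam)
p_from_stress = trace(sigma) / d           # p = tr(sigma)/d, as in (69)
k = lam + 2.0 * mu / d                     # bulk modulus
p_from_strain = k * trace(eps)             # p = k * theta

print(p_from_stress, p_from_strain)
```

Both expressions give the same pressure, which is the pointwise content of the volumetric constitutive equation $p = k\theta$.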


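For the case of incompressibility ($\lambda \to \infty$ and $k \to \infty$),
functional (72) reduces to the following form

$$
\Pi^{HR,m}(\mathbf{u}, p) = \frac{1}{2}\int_\Omega [2\mu\,\nabla^d\mathbf{u} : \nabla^d\mathbf{u}]\, d\Omega
+ \int_\Omega [p\, \operatorname{div}\mathbf{u}]\, d\Omega
- \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{74}
$$

which resembles the potential energy functional (49) for
the case of an isotropic material with the addition of
the incompressibility constraint $\operatorname{div}\mathbf{u} = 0$ and with the
difference that the quadratic term now involves only the
deviatoric part of the symmetric displacement gradient and
not the whole symmetric displacement gradient.

Requiring the stationarity of functional (74), we obtain

$$
\begin{cases}
d\Pi^{HR,m}(\mathbf{u}, p)[\delta\mathbf{u}] = \displaystyle\int_\Omega [2\mu\,\nabla^d\delta\mathbf{u} : \nabla^d\mathbf{u}]\, d\Omega + \int_\Omega [\operatorname{div}(\delta\mathbf{u})\, p]\, d\Omega - \int_\Omega [\delta\mathbf{u} \cdot \mathbf{b}]\, d\Omega = 0 \\[2mm]
d\Pi^{HR,m}(\mathbf{u}, p)[\delta p] = \displaystyle\int_\Omega [\delta p\, \operatorname{div}\mathbf{u}]\, d\Omega = 0
\end{cases}
\tag{75}
$$

Introducing the following approximation

$$
\begin{cases}
\mathbf{u} \approx \mathbf{u}^h = \mathbf{N}_k^{u}\,\hat{u}_k \\
p \approx p^h = N_k^{p}\,\hat{p}_k
\end{cases}
\tag{76}
$$

as well as a similar approximation for the corresponding
variation fields, equation (75) can be rewritten as follows

$$
\begin{bmatrix} \mathbf{A} & \mathbf{B}^T \\ \mathbf{B} & \mathbf{0} \end{bmatrix}
\begin{Bmatrix} \hat{\mathbf{u}} \\ \hat{\mathbf{p}} \end{Bmatrix}
=
\begin{Bmatrix} \mathbf{f} \\ \mathbf{0} \end{Bmatrix}
\tag{77}
$$

where

$$
\begin{aligned}
\mathbf{A}|_{ij} &= 2\mu\int_\Omega [\nabla^d\mathbf{N}_i^{u} : \nabla^d\mathbf{N}_j^{u}]\, d\Omega, & \hat{\mathbf{u}}|_j &= \hat{u}_j \\
\mathbf{B}|_{rj} &= \int_\Omega [N_r^{p}\, \operatorname{div}(\mathbf{N}_j^{u})]\, d\Omega, & \hat{\mathbf{p}}|_r &= \hat{p}_r \\
\mathbf{f}|_i &= \int_\Omega [\mathbf{N}_i^{u} \cdot \mathbf{b}]\, d\Omega
\end{aligned}
\tag{78}
$$

It is interesting to observe that this approach may result in
an unstable discrete formulation since the volumetric components
of the symmetric part of the displacement gradient
may not be controlled. Examples of specific choices for the
interpolating functions (76) can be found in standard textbooks
(Hughes, 1987; Zienkiewicz and Taylor, 2000a) or
in the literature.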

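A different stable formulation can be easily obtained as in
the case of Stokes problem. In particular, we may start from
the potential energy functional (49), which for an isotropic
material specializes as

$$
\Pi(\mathbf{u}) = \frac{1}{2}\int_\Omega [2\mu\,(\nabla^s\mathbf{u} : \nabla^s\mathbf{u}) + \lambda\,(\operatorname{div}\mathbf{u})^2]\, d\Omega
- \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{79}
$$

Introducing now the pressure-like field $\pi = \lambda \operatorname{div}\mathbf{u}$, we can
rewrite functional (79) as

$$
\Pi^{m}(\mathbf{u}, \pi) = \frac{1}{2}\int_\Omega \left[2\mu\,(\nabla^s\mathbf{u} : \nabla^s\mathbf{u}) - \frac{1}{\lambda}\,\pi^2\right] d\Omega
+ \int_\Omega [\pi\, \operatorname{div}\mathbf{u}]\, d\Omega
- \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{80}
$$

We may note that $\pi$ is a pressure-like quantity, different,
however, from the physical pressure $p$ previously introduced.
In fact, $\pi$ is the Lagrangian multiplier associated
with the incompressibility constraint and it can be related to the
physical pressure $p$ recalling relation (73)

$$
p = k \operatorname{div}\mathbf{u} = \pi + \frac{2}{d}\,\mu \operatorname{div}\mathbf{u}
\tag{81}
$$

For the incompressible case ($\lambda \to \infty$), functional (80)
reduces to the following form:

$$
\Pi^{m}(\mathbf{u}, \pi) = \frac{1}{2}\int_\Omega [2\mu\,\nabla^s\mathbf{u} : \nabla^s\mathbf{u}]\, d\Omega
+ \int_\Omega [\pi\, \operatorname{div}\mathbf{u}]\, d\Omega
- \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{82}
$$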

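Taking the variation of (82) and introducing the following
approximation

$$
\begin{cases}
\mathbf{u} \approx \mathbf{u}^h = \mathbf{N}_k^{u}\,\hat{u}_k \\
\pi \approx \pi^h = N_k^{\pi}\,\hat{\pi}_k
\end{cases}
\tag{83}
$$

as well as a similar approximation for the corresponding
variation fields, we obtain a discrete problem of the following
form:

$$
\begin{bmatrix} \mathbf{A} & \mathbf{B}^T \\ \mathbf{B} & \mathbf{0} \end{bmatrix}
\begin{Bmatrix} \hat{\mathbf{u}} \\ \hat{\boldsymbol{\pi}} \end{Bmatrix}
=
\begin{Bmatrix} \mathbf{f} \\ \mathbf{0} \end{Bmatrix}
\tag{84}
$$

where

$$
\begin{aligned}
\mathbf{A}|_{ij} &= 2\mu\int_\Omega [\nabla^s\mathbf{N}_i^{u} : \nabla^s\mathbf{N}_j^{u}]\, d\Omega, & \hat{\mathbf{u}}|_j &= \hat{u}_j \\
\mathbf{B}|_{rj} &= \int_\Omega [N_r^{\pi}\, \operatorname{div}(\mathbf{N}_j^{u})]\, d\Omega, & \hat{\boldsymbol{\pi}}|_r &= \hat{\pi}_r \\
\mathbf{f}|_i &= \int_\Omega [\mathbf{N}_i^{u} \cdot \mathbf{b}]\, d\Omega
\end{aligned}
\tag{85}
$$

It is interesting to observe that, in general, this approach
results in a stable discrete formulation since the volumetric
components of the symmetric part of the displacement
gradient are now controlled. Examples of specific choices
for the interpolating functions (83) can be found in standard
textbooks (Bathe, 1996; Brezzi and Fortin, 1991; Hughes,
1987; Zienkiewicz and Taylor, 2000a) or in the literature.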


Enhanced strain formulation 

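Starting from the work of Simo and Rifai (1990), recently,
a lot of attention has been paid to the so-called enhanced
strain formulation, which can be variationally deduced for
example, from the Hu–Washizu formulation (62). As a
first step, the method describes the strain $\boldsymbol{\varepsilon}$ as the sum
of a compatible contribution, $\nabla^s\mathbf{u}$, and of an incompatible
contribution, $\tilde{\boldsymbol{\varepsilon}}$, that is,

$$
\boldsymbol{\varepsilon} = \nabla^s\mathbf{u} + \tilde{\boldsymbol{\varepsilon}}
\tag{86}
$$

Using this position into the Hu–Washizu formulation (62),
we obtain the following functional

$$
\Pi^{enh}(\mathbf{u}, \tilde{\boldsymbol{\varepsilon}}, \boldsymbol{\sigma}) = \frac{1}{2}\int_\Omega [(\nabla^s\mathbf{u} + \tilde{\boldsymbol{\varepsilon}}) : \mathbb{D}(\nabla^s\mathbf{u} + \tilde{\boldsymbol{\varepsilon}})]\, d\Omega
- \int_\Omega [\boldsymbol{\sigma} : \tilde{\boldsymbol{\varepsilon}}]\, d\Omega
- \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{87}
$$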

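Requiring the stationarity of the functional and introducing
the following approximation

$$
\begin{cases}
\mathbf{u} \approx \mathbf{u}^h = \mathbf{N}_k^{u}\,\hat{u}_k \\
\tilde{\boldsymbol{\varepsilon}} \approx \tilde{\boldsymbol{\varepsilon}}^h = \mathbf{N}_k^{\tilde{\varepsilon}}\,\hat{\tilde{\varepsilon}}_k \\
\boldsymbol{\sigma} \approx \boldsymbol{\sigma}^h = \mathbf{N}_k^{\sigma}\,\hat{\sigma}_k
\end{cases}
\tag{88}
$$

as well as a similar approximation for the variation fields,
we obtain the following discrete problem:

$$
\begin{bmatrix} \mathbf{A} & \mathbf{B}^T & \mathbf{0} \\ \mathbf{B} & \mathbf{C} & \mathbf{D}^T \\ \mathbf{0} & \mathbf{D} & \mathbf{0} \end{bmatrix}
\begin{Bmatrix} \hat{\mathbf{u}} \\ \hat{\tilde{\boldsymbol{\varepsilon}}} \\ \hat{\boldsymbol{\sigma}} \end{Bmatrix}
=
\begin{Bmatrix} \mathbf{f} \\ \mathbf{0} \\ \mathbf{0} \end{Bmatrix}
\tag{89}
$$

where

$$
\begin{aligned}
\mathbf{A}|_{ij} &= \int_\Omega [\nabla^s\mathbf{N}_i^{u} : \mathbb{D}\nabla^s\mathbf{N}_j^{u}]\, d\Omega, &
\mathbf{B}|_{rj} &= \int_\Omega [\mathbf{N}_r^{\tilde{\varepsilon}} : \mathbb{D}\nabla^s\mathbf{N}_j^{u}]\, d\Omega \\
\mathbf{C}|_{rs} &= \int_\Omega [\mathbf{N}_r^{\tilde{\varepsilon}} : \mathbb{D}\mathbf{N}_s^{\tilde{\varepsilon}}]\, d\Omega, &
\mathbf{D}|_{tr} &= -\int_\Omega [\mathbf{N}_t^{\sigma} : \mathbf{N}_r^{\tilde{\varepsilon}}]\, d\Omega \\
\mathbf{f}|_i &= \int_\Omega [\mathbf{N}_i^{u} \cdot \mathbf{b}]\, d\Omega
\end{aligned}
\tag{90}
$$
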
For later consideration, we note that equation (89) can be 


rewritten as 
A B'l{x d 
wab 


where we made the following simple identifications: 


= {3}. a={j}. yee (92) 


Examples of specific choices for the interpolating functions 
can be found in standard textbooks (Zienkiewicz and Tay- 
lor, 2000a) or in the literature. 

However, the most widely adopted enhanced strain formulation
also requires the incompatible part of the strain to
be orthogonal to the stress $\boldsymbol{\sigma}$

$$
\int_\Omega [\boldsymbol{\sigma} : \tilde{\boldsymbol{\varepsilon}}]\, d\Omega = 0
\tag{93}
$$

If we use conditions (86) and (93) into the Hu–Washizu
formulation (62), we obtain the following simplified functional:

$$
\Pi^{enh}(\mathbf{u}, \tilde{\boldsymbol{\varepsilon}}) = \frac{1}{2}\int_\Omega [(\nabla^s\mathbf{u} + \tilde{\boldsymbol{\varepsilon}}) : \mathbb{D}(\nabla^s\mathbf{u} + \tilde{\boldsymbol{\varepsilon}})]\, d\Omega
- \int_\Omega [\mathbf{b} \cdot \mathbf{u}]\, d\Omega
\tag{94}
$$

which closely resembles a standard displacement-based
incompatible approach. Examples of specific choices for the
interpolating functions involved in this simplified enhanced
formulation can be found in standard textbooks
(Zienkiewicz and Taylor, 2000a) or in the literature.


3 STABILITY OF SADDLE-POINTS IN 
FINITE DIMENSIONS 


3.1 Solvability and stability 


The examples discussed in Section 2 clearly show that, after
discretization, several formulations typically lead to linear
algebraic systems of the general form

$$
\begin{bmatrix} \mathbf{A} & \mathbf{B}^T \\ \mathbf{B} & \mathbf{0} \end{bmatrix}
\begin{Bmatrix} \mathbf{x} \\ \mathbf{y} \end{Bmatrix}
=
\begin{Bmatrix} \mathbf{f} \\ \mathbf{g} \end{Bmatrix}
\tag{95}
$$

where $\mathbf{A}$ and $\mathbf{B}$ are respectively an $n \times n$ matrix and an
$m \times n$ matrix, while $\mathbf{x}$ and $\mathbf{y}$ are respectively an $n \times 1$
vector and an $m \times 1$ vector, as well as $\mathbf{f}$ and $\mathbf{g}$. Discretizations
leading to such a system are often indicated as mixed finite
element methods and in the following, we present a simple,
algebraic version of the abstract theory that rules most
applications of mixed methods.

Our first need is clearly to express in proper form
solvability conditions for linear systems of type (95) in
terms of the properties of the matrices $\mathbf{A}$ and $\mathbf{B}$. By
solvability we mean that for every right-hand side $\mathbf{f}$ and
$\mathbf{g}$, system (95) has a unique solution. It is well known that
this property holds if and only if the $(n + m) \times (n + m)$
matrix

$$
\begin{bmatrix} \mathbf{A} & \mathbf{B}^T \\ \mathbf{B} & \mathbf{0} \end{bmatrix}
\tag{96}
$$

is nonsingular, that is, if and only if its determinant is
different from zero.

In order to have a good numerical method, however, solvability is not enough. An additional property that we also require is stability. We now look at this property in a little more detail. For a solvable finite-dimensional linear system, we always have continuous dependence of the solution upon the data. This means that there exists a constant c such that for every set of vectors x, y, f, g satisfying (95) we have

$$\|x\| + \|y\| \le c \, (\|f\| + \|g\|) \tag{97}$$


This property implies solvability. Indeed, if we assume 
that (97) holds for every set of vectors x, y, f, g satisfy- 
ing (95), then, whenever f and g are both zero, x and y 
must also be equal to zero. This is another way of saying 
that the homogeneous system has only the trivial solution, 
which implies that the determinant of the matrix (96) is 
different from zero, and hence the system is solvable. 
However, formula (97) deserves another very important comment. Actually, we did not specify the norms adopted for x, y, f, g. We had the right to do so, since in finite dimension all norms are equivalent. Hence, the change of
one norm with another would only result in a change of the 
numerical value of the constant c, but it would not change 
the basic fact that such a constant exists. However, in 
dealing with linear systems resulting from the discretization 
of a partial differential equation we face a slightly different 
situation. In fact, if we want to analyze the behaviour of 
a given method when the meshsize becomes smaller and 
smaller, we must ideally consider a sequence of linear 
systems whose dimension increases and approaches infinity 
when the meshsize tends to zero. As is well known (and as can also be easily verified), the constants involved in the equivalence of different norms depend on the dimension of the space. For instance, in $\mathbb{R}^n$, the two norms

$$\|x\|_1 := \sum_{i=1}^{n} |x_i| \quad \text{and} \quad \|x\|_2 := \Big( \sum_{i=1}^{n} |x_i|^2 \Big)^{1/2} \tag{98}$$

are indeed equivalent, in the sense that there exist two positive constants $c_1$ and $c_2$ such that

$$c_1 \|x\|_2 \le \|x\|_1 \le c_2 \|x\|_2 \tag{99}$$

for all x in $\mathbb{R}^n$. However, it can be rather easily checked that the best constants one can choose in (99) are

$$\|x\|_2 \le \|x\|_1 \le \sqrt{n} \, \|x\|_2 \tag{100}$$

In particular, the first inequality becomes an equality, for instance, when $x_1$ is equal to 1 and all the other $x_i$'s are zero, while the second inequality becomes an equality, for instance, when all the $x_i$ are equal to 1.
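The dimension dependence of the equivalence constants can be checked numerically. The following is a minimal sketch in plain Python; the function names `norm1`, `norm2` and `equivalence_ratio` are illustrative choices and do not come from the text.

```python
import math

def norm1(x):
    """Sum of absolute values: the first norm in (98)."""
    return sum(abs(xi) for xi in x)

def norm2(x):
    """Euclidean norm: the second norm in (98)."""
    return math.sqrt(sum(xi * xi for xi in x))

def equivalence_ratio(x):
    """Ratio ||x||_1 / ||x||_2, which (100) confines to [1, sqrt(n)]."""
    return norm1(x) / norm2(x)

for n in (4, 100, 10000):
    e1 = [1.0] + [0.0] * (n - 1)   # attains the lower constant 1
    ones = [1.0] * n               # attains the upper constant sqrt(n)
    print(n, equivalence_ratio(e1), equivalence_ratio(ones), math.sqrt(n))
```

The upper ratio grows like the square root of n, which is why a constant valid for one mesh need not be valid for a finer one.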

When considering a sequence of problems with increasing dimension, we have to take into account that n and m become unbounded. It is then natural to ask if, for a given choice of the norms $\|x\|$, $\|y\|$, $\|f\|$, and $\|g\|$, it is possible to find a constant c independent of the meshsize (say, h), that is, a constant c that makes (97) hold true for all meshsizes.

However, even if inequality (97) holds with a constant 
c independent of h, it will not provide a good concept of 
stability unless the four norms are properly chosen (see 
Remark 18). This is going to be our next task. 


3.2 Assumptions on the norms 


We start by denoting by X, Y, F, G respectively the spaces of vectors x, y, f, g. Then, we assume what follows.

1. The spaces X and Y are equipped with norms $\|\cdot\|_X$ and $\|\cdot\|_Y$ for which the matrices A and B satisfy the continuity conditions: there exist two constants $M_a$ and $M_b$, independent of the meshsize, such that for all x and z in X and for all y in Y

$$x^T A z \le M_a \|x\|_X \|z\|_X \quad \text{and} \quad x^T B^T y \le M_b \|x\|_X \|y\|_Y \tag{101}$$

Moreover, we suppose there exist symmetric positive definite matrices $M^x$ and $M^y$, respectively of dimensions n × n and m × m, such that

$$\|x\|_X^2 = x^T M^x x \quad \forall x \in X \tag{102}$$

and

$$\|y\|_Y^2 = y^T M^y y \quad \forall y \in Y \tag{103}$$

2. The spaces F and G are equipped with norms $\|\cdot\|_F$ and $\|\cdot\|_G$ defined as the dual norms of $\|\cdot\|_X$ and $\|\cdot\|_Y$, that is,

$$\|f\|_F := \sup_{x \in X \setminus \{0\}} \frac{x^T f}{\|x\|_X} \quad \text{and} \quad \|g\|_G := \sup_{y \in Y \setminus \{0\}} \frac{y^T g}{\|y\|_Y} \tag{104}$$
It is worth noting that

• assumptions (102) to (103) mean that the norms for X and Y are both induced by an inner product or, in other words, the norms at hand are hilbertian (as happens in most of the applications);

• for every x and f in $\mathbb{R}^n$ and for every y and g in $\mathbb{R}^m$, we have

$$x^T f \le \|x\|_X \|f\|_F \quad \text{and} \quad y^T g \le \|y\|_Y \|g\|_G \tag{105}$$

• combining the continuity condition (101) on $\|\cdot\|_X$ and $\|\cdot\|_Y$ with the dual norm definition (105), for every x ∈ X and for every y ∈ Y, we have the following relations:

$$\|Ax\|_F = \sup_{z \in X \setminus \{0\}} \frac{z^T A x}{\|z\|_X} \le M_a \|x\|_X \tag{106}$$

$$\|Bx\|_G = \sup_{z \in Y \setminus \{0\}} \frac{z^T B x}{\|z\|_Y} \le M_b \|x\|_X \tag{107}$$

$$\|B^T y\|_F = \sup_{z \in X \setminus \{0\}} \frac{z^T B^T y}{\|z\|_X} \le M_b \|y\|_Y \tag{108}$$

• if A is symmetric and positive semidefinite, then for every x, z ∈ X

$$|z^T A x| \le (z^T A z)^{1/2} (x^T A x)^{1/2} \tag{109}$$

so that (106) can be improved to

$$\|Ax\|_F = \sup_{z \in X \setminus \{0\}} \frac{z^T A x}{\|z\|_X} \le M_a^{1/2} (x^T A x)^{1/2} \tag{110}$$

We are now ready to introduce a precise definition of stability.


Stability definition. Given a numerical method that produces a sequence of matrices A and B when applied to a given sequence of meshes (with the meshsize h going to zero), we choose norms $\|\cdot\|_X$ and $\|\cdot\|_Y$ that satisfy the continuity condition (101), and dual norms $\|\cdot\|_F$ and $\|\cdot\|_G$ according to (104). Then, we say that the method is stable if there exists a constant c, independent of the mesh, such that for all vectors x, y, f, g satisfying the general system (95), it holds

$$\|x\|_X + \|y\|_Y \le c \, (\|f\|_F + \|g\|_G) \tag{111}$$


Having now a precise definition of stability, we can look 
for suitable assumptions on the matrices A and B that may 
provide the stability result (111). In particular, to guar- 
antee stability condition (111), we need to introduce two 
assumptions involving such matrices. The first assumption, 
the so-called inf—sup condition, involves only the matrix B 
and it will be used throughout the whole section. To illus- 
trate the second assumption we will first focus on a simpler 
but less general case that involves a ‘strong’ requirement on 
the matrix A. Among the problems presented in Section 2, 
this requirement is verified in practice only for the Stokes 
problem. Then, we shall tackle a more complex and clearly 
more general case, corresponding to a ‘weak’ requirement 
on the matrix A, suited for instance for discretizations of 
the mixed formulations of thermal diffusion problems. 

Later on we shall deal with some additional complications that occur, for instance, in the (u, π)-formulation of nearly incompressible elasticity (cf. (80)). Finally, we shall briefly discuss more complicated problems, omitting the proofs for simplicity.


3.3 A requirement on the B matrix: the inf-sup 
condition 


The basic assumption that we are going to use, throughout the whole section, deals with the matrix B. We assume the following:


Inf-sup condition. There exists a positive constant β, independent of the meshsize h, such that:

$$\forall y \in Y \quad \exists x \in X \setminus \{0\} \quad \text{such that} \quad x^T B^T y \ge \beta \|x\|_X \|y\|_Y \tag{112}$$

Condition (112) requires the existence of a positive constant β, independent of h, such that for every y ∈ Y we can find a suitable x ∈ X, different from 0 (and depending on y), such that (112) holds.


Remark 1. To better understand the meaning of (112), it might be useful to see when it fails. We thus consider the following m × n pseudodiagonal matrix (m < n)

$$B = \begin{bmatrix} \vartheta_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \vartheta_2 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \vartheta_m & 0 & \cdots & 0 \end{bmatrix} \tag{113}$$

with $0 \le \vartheta_1 \le \vartheta_2 \le \cdots \le \vartheta_m \le 1$. To fix ideas, we suppose that both $X \equiv \mathbb{R}^n$ and $Y \equiv \mathbb{R}^m$ are equipped with the standard Euclidean norms, which coincide with the corresponding dual norms on F and G (cf. (104)). If $\vartheta_1 = 0$, choosing $y = (1, 0, \ldots, 0)^T \ne 0$, we have $B^T y = 0$. Therefore, for every x ∈ X, we get $x^T B^T y = 0$ and condition (112) cannot hold since β must be positive. We then infer that condition (112) requires that

no y ≠ 0 satisfies $B^T y = 0$

which, by definition, means that $B^T$ is injective. However, the injectivity of $B^T$ is not sufficient for the fulfillment of condition (112). Indeed, for $0 < \vartheta_1 \le \vartheta_2 \le \cdots \le \vartheta_m \le 1$, the matrix $B^T$ is injective and we have, still choosing $y = (1, 0, \ldots, 0)^T$,

$$B^T y = (\vartheta_1, 0, \ldots, 0)^T \ne 0 \tag{114}$$

Since for every x ∈ X it holds

$$x^T B^T y = \vartheta_1 x_1 \le \vartheta_1 \|x\|_X = \vartheta_1 \|x\|_X \|y\|_Y \tag{115}$$

we obtain that the constant β in (112) is forced to satisfy

$$0 < \beta \le \vartheta_1 \tag{116}$$

As a consequence, if $\vartheta_1 > 0$ tends to zero with the meshsize h, the matrix $B^T$ is still injective but condition (112) fails, because β, on top of being positive, must be independent of h. Noting that (see (114))

$$\frac{\|B^T y\|_F}{\|y\|_Y} = \vartheta_1 \tag{117}$$

we then deduce that condition (112) requires that for y ≠ 0

the vector $B^T y$ is not 'too small' with respect to y

which is a property stronger than the injectivity of the matrix $B^T$. We will see in Proposition 1 that all these considerations on the particular matrix B in (113) do extend to the general case.
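The failure mechanism of Remark 1 can be reproduced numerically. The sketch below is only illustrative: it assumes Euclidean norms as in the remark, a hypothetical mesh sequence in which the first diagonal entry equals h, and helper names (`bt_times_y`, `ratio`, `inf_sup_constant`) that are not taken from the text. It uses the fact that, in Euclidean norms, the supremum over x of $x^T B^T y / \|x\|$ equals $\|B^T y\|$.

```python
import math

def bt_times_y(thetas, n, y):
    """B^T y for the m x n pseudodiagonal matrix (113): theta_i * y_i, zero padded to length n."""
    m = len(thetas)
    return [thetas[i] * y[i] for i in range(m)] + [0.0] * (n - m)

def euclid(v):
    """Standard Euclidean norm."""
    return math.sqrt(sum(vi * vi for vi in v))

def ratio(thetas, n, y):
    """||B^T y|| / ||y||, i.e. the supremum over x appearing on the left of (120)."""
    return euclid(bt_times_y(thetas, n, y)) / euclid(y)

def inf_sup_constant(thetas):
    """For this diagonal structure the infimum over y is the smallest diagonal entry."""
    return min(thetas)

n = 5
for h in (1.0, 0.1, 0.01):
    thetas = [h, 0.5, 1.0]   # hypothetical choice: theta_1 shrinks with the meshsize
    y = [1.0, 0.0, 0.0]      # the critical vector used in Remark 1
    print(h, ratio(thetas, n, y), inf_sup_constant(thetas))
```

For every h the matrix $B^T$ stays injective, yet the best constant equals h and therefore cannot be bounded from below uniformly.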


We now rewrite condition (112) in different equivalent 
forms, which will also make clear the reason why it is 
called inf—sup condition. 

Since, by assumption, x is different from zero, condition (112) can equivalently be written as

$$\forall y \in Y \quad \exists x \in X \setminus \{0\} \quad \text{such that} \quad \frac{x^T B^T y}{\|x\|_X} \ge \beta \|y\|_Y \tag{118}$$

This last form (118) highlights that, given y ∈ Y, the most suitable x ∈ X is the one that makes the left-hand side of (118) as big as possible. Hence, the best we can do is to take the supremum of the left-hand side, when x varies among all possible x ∈ X different from 0. Hence, we may equivalently require that

$$\forall y \in Y \quad \sup_{x \in X \setminus \{0\}} \frac{x^T B^T y}{\|x\|_X} \ge \beta \|y\|_Y \tag{119}$$


In a sense, we got rid of the task of choosing x. However, condition (119) still depends on y and it clearly holds for y = 0. Therefore, we can concentrate on the y's that are different from 0; in particular, for y ≠ 0 condition (119) can be also written as

$$\sup_{x \in X \setminus \{0\}} \frac{x^T B^T y}{\|x\|_X \|y\|_Y} \ge \beta \tag{120}$$

The worst possible y is therefore the one that makes the left-hand side of (120) as small as possible. If we want (120) to hold for every y ∈ Y we might as well consider the worst case, looking directly at the infimum of the left-hand side of (120) among all possible y's, requiring that

$$\inf_{y \in Y \setminus \{0\}} \sup_{x \in X \setminus \{0\}} \frac{x^T B^T y}{\|x\|_X \|y\|_Y} \ge \beta \tag{121}$$

The advantage of formulation (121), if any, is that we got rid of the dependency on y as well. Indeed, condition (121) is now a condition on the matrix B, on the spaces X and Y (together with their norms) as well as on the crucial constant β.

Let us see now the relationship of the inf—sup condition 
with a basic property of the matrix B. 


Proposition 1. The inf-sup condition (112) is equivalent to requiring that

$$\beta \|y\|_Y \le \|B^T y\|_F \quad \forall y \in Y \tag{122}$$

Therefore, in particular, the inf-sup condition implies that the matrix $B^T$ is injective.


Proof. Assume that the inf-sup condition (112) holds, and let y be any vector in Y. By the equivalent form (119) and using definition (104) of the dual norm $\|\cdot\|_F$, we have

$$\beta \|y\|_Y \le \sup_{x \in X \setminus \{0\}} \frac{x^T B^T y}{\|x\|_X} = \|B^T y\|_F \tag{123}$$

and therefore (122) holds true. Moreover, the matrix $B^T$ is injective since (122) shows that y ≠ 0 implies $B^T y \ne 0$.
Assume conversely that (122) holds. Using again the definition (104) of the dual norm $\|\cdot\|_F$, we have

$$\beta \|y\|_Y \le \|B^T y\|_F = \sup_{x \in X \setminus \{0\}} \frac{x^T B^T y}{\|x\|_X} \tag{124}$$

which implies the inf-sup condition in the form (119). □
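As a side computation (a standard linear-algebra observation added here for illustration, not part of the original argument), form (122) gives a practical way to evaluate the best constant β for a given pair of matrices. Under assumptions (102) to (104), the dual norm has the closed form $\|f\|_F^2 = f^T (M^x)^{-1} f$, so that

```latex
\beta^2 \;=\; \min_{y \ne 0} \frac{\|B^T y\|_F^2}{\|y\|_Y^2}
        \;=\; \min_{y \ne 0} \frac{y^T B \,(M^x)^{-1} B^T y}{y^T M^y y}
```

that is, β² is the smallest eigenvalue λ of the generalized symmetric eigenproblem $B (M^x)^{-1} B^T y = \lambda M^y y$. With Euclidean norms ($M^x$ and $M^y$ equal to identity matrices), β is simply the smallest of the m singular values of B; for the pseudodiagonal matrix (113) this gives β equal to its smallest diagonal entry, in agreement with Remark 1.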


Remark 2. Whenever the m × n matrix B satisfies the inf-sup condition, the injectivity of $B^T$ implies that n ≥ m. We point out once again (cf. Remark 1) that the injectivity of $B^T$ is not sufficient for the fulfillment of the inf-sup condition.


Additional relationships between the inf—sup and other 
properties of the matrix B will be presented later on in 
Section 3.5. 


3.4 A ‘strong’ condition on the A matrix. 
Ellipticity on the whole space — Stokes 


As we shall see in the sequel, the inf—sup condition is a 
necessary condition for having stability of problems of the 
general form (95). In order to have sufficient conditions, 
we now introduce a further assumption on the matrix A. 
As discussed at the end of Section 3.2, we start considering 
a strong condition on the matrix A. More precisely, we 
assume the following: 


Ellipticity condition. There exists a positive constant α, independent of the meshsize h, such that

$$\alpha \|x\|_X^2 \le x^T A x \quad \forall x \in X \tag{125}$$

We first notice that from (101) and (125) it follows that

$$\alpha \le M_a \tag{126}$$

We have now the following theorem.


Theorem 1. Let x, y, f, g satisfy the general system of equations (95). Moreover, assume that A is symmetric and that the continuity conditions (101), the dual norm assumptions (105), the inf-sup condition (112) and the ellipticity requirement (125) are all satisfied. Then, we have

$$\|x\|_X \le \frac{1}{\alpha} \|f\|_F + \frac{M_a^{1/2}}{\alpha^{1/2} \beta} \|g\|_G \tag{127}$$

$$\|y\|_Y \le \frac{2 M_a^{1/2}}{\alpha^{1/2} \beta} \|f\|_F + \frac{M_a}{\beta^2} \|g\|_G \tag{128}$$


Proof. We shall prove the result by splitting $x = x_f + x_g$ and $y = y_f + y_g$, defined as the solutions of

$$\begin{cases} A x_f + B^T y_f = f \\ B x_f = 0 \end{cases} \tag{129}$$

and

$$\begin{cases} A x_g + B^T y_g = 0 \\ B x_g = g \end{cases} \tag{130}$$

We proceed in several steps.

• Step 1 - Estimate of $x_f$ and $A x_f$. We multiply the first equation of (129) to the left by $x_f^T$ and we notice that $x_f^T B^T y_f \equiv y_f^T B x_f = 0$ (by the second equation). Hence,

$$x_f^T A x_f = x_f^T f \tag{131}$$

and, using the ellipticity condition (125), relation (131), and the first of the dual norm estimates (105), we have

$$\alpha \|x_f\|_X^2 \le x_f^T A x_f = x_f^T f \le \|x_f\|_X \|f\|_F \tag{132}$$

giving immediately

$$\|x_f\|_X \le \frac{1}{\alpha} \|f\|_F \tag{133}$$

and

$$x_f^T A x_f \le \frac{1}{\alpha} \|f\|_F^2 \tag{134}$$

Therefore, using (110) we also get

$$\|A x_f\|_F \le \frac{M_a^{1/2}}{\alpha^{1/2}} \|f\|_F \tag{135}$$


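• Step 2 - Estimate of $y_f$. We use now the inf-sup condition (112) with $y = y_f$. We obtain that there exists $\tilde{x} \in X$ such that $\tilde{x}^T B^T y_f \ge \beta \|\tilde{x}\|_X \|y_f\|_Y$. Multiplying the first equation of (129) by $\tilde{x}^T$ and using the first of the dual norm estimates (105), we have

$$\beta \|\tilde{x}\|_X \|y_f\|_Y \le \tilde{x}^T B^T y_f = \tilde{x}^T (f - A x_f) \le \|\tilde{x}\|_X \|f - A x_f\|_F \tag{136}$$

We now use the fact that in the inf-sup condition (112) we had $\tilde{x} \ne 0$, so that in the above equation (136) we can simplify by its norm. Then, using (135) and (126), we obtain

$$\|y_f\|_Y \le \frac{1}{\beta} \|f - A x_f\|_F \le \left( \frac{1}{\beta} + \frac{M_a^{1/2}}{\alpha^{1/2} \beta} \right) \|f\|_F \le \frac{2 M_a^{1/2}}{\alpha^{1/2} \beta} \|f\|_F \tag{137}$$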


• Step 3 - Estimate of $x_g^T A x_g$ by $\|y_g\|_Y$. We multiply the first equation of (130) by $x_g^T$. Using the second equation of (130) and the second of the dual norm estimates (105), we have

$$x_g^T A x_g = -x_g^T B^T y_g \equiv -y_g^T B x_g = -y_g^T g \le \|y_g\|_Y \|g\|_G \tag{138}$$


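• Step 4 - Estimate of $\|y_g\|_Y$ by $(x_g^T A x_g)^{1/2}$. We proceed as in Step 2. Using the inf-sup condition (112) with $y = y_g$, we get a new vector, that we call again $\tilde{x}$, such that $\tilde{x}^T B^T y_g \ge \beta \|\tilde{x}\|_X \|y_g\|_Y$. This relation, the first equation of (130), and the continuity property (109) yield

$$\beta \|\tilde{x}\|_X \|y_g\|_Y \le \tilde{x}^T B^T y_g = -\tilde{x}^T A x_g \le (\tilde{x}^T A \tilde{x})^{1/2} (x_g^T A x_g)^{1/2} \tag{139}$$

giving, since $\tilde{x}^T A \tilde{x} \le M_a \|\tilde{x}\|_X^2$ by (101),

$$\|y_g\|_Y \le \frac{M_a^{1/2}}{\beta} (x_g^T A x_g)^{1/2} \tag{140}$$

• Step 5 - Estimate of $\|x_g\|_X$ and $\|y_g\|_Y$. We first combine (138) and (140) to obtain

$$\|y_g\|_Y \le \frac{M_a}{\beta^2} \|g\|_G \tag{141}$$

Moreover, using the ellipticity assumption (125) in (138) and inserting (141), we have

$$\alpha \|x_g\|_X^2 \le x_g^T A x_g \le \|y_g\|_Y \|g\|_G \le \frac{M_a}{\beta^2} \|g\|_G^2 \tag{142}$$

which can be rewritten as

$$\|x_g\|_X \le \frac{M_a^{1/2}}{\alpha^{1/2} \beta} \|g\|_G \tag{143}$$

The final estimate follows then by simply collecting the separate estimates (133), (137), (143), and (141). □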


A straightforward consequence of Theorem 1 and Re- 
mark 4 is the following stability result (cf. (111)): 


Corollary 1. Assume that a numerical method produces a 
sequence of matrices A and B for which both the inf—sup 
condition (112) and the ellipticity condition (125) are satis- 
fied. Then the method is stable. 


Remark 3. In certain applications, it might happen that the constants α and β either depend on h (and tend to zero as h tends to zero) or have a fixed value that is however very small. It is therefore important to keep track of the possible degeneracy of the constants in our estimates when α and/or β are very small. In particular, it is relevant to know whether our stability constants degenerate, say, as 1/β, or 1/β², or other powers of 1/β (and, similarly, of 1/α). In this respect, we point out that the behavior indicated in (127) and (128) is optimal. This means that we cannot hope to find a better proof giving a better behavior of the constants in terms of powers of 1/α and 1/β. Indeed, consider the system

$$\begin{bmatrix} 2 & \sqrt{a} & b \\ \sqrt{a} & a & 0 \\ b & 0 & 0 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ y \end{Bmatrix} = \begin{Bmatrix} f_1 \\ f_2 \\ g \end{Bmatrix}, \qquad 0 < a, b \ll 1 \tag{144}$$

whose solution is

$$x_1 = \frac{g}{b}, \qquad x_2 = \frac{f_2}{a} - \frac{g}{a^{1/2} b}, \qquad y = \frac{f_1}{b} - \frac{f_2}{a^{1/2} b} - \frac{g}{b^2} \tag{145}$$

Since the constants α and β are given by

$$\alpha = \frac{2 + a - \sqrt{a^2 + 4}}{2} = \frac{4a}{2 \left( 2 + a + \sqrt{a^2 + 4} \right)}$$

and

$$\beta = b$$

we see from (145) that there are cases in which the actual stability constants behave exactly as predicted by the theory.
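The optimality claim of Remark 3 can be observed numerically. The following minimal sketch assumes that system (144) has the 3 × 3 matrix with rows (2, √a, b), (√a, a, 0), (b, 0, 0), as we read it from the text; the function names `solve_144`, `residual_144` and `alpha_144` are hypothetical.

```python
import math

def solve_144(a, b, f1, f2, g):
    """Solution of system (144) obtained by back substitution, cf. (145)."""
    x1 = g / b
    x2 = f2 / a - g / (math.sqrt(a) * b)
    y = f1 / b - f2 / (math.sqrt(a) * b) - g / b ** 2
    return x1, x2, y

def residual_144(a, b, f1, f2, g, sol):
    """Row-by-row residual of system (144) for a candidate solution."""
    x1, x2, y = sol
    s = math.sqrt(a)
    return (2 * x1 + s * x2 + b * y - f1,
            s * x1 + a * x2 - f2,
            b * x1 - g)

def alpha_144(a):
    """Smallest eigenvalue of the block A = [[2, sqrt(a)], [sqrt(a), a]]."""
    return 4 * a / (2 * (2 + a + math.sqrt(a * a + 4)))

a = 0.25
for b in (0.1, 0.01, 0.001):
    _, x2, y = solve_144(a, b, 0.0, 0.0, 1.0)   # data f = 0, g = 1
    # |x2| grows like 1/(sqrt(a) b) and |y| like 1/b^2, as in (127) and (128)
    print(b, abs(x2), abs(y), alpha_144(a))
```

Dividing b by ten multiplies |y| by one hundred, the 1/β² behavior allowed by (128).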


Remark 4. We point out that the symmetry condition on the matrix A is not necessary. Indeed, with a slightly different (and even simpler) proof one can prove stability without the symmetry assumption. The dependence of the stability constant upon α and β is however worse, as can be seen in the following example. Considering the system

$$\begin{bmatrix} 1 & -1 & b \\ 1 & a & 0 \\ b & 0 & 0 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ y \end{Bmatrix} = \begin{Bmatrix} f_1 \\ f_2 \\ g \end{Bmatrix}, \qquad 0 < a, b \ll 1 \tag{146}$$

one easily obtains

$$x_1 = \frac{g}{b}, \qquad x_2 = \frac{f_2}{a} - \frac{g}{ab}, \qquad y = \frac{f_1}{b} + \frac{f_2}{ab} - \frac{(1 + a) g}{a b^2} \tag{147}$$

Since α = a and β = b, from (147) we deduce that the bounds of Theorem 1 cannot hold when A is not symmetric.


As announced in the title of the section, the situation in which A is elliptic in the whole space is typical (among others) of the Stokes problem, as presented in (22) to (24). Indeed, denoting the interpolating functions for u and p by $N^u$ and $N^p$ respectively (cf. (39)), if we set


(hall =u f [vea aa (148) 
and 
Bip = L IN? 8, |" ag (149) 


we can easily see that conditions (101) are verified with $M_a = 1$ and $M_b = \sqrt{d/\mu}$ respectively. Clearly the ellipticity property (125) is also verified with α = 1, no matter what the choice of the mesh and of the interpolating functions is. On the other hand, the inf-sup property (112) is much less obvious, as we are going to see in Section 4, and finite element choices have to be specially tailored in order to satisfy it.


3.5 The inf—sup condition and the lifting 
operator 


In this section, we shall see that the inf—sup condition 
is related to another important property of the matrix B. 
Before proceeding, we recall that an m × n matrix B is surjective if for every $g \in \mathbb{R}^m$, there exists $x_g \in \mathbb{R}^n$ such that $B x_g = g$.

We have the following Proposition.


Proposition 2. The inf-sup condition (112) is equivalent to requiring the existence of a lifting operator $L : g \mapsto x_g = Lg$ such that, for every $g \in \mathbb{R}^m$, it holds

$$\begin{cases} B x_g = g \\ \beta \|x_g\|_X = \beta \|Lg\|_X \le \|g\|_G = \|B x_g\|_G \end{cases} \tag{150}$$

Therefore, in particular, the inf-sup condition implies that the matrix B is surjective.


Proof. We begin by recalling that there exists a symmetric (n × n) positive definite matrix $M^x$ such that (cf. (102))

$$x^T M^x x = \|x\|_X^2 \tag{151}$$

It is clear that the choice $A \equiv M^x$ easily satisfies the first of the continuity conditions (101) with $M_a = 1$, as well as the ellipticity condition (125) with α = 1. Given g, if the inf-sup condition holds, we can, therefore, use Theorem 1 and find a unique solution $(\tilde{x}_g, \tilde{y}_g)$ of the following auxiliary problem:

$$\begin{cases} M^x \tilde{x}_g + B^T \tilde{y}_g = 0 \\ B \tilde{x}_g = g \end{cases} \tag{152}$$

We can now use estimate (143) from Step 5 of the proof of Theorem 1, recalling that in our case, $\alpha = M_a = 1$ since we are using the matrix $M^x$ instead of A. We obtain

$$\beta \|\tilde{x}_g\|_X \le \|g\|_G \tag{153}$$

It is then clear that setting $Lg := \tilde{x}_g$ (the first part of the solution of the auxiliary problem (152)) we have that estimate (150) in our statement holds true.

Assume conversely that we have the existence of a continuous lifting L satisfying (150). First we recall that there exists a symmetric (m × m) positive definite matrix $M^y$ such that (cf. (103))

$$y^T M^y y = \|y\|_Y^2 \tag{154}$$

Then, for a given y ∈ Y, we set first $g := M^y y$ (so that $y^T g = \|y\|_Y^2$) and then we define $x_g := Lg$ (so that $B x_g = g$). Hence,

$$x_g^T B^T y \equiv y^T B x_g = y^T g = y^T M^y y = \|y\|_Y^2 \tag{155}$$

On the other hand, it is easy to see that using (150) we have

$$\beta \|x_g\|_X \le \|g\|_G \le \|y\|_Y \tag{156}$$

where the last inequality is based on the choice $g = M^y y$ and the use of (108) with $M^y$ in the place of $B^T$. Hence, for every y ∈ Y, different from zero, we constructed $x = x_g \in X$, different from zero, which, joining (155) and (156), satisfies

$$x_g^T B^T y = \|y\|_Y^2 \ge \beta \|x_g\|_X \|y\|_Y \tag{157}$$

that is, the inf-sup condition in its original form (112). □
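The lifting built in the first part of the proof can be written explicitly. This is a short derivation from the auxiliary problem (152), added for illustration; the closed form below is not stated in the original text. Eliminating the second unknown from (152) gives

```latex
\tilde{x}_g = -(M^x)^{-1} B^T \tilde{y}_g ,
\qquad
B \tilde{x}_g = g \;\Longrightarrow\; \tilde{y}_g = -\big( B (M^x)^{-1} B^T \big)^{-1} g ,
\qquad
L g = \tilde{x}_g = (M^x)^{-1} B^T \big( B (M^x)^{-1} B^T \big)^{-1} g
```

The matrix $B (M^x)^{-1} B^T$ is invertible because the inf-sup condition makes $B^T$ injective (Proposition 1). With Euclidean norms ($M^x$ equal to the identity), L reduces to $B^T (B B^T)^{-1}$, the Moore-Penrose pseudoinverse of B, which selects the solution of $Bx = g$ of minimal norm.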


3.6 A ‘weak’ condition on the A matrix. 
Ellipticity on the kernel — thermal diffusion 


We now consider the case in which, together with the inf-sup condition on B, the condition on A is weaker than the full ellipticity (125). In particular, we require the ellipticity of A to hold only in a subspace $X_0$ of the whole space X, with $X_0$ defined as follows:

$$X_0 := \mathrm{Ker}(B) \equiv \{ x \in X \ \text{such that} \ Bx = 0 \} \tag{158}$$

More precisely, we require the following:


Elker condition. There exists a positive constant $\alpha_0$, independent of the meshsize h, such that

$$\alpha_0 \|x\|_X^2 \le x^T A x \quad \forall x \in X_0 \tag{159}$$

The above condition is often called elker since it requires the ellipticity on the kernel. Moreover, from (101) and (159), we get

$$\alpha_0 \le M_a \tag{160}$$


The following theorem generalizes Theorem 1. For the sake 
of completeness, we present here the proof in the case of a 
matrix A that is not necessarily symmetric. 


Theorem 2. Let x ∈ X and y ∈ Y satisfy system (95) and assume that the continuity conditions (101), the dual norm assumptions (105), the inf-sup condition (112), and the elker condition (159) are satisfied. Then, we have

$$\|x\|_X \le \frac{1}{\alpha_0} \|f\|_F + \frac{2 M_a}{\alpha_0 \beta} \|g\|_G \tag{161}$$

$$\|y\|_Y \le \frac{2 M_a}{\alpha_0 \beta} \|f\|_F + \frac{2 M_a^2}{\alpha_0 \beta^2} \|g\|_G \tag{162}$$


Proof. We first set $x_g := Lg$ where L is the lifting operator defined by Proposition 2. We also point out the following estimates on $x_g$: from the continuity of the lifting L (150), we have

$$\beta \|x_g\|_X \le \|g\|_G \tag{163}$$

and using (106) and (163), we obtain

$$\|A x_g\|_F \le M_a \|x_g\|_X \le \frac{M_a}{\beta} \|g\|_G \tag{164}$$

Then, we set

$$x_0 := x - x_g = x - Lg \tag{165}$$

and we notice that $x_0 \in X_0$. Moreover, $(x_0, y)$ solves the linear system

$$\begin{cases} A x_0 + B^T y = f - A x_g \\ B x_0 = 0 \end{cases} \tag{166}$$

We can now proceed as in Steps 1 and 2 of the proof of Theorem 1 (as long as we do not use (110), since we gave up the symmetry assumption). We note that our weaker assumption elker (159) is sufficient for allowing the first step in (132). Proceeding as in the first part of Step 1, and using (164) at the end, we get

$$\|x_0\|_X \le \frac{1}{\alpha_0} \|f - A x_g\|_F \le \frac{1}{\alpha_0} \left( \|f\|_F + \frac{M_a}{\beta} \|g\|_G \right) \tag{167}$$

This allows us to reconstruct the estimate on x:

$$\|x\|_X = \|x_0 + x_g\|_X \le \frac{1}{\alpha_0} \|f\|_F + \left( \frac{M_a}{\alpha_0 \beta} + \frac{1}{\beta} \right) \|g\|_G \le \frac{1}{\alpha_0} \|f\|_F + \frac{2 M_a}{\alpha_0 \beta} \|g\|_G \tag{168}$$


where we have used (160) in the last inequality. Combining (106) and (168), we also have

$$\|Ax\|_F \le M_a \|x\|_X \le \frac{M_a}{\alpha_0} \|f\|_F + \frac{2 M_a^2}{\alpha_0 \beta} \|g\|_G \tag{169}$$

which is weaker than (135) since we could not use the symmetry assumption. Then, we proceed as in Step 2 to obtain, as in (137)

$$\beta \|y\|_Y \le \|f - Ax\|_F \tag{170}$$

and using the above estimate (169) on Ax in (170), we obtain

$$\|y\|_Y \le \left( \frac{1}{\beta} + \frac{M_a}{\alpha_0 \beta} \right) \|f\|_F + \frac{2 M_a^2}{\alpha_0 \beta^2} \|g\|_G \le \frac{2 M_a}{\alpha_0 \beta} \|f\|_F + \frac{2 M_a^2}{\alpha_0 \beta^2} \|g\|_G \tag{171}$$

and the proof is concluded.


A straightforward consequence of Theorem 2 is the fol- 
lowing stability result (cf. (111)): 


Corollary 2. Assume that a numerical method produces a sequence of matrices A and B for which both the inf-sup condition (112) and the elker condition (159) are satisfied. Then the method is stable.


Remark 5. In the spirit of Remark 3, we notice that the dependence of the stability constants on $\alpha_0$ and β is optimal, as shown by the previous example (146), for which $\alpha_0 = a$ and β = b. It is interesting to notice that just adding the assumption that A is symmetric will not improve the bounds. Indeed, considering the system

$$\begin{bmatrix} 1 & 1 & b \\ 1 & a & 0 \\ b & 0 & 0 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ y \end{Bmatrix} = \begin{Bmatrix} f_1 \\ f_2 \\ g \end{Bmatrix}, \qquad 0 < a, b \ll 1 \tag{172}$$

one easily obtains

$$x_1 = \frac{g}{b}, \qquad x_2 = \frac{f_2}{a} - \frac{g}{ab}, \qquad y = \frac{f_1}{b} - \frac{f_2}{ab} + \frac{(1 - a) g}{a b^2} \tag{173}$$

Since $\alpha_0 = a$ and β = b, system (172) shows the same behavior as the bounds of Theorem 2 (and not better), even though A is symmetric. In order to get back the better bounds found in Theorem 1, we have to assume that A, on top of satisfying the ellipticity in the kernel (159), is symmetric and positive semidefinite in the whole $\mathbb{R}^n$ (a property that the matrix A in (172) does not have for 0 < a < 1). This is because, in order to improve the bounds, one has to use (110) that, indeed, requires A to be symmetric and positive semidefinite.


As announced in the title of the section, the situation in which A is elliptic only in the kernel of B is typical (among others) of the mixed formulation of thermal problems, as presented in (22) to (24). As in (19), we denote the interpolating functions for θ and q by $N^\theta$ and $N^q$, respectively, and we set


nar = f [(sta,) D> (niq))] ag 
2 
+ al |div (N}G,)/? d2 (174) 
ie FSD 
i645 = f IN, Pag (175) 
Q 


where $\ell$ represents some characteristic length of the domain Ω (for instance its diameter) and $k_*$ represents some characteristic value of the thermal conductivity (for instance, its average).

We can easily see that the continuity conditions (101) are verified with $M_a = 1$ and $M_b = \ell^{-1} \sqrt{k_*}$ respectively. On the other hand, the full ellipticity property (125) is verified only with a constant α that behaves, in most cases, like $\alpha \simeq h^2$, where h is a measure of the mesh size. Indeed, the norm of $\hat{q}$ contains the derivatives of the interpolating functions, while the term $\hat{q}^T A \hat{q}$ does not, as can be seen in (21). On the other hand, we are obliged to add the divergence term in the definition (174) of the norm of q: otherwise, we cannot have a uniform bound for $M_b$ when the meshsize goes to zero, precisely for the same reason as before. Indeed, the term $\hat{\theta}^T B \hat{q}$ contains the derivatives of the interpolating functions $N^q$ (see (21)), and the first part of $\|\hat{q}\|_X$ does not. One can object that the constant $M_b$ does not show up in the stability estimates. It does, however, come into play in the error estimates, as we are going to see in Section 5.

It follows from this analysis that, keeping the norms as in (174) and (175), the elker property (159) holds, in practical cases, only if the kernel of B is made of divergence-free vectors. In that case, we would actually have $\alpha_0 = 1$, no matter what the choice of the mesh and of the interpolating functions is.

On the other hand, the inf—sup property (112) is still 
difficult and it depends heavily on the choices of the 
interpolating functions. As we are going to see in the next 
section, the need to satisfy both the elker and the inf-sup 
condition poses serious limitations on the choice of the 
approximations. Apart from some special one-dimensional 
cases, there is no hope that these two properties can hold 
at the same time unless the finite element spaces have 
been designed for that. However, this work has been 
already done and there are several families of finite element 
spaces that can be profitably used for these problems. We 
also note that the elker condition (or, more precisely, the requirement that the kernel of B is made only of divergence-free vectors) poses some difficulties in the choice of the element, but in most applications it constitutes a very desirable conservation property for the discrete solutions.


3.7 Perturbation of the problem — nearly 
incompressible elasticity 


We now consider a possible variant of our general form (95). Namely, we assume that we have, together with the matrices A and B, a third matrix C, that we assume to be an (m × m) matrix, and we consider the general form

$$\begin{bmatrix} A & B^T \\ B & -C \end{bmatrix} \begin{Bmatrix} x \\ y \end{Bmatrix} = \begin{Bmatrix} f \\ g \end{Bmatrix} \tag{176}$$

For simplicity, we assume that the matrix C is given by $C = \varepsilon M^y$, where the matrix $M^y$ is attached to the norm $\|\cdot\|_Y$ as in (103). Clearly, the results will apply, almost unchanged, to a symmetric positive definite matrix having maximum and minimum eigenvalue of order ε. We have the following result.


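Theorem 3. Let x ∈ X and y ∈ Y satisfy the system

$$\begin{cases} A x + B^T y = f \\ B x - \varepsilon M^y y = g \end{cases} \tag{177}$$

Assume that A is symmetric and positive semidefinite, and that the continuity condition (101), the dual norm assumptions (105), the inf-sup condition (112) and the elker condition (159) are satisfied. Then, we have

$$\|x\|_X \le \frac{\beta^2 + 4 \varepsilon M_a}{\alpha_0 \beta^2} \|f\|_F + \frac{2 M_a^{1/2}}{\alpha_0^{1/2} \beta} \|g\|_G \tag{178}$$

and

$$\|y\|_Y \le \frac{2 M_a^{1/2}}{\alpha_0^{1/2} \beta} \|f\|_F + \frac{4 M_a}{\varepsilon M_a + \beta^2} \|g\|_G \tag{179}$$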


Proof. The proof can be performed with arguments similar to the ones used in the previous stability proofs, but using more technicalities. For simplicity, we are going to give only a sketch, treating separately the two cases f = 0 and g = 0.

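• The case f = 0. We set $\tilde{x} := L(g + \varepsilon M^y y)$ and $x_0 := x - \tilde{x}$. Proceeding exactly as in the proof of Theorem 1 (Step 4), we obtain inequality (140):

$$\|y\|_Y \le \frac{M_a^{1/2}}{\beta} (x^T A x)^{1/2} \tag{180}$$

Then, we multiply the first equation of (176) times $x^T$ and substitute the value of y obtained from the second equation. We have

$$x^T A x + \frac{1}{\varepsilon} \left[ (M^y)^{-1} (Bx - g) \right]^T B x = 0 \tag{181}$$

Using the fact that $x^T A x \ge 0$, we easily deduce that

$$\|Bx\|_G \le \|g\|_G \tag{182}$$

This implies

$$\|\tilde{x}\|_X \le \frac{1}{\beta} \|Bx\|_G \le \frac{1}{\beta} \|g\|_G \tag{183}$$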

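We now multiply the first equation times $x_0^T$, and we have $x_0^T A x = 0$. We can then use (109) to get

$$x_0^T A x_0 = -x_0^T A \tilde{x} \le (x_0^T A x_0)^{1/2} (\tilde{x}^T A \tilde{x})^{1/2} \tag{184}$$

Simplifying by $(x_0^T A x_0)^{1/2}$ and using (183), we obtain

$$x_0^T A x_0 \le \tilde{x}^T A \tilde{x} \le M_a \|\tilde{x}\|_X^2 \le \frac{M_a}{\beta^2} \|g\|_G^2 \tag{185}$$

Using $x = x_0 + \tilde{x}$ and then again (109) and (183), we obtain

$$x^T A x \le \frac{4 M_a}{\beta^2} \|g\|_G^2 \tag{186}$$

that inserted in (180) gives an estimate for y

$$\|y\|_Y \le \frac{2 M_a}{\beta^2} \|g\|_G \tag{187}$$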

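On the other hand, using the elker condition (159), estimates (183), (185), and (160) we have

$$\|x\|_X \le \|x_0\|_X + \|\tilde{x}\|_X \le \left( \frac{M_a^{1/2}}{\alpha_0^{1/2} \beta} + \frac{1}{\beta} \right) \|g\|_G \le \frac{2 M_a^{1/2}}{\alpha_0^{1/2} \beta} \|g\|_G \tag{188}$$

However, we note that using the second equation we might have another possible estimate for y:

$$\|y\|_Y \le \frac{1}{\varepsilon} \|Bx - g\|_G \le \frac{2}{\varepsilon} \|g\|_G \tag{189}$$

We can combine (187) and (189) into

$$\|y\|_Y \le \min \left\{ \frac{2 M_a}{\beta^2}, \frac{2}{\varepsilon} \right\} \|g\|_G \le \frac{4 M_a}{\varepsilon M_a + \beta^2} \|g\|_G \tag{190}$$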

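• The case g = 0. We set this time $\tilde{x} := L(\varepsilon M^y y)$ and again $x_0 := x - \tilde{x}$. From (150), we have as usual

$$\|\tilde{x}\|_X \le \frac{1}{\beta} \|B \tilde{x}\|_G = \frac{1}{\beta} \|Bx\|_G \tag{191}$$

Multiplying the first equation by $x_0^T$, we have $x_0^T A x = x_0^T f$ that gives, using (159) and (109)

$$x_0^T A x_0 \le \frac{1}{\alpha_0^{1/2}} \|f\|_F (x_0^T A x_0)^{1/2} + (x_0^T A x_0)^{1/2} (\tilde{x}^T A \tilde{x})^{1/2} \tag{192}$$

and finally,

$$(x_0^T A x_0)^{1/2} \le \frac{1}{\alpha_0^{1/2}} \|f\|_F + (\tilde{x}^T A \tilde{x})^{1/2} \tag{193}$$

In particular, using once more (109), (193), and (191), we obtain

$$|\tilde{x}^T A x_0| \le \frac{1}{\alpha_0^{1/2}} \|f\|_F (\tilde{x}^T A \tilde{x})^{1/2} + \tilde{x}^T A \tilde{x} \le \frac{M_a^{1/2}}{\alpha_0^{1/2} \beta} \|f\|_F \|Bx\|_G + \tilde{x}^T A \tilde{x} \tag{194}$$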

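Taking now the product of the first equation times $\tilde{x}^T$ and using $y = \varepsilon^{-1} (M^y)^{-1} B x$ from the second equation, we have $\tilde{x}^T B^T y = \varepsilon^{-1} \tilde{x}^T B^T (M^y)^{-1} B x = \varepsilon^{-1} \|Bx\|_G^2$. Hence,

$$\tilde{x}^T A x + \frac{1}{\varepsilon} \|Bx\|_G^2 = \tilde{x}^T f \le \frac{1}{\beta} \|f\|_F \|Bx\|_G \tag{195}$$

Using $\tilde{x}^T A x = \tilde{x}^T A \tilde{x} + \tilde{x}^T A x_0$ and the estimate (194) in (195), we deduce

$$\frac{1}{\varepsilon} \|Bx\|_G^2 \le \frac{1}{\beta} \|f\|_F \|Bx\|_G + \frac{M_a^{1/2}}{\alpha_0^{1/2} \beta} \|f\|_F \|Bx\|_G \tag{196}$$

that finally gives

$$\|Bx\|_G \le \varepsilon \left( \frac{1}{\beta} + \frac{M_a^{1/2}}{\alpha_0^{1/2} \beta} \right) \|f\|_F \le \frac{2 \varepsilon M_a^{1/2}}{\alpha_0^{1/2} \beta} \|f\|_F \tag{197}$$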

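which is a crucial step in our proof. Indeed, from (197) and the second equation, we obtain our estimate for y

$$\|y\|_Y \le \frac{1}{\varepsilon} \|Bx\|_G \le \frac{2 M_a^{1/2}}{\alpha_0^{1/2} \beta} \|f\|_F \tag{198}$$

From (191) and (197), we have

$$\|\tilde{x}\|_X \le \frac{1}{\beta} \|Bx\|_G \le \frac{2 \varepsilon M_a^{1/2}}{\alpha_0^{1/2} \beta^2} \|f\|_F \tag{199}$$

Finally, from (159), (193), and (199), we obtain

$$\|x_0\|_X \le \frac{1}{\alpha_0^{1/2}} (x_0^T A x_0)^{1/2} \le \left( \frac{1}{\alpha_0} + \frac{2 \varepsilon M_a}{\alpha_0 \beta^2} \right) \|f\|_F = \frac{\beta^2 + 2 \varepsilon M_a}{\alpha_0 \beta^2} \|f\|_F \tag{200}$$

which together with (199) gives us the estimate for x

$$\|x\|_X \le \left( \frac{2 \varepsilon M_a^{1/2}}{\alpha_0^{1/2} \beta^2} + \frac{2 \varepsilon M_a + \beta^2}{\alpha_0 \beta^2} \right) \|f\|_F \le \frac{4 \varepsilon M_a + \beta^2}{\alpha_0 \beta^2} \|f\|_F \tag{201}$$

Collecting (190), (188), (198), and (201), we have the result. □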


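Remark 6. We notice that the dependence of the stability
constants upon $\alpha_0$ and $\beta$ in Theorem 3 is optimal, as shown
by the system


\[
\begin{bmatrix}
2a & \sqrt{a} & -\sqrt{a} & 0 & 0 \\
\sqrt{a} & 2 & 1 & b & 0 \\
-\sqrt{a} & 1 & 2 & 0 & b \\
0 & b & 0 & -\varepsilon & 0 \\
0 & 0 & b & 0 & -\varepsilon
\end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \\ y_1 \\ y_2 \end{Bmatrix}
=
\begin{Bmatrix} 2f \\ 0 \\ 0 \\ 0 \\ 2g \end{Bmatrix},
\qquad 0 < a, b, \varepsilon \ll 1 \tag{202}
\]


Indeed, we have $\alpha_0 = 2a$, $\beta = b$, and the solution is given
by


\[
\begin{aligned}
x_1 &= \frac{f(b^2 + \varepsilon)}{a b^2} + \frac{g}{\sqrt{a}\, b}, \\
x_2 &= -\frac{f\varepsilon}{\sqrt{a}\, b^2} - \frac{3 g \varepsilon}{b(3\varepsilon + b^2)}, \\
x_3 &= \frac{f\varepsilon}{\sqrt{a}\, b^2} + \frac{g(3\varepsilon + 2b^2)}{b(3\varepsilon + b^2)}, \\
y_1 &= -\frac{f}{\sqrt{a}\, b} - \frac{3g}{3\varepsilon + b^2}, \\
y_2 &= \frac{f}{\sqrt{a}\, b} - \frac{3g}{3\varepsilon + b^2}.
\end{aligned}
\]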
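The closed-form solution of the 5 x 5 example in Remark 6 can be checked numerically. The following is a minimal sketch, assuming the matrix and right-hand side of (202) are read as written in the code (rows ordered as $x_1, x_2, x_3, y_1, y_2$); the helper names `solve`, `system_202`, and `closed_form` are hypothetical and chosen only for this illustration.

```python
from math import sqrt

def solve(M, rhs):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            m = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= m * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def system_202(a, b, eps, f, g):
    """Matrix and right-hand side of the example, as assumed in the lead-in."""
    s = sqrt(a)
    M = [[2 * a, s, -s, 0.0, 0.0],
         [s, 2.0, 1.0, b, 0.0],
         [-s, 1.0, 2.0, 0.0, b],
         [0.0, b, 0.0, -eps, 0.0],
         [0.0, 0.0, b, 0.0, -eps]]
    return M, [2 * f, 0.0, 0.0, 0.0, 2 * g]

def closed_form(a, b, eps, f, g):
    """Closed-form solution (x1, x2, x3, y1, y2) stated after (202)."""
    s, d = sqrt(a), 3 * eps + b * b
    return [f * (b * b + eps) / (a * b * b) + g / (s * b),
            -f * eps / (s * b * b) - 3 * g * eps / (b * d),
            f * eps / (s * b * b) + g * (3 * eps + 2 * b * b) / (b * d),
            -f / (s * b) - 3 * g / d,
            f / (s * b) - 3 * g / d]

if __name__ == "__main__":
    params = (0.04, 0.1, 0.01, 1.0, 1.0)   # a, b, eps, f, g
    print(solve(*system_202(*params)))
    print(closed_form(*params))
```

Decreasing `a` or `b` in the sketch makes the growth of the solution with $1/\alpha_0$ and $1/\beta$ directly visible.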


Remark 7. It is also worth noticing that assuming full
ellipticity of the matrix A as in (125) (instead of ellipticity
only in the kernel, as we did here) would improve the
estimates for x. In particular, we could obtain estimates that
do not degenerate when $\beta$ goes to zero, as long as $\varepsilon$ remains
strictly positive. For the case f = 0, this is immediate from
the estimate (190) of y: from the first equation, we easily
have


\[ \|x\|_X \le \frac{1}{\alpha}\|B^T y\|_F \le \frac{4 M_a M_b}{\alpha(M_a\varepsilon + \beta^2)}\|g\|_G \tag{203} \]
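In the case g = 0, we can combine the two equations to get

\[ x^T A x + \varepsilon\|y\|_Y^2 = x^T f \tag{204} \]

that gives (always using (125))

\[ \|x\|_X \le \frac{1}{\alpha}\|f\|_F \tag{205} \]

that then gives

\[ \|y\|_Y \le \frac{1}{\varepsilon}\|Bx\|_G \le \frac{M_b}{\alpha\varepsilon}\|f\|_F \tag{206} \]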


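This could be combined with (198) into


\[ \|y\|_Y \le \min\left\{\frac{M_b}{\alpha\varepsilon}, \frac{2M_a^{1/2}}{\alpha^{1/2}\beta}\right\}\|f\|_F
   \le \frac{4M_a^{1/2}M_b}{2M_a^{1/2}\alpha\varepsilon + \alpha^{1/2}\beta M_b}\|f\|_F \tag{207} \]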


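Collecting the two cases we have


\[ \|x\|_X \le \frac{1}{\alpha}\|f\|_F + \frac{4M_aM_b}{\alpha(M_a\varepsilon + \beta^2)}\|g\|_G \tag{208} \]

and

\[ \|y\|_Y \le \frac{4M_a^{1/2}M_b}{2\alpha M_a^{1/2}\varepsilon + \alpha^{1/2}\beta M_b}\|f\|_F + \frac{4M_a}{M_a\varepsilon + \beta^2}\|g\|_G \tag{209} \]

which do not degenerate for $\beta$ going to zero.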


As announced in the title of the section, systems of
the type (176) occur, for instance, in the so-called (u, π)
formulation of nearly incompressible elasticity. Sometimes
they are also obtained by penalizing systems of the original
type (95) in order to obtain a partial cure in cases in which
β is zero or tends to zero with the meshsize (as could
happen, for instance, for a discretization of the Stokes problem
that does not satisfy the inf-sup condition), in the spirit
of Remark 7. Indeed, the (u, π) formulation of nearly
incompressible elasticity, in the case of an isotropic and
homogeneous body, could be seen, mathematically, as a
perturbation of the Stokes system with ε = 1/λ, and the
elements to be used are essentially the same.


3.8 Composite matrices 


In a certain number of applications, one has to deal with 
formulations of mixed type where more than two fields are 
involved. These give rise to matrices that are naturally split 
as 3 x 3 or 4 x 4 (or more) block matrices. For the sake of 
completeness, we show how the previous theory can often 
apply almost immediately to these more general cases. As 
an example, we consider matrices of the type 


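\[
\begin{bmatrix} A & B^T & 0 \\ B & 0 & C^T \\ 0 & C & 0 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ y_1 \end{Bmatrix}
=
\begin{Bmatrix} f_1 \\ f_2 \\ g \end{Bmatrix} \tag{210}
\]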


Matrices of the form (210) are found (among several 
other applications) in the discretization of formulations 
of Hu-Washizu type. However, in particular, for elas- 
ticity problems, there are no good examples of finite 
element discretizations of the Hu—Washizu principle that 
satisfy the following two requirements at the same time: 
not reducing more or less immediately (in the linear
case) to known discretizations of the minimum poten- 
tial energy or of the Hellinger—Reissner principle, and 
having been proved to be stable and optimally conver- 
gent in a sound mathematical way. Actually, the only 
way, so far, has been using stabilized formulations (see 
for instance Behr, Franca and Tezduyar, 1993) that we 
decided to avoid here. Still, we hope that the follow- 
ing brief discussion could also be useful for the possi- 
ble development of good Hu—Washizu elements in the 
future. 

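Coming back to the analysis of (210), we already observed
that systems of this type can be reduced to the
general form (95)


\[
\begin{bmatrix} \tilde{A} & \tilde{B}^T \\ \tilde{B} & 0 \end{bmatrix}
\begin{Bmatrix} \tilde{x} \\ \tilde{y} \end{Bmatrix}
=
\begin{Bmatrix} \tilde{f} \\ \tilde{g} \end{Bmatrix} \tag{211}
\]


after making the following simple identifications:


\[
\tilde{A} = \begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix}, \quad
\tilde{B} = \begin{bmatrix} 0 & C \end{bmatrix}, \quad
\tilde{x} = \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix}, \quad
\tilde{y} = y_1 \tag{212}
\]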


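The stability of system (211) can then be studied using
the previous analysis. Sometimes it is, however, more
convenient to reach the compact form (211) with a different
identification:


\[
\tilde{A} = \begin{bmatrix} A & 0 \\ 0 & 0 \end{bmatrix}, \quad
\tilde{B} = \begin{bmatrix} B & C^T \end{bmatrix}, \quad
\tilde{x} = \begin{Bmatrix} x_1 \\ y_1 \end{Bmatrix}, \quad
\tilde{y} = x_2 \tag{213}
\]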


Indeed, in this case, the matrix $\tilde{A}$ is much simpler. In
particular, as happens quite often in practice, when
the original matrix A in (210) is symmetric and positive
semidefinite, the same properties will be shared by $\tilde{A}$. We
are not going to repeat the theory of the above sections
for the extended systems (210). We will just point out the
meaning of conditions elker and inf-sup, applied to the
system (212) to (213), in terms of the original matrices A,
B, and C.

The kernel of $\tilde{B}$, as given in (213), is made of the pairs
$(x_1, y_1)$ such that


\[ Bx_1 + C^T y_1 = 0 \tag{214} \]


These include, in particular, all the pairs $(0, y_1)$, where $y_1$
is in the kernel of $C^T$:


\[ \mathrm{Ker}(C^T) := \{y_1 \ \text{such that}\ C^T y_1 = 0\} \tag{215} \]


There is no hope that the matrix $\tilde{A}$, as defined in (213),
can be elliptic on those pairs. Hence, we must require that
those pairs are actually reduced to the pair (0, 0), that is,
we must require that


\[ \mathrm{Ker}(C^T) = \{0\} \tag{216} \]


This does not settle the matter of elker, since there are many
other pairs $(x_1, y_1)$ satisfying (214). As $\tilde{A}$ acts only on the
$x_1$ variables, we must characterize the vectors $x_1$ such that
$(x_1, y_1)$ satisfies (214) for some $y_1$. These are


\[ K := \{x_1 \ \text{such that}\ z^T B x_1 = 0 \quad \forall z \in \mathrm{Ker}(C)\} \tag{217} \]


Hence we have the following result: condition elker will
hold for the system (212) to (213) if and only if


\[ \exists\, \tilde{\alpha} > 0 \ \text{such that}\ \tilde{\alpha}\|x_1\|^2 \le x_1^T A x_1 \quad \forall x_1 \in K \tag{218} \]


On the other hand, it is not difficult to see that condition
inf-sup for (212) to (213) reads


\[ \exists\, \tilde{\beta} > 0 \ \text{such that}\ \sup_{(x_1, y_1)} \frac{x_1^T B^T x_2 + y_1^T C x_2}{(\|x_1\|^2 + \|y_1\|^2)^{1/2}} \ge \tilde{\beta}\|x_2\| \quad \forall x_2 \tag{219} \]

It is clear that a sufficient condition would be to have
the inf-sup condition hold for at least one of the two
matrices B, $C^T$. In many applications, however, this is
too strong a requirement. A weaker condition (although
stronger than (219)) can be written as


\[ \exists\, \tilde{\beta} > 0 \ \text{such that}\ \tilde{\beta}\|x_2\| \le \|Cx_2\| + \|B^T x_2\| \quad \forall x_2 \tag{220} \]

More generally, many variations are possible, according to
the actual structure of the matrices at play.


4 APPLICATIONS 


In this section, we give several examples of efficient mixed 
finite element methods, focusing our attention mostly on the 
thermal problem (Section 4.1) and on the Stokes equation 
(Section 4.2). For simplicity, we mainly consider triangular 
elements, while we briefly discuss their possible extensions 
to quadrilateral geometries and to three-dimensional cases. 
Regarding Stokes equation, we point out (as already men- 
tioned) that the same discretization spaces can be profitably 
used to treat the nearly incompressible elasticity problem, 
within the context of the (u, π) formulation (80). We also
address a brief discussion on elements for the elasticity 
problem in the framework of the Hellinger—Reissner prin- 
ciple (Section 4.3). 

We finally remark that, for all the schemes that we 
are going to present, a rigorous stability and convergence 
analysis has been established, even though we will not 
detail the proofs. 


4.1 Thermal diffusion 


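We consider the thermal diffusion problem described in
Section 2.1 in the framework of the Hellinger-Reissner
variational principle. We recall that the discretization of
such a problem leads to solving the following algebraic
system:

\[
\begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix}
\begin{Bmatrix} \hat{q} \\ \hat{\theta} \end{Bmatrix}
=
\begin{Bmatrix} 0 \\ g \end{Bmatrix} \tag{221}
\]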
where

\[
\begin{aligned}
A|_{ij} &= \int_\Omega N_i^q \cdot D^{-1} N_j^q \, d\Omega, & \hat{q}|_j &= \hat{q}_j \\
B|_{rj} &= -\int_\Omega N_r^\theta \, \mathrm{div}(N_j^q) \, d\Omega, & \hat{\theta}|_r &= \hat{\theta}_r \\
g|_r &= -\int_\Omega N_r^\theta \, b \, d\Omega
\end{aligned} \tag{222}
\]

Above, $N_i^q$ and $N_r^\theta$ are the interpolation functions for the
flux q and the temperature θ respectively. Moreover, $\hat{q}$ and
$\hat{\theta}$ are the vectors of flux and temperature unknowns, while
i, j = 1, ..., n and r = 1, ..., m, where n and m obviously
depend on the chosen approximation spaces as well as on
the mesh.

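Following the notation of the previous section, the norms
for which the inf-sup and the elker conditions should be
checked are (cf. (174) and (175))

\[ \|\hat{q}\|_X^2 := \int_\Omega (N_i^q \hat{q}_i) \cdot D^{-1} (N_j^q \hat{q}_j) \, d\Omega
   + \frac{\ell^2}{k_*} \int_\Omega |\mathrm{div}(N_i^q \hat{q}_i)|^2 \, d\Omega \tag{223} \]


and


\[ \|\hat{\theta}\|_Y^2 := \int_\Omega |N_r^\theta \hat{\theta}_r|^2 \, d\Omega \tag{224} \]


where $\ell$ is some characteristic length of the domain
Ω and $k_*$ is some characteristic value of the thermal
conductivity.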

Before proceeding, we remark the following: 


• Since no derivative operator acts on the interpolating
functions $N^\theta$ in the matrix B, we are allowed
to approximate the temperature θ without requiring
any continuity across the elements. On the contrary,
the presence of the divergence operator acting on
the interpolating functions $N^q$ in the matrix B suggests
that the normal component of the approximated
flux should not exhibit jumps between adjacent
elements.

• The full ellipticity for A (i.e. property (125)) typically
holds only with a constant $\alpha \simeq h^2$, once the
norm (223) has been chosen. However, if a method
is designed in such a way that


\[ \hat{q}^0 = (\hat{q}^0_j) \in \mathrm{Ker}(B) \quad \text{implies} \quad \mathrm{div}(N_j^q \hat{q}^0_j) = 0 \tag{225} \]

the weaker elker condition (159) obviously holds with
$\alpha_0 = 1$.
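Condition (225) is verified if, for instance, we insist
that


\[ \mathrm{Span}\{\mathrm{div}\, N_i^q;\ i = 1, \ldots, n\} \subseteq \mathrm{Span}\{N_r^\theta;\ r = 1, \ldots, m\} \tag{226} \]

that is, the divergences of all the approximated fluxes
are contained in the space of the approximated temperatures.
Indeed, condition (226) implies that, for
every $\hat{q}^0 \in \mathrm{Ker}(B)$, there exists $\hat{\theta}^0 = (\hat{\theta}^0_r)_{r=1}^m$ such that
$\mathrm{div}(N_j^q \hat{q}^0_j) = -N_r^\theta \hat{\theta}^0_r$. It follows that

\[ 0 = (\hat{\theta}^0)^T B \hat{q}^0 = -\int_\Omega (N_r^\theta \hat{\theta}^0_r)\, \mathrm{div}(N_j^q \hat{q}^0_j) \, d\Omega
     = \int_\Omega |\mathrm{div}(N_j^q \hat{q}^0_j)|^2 \, d\Omega \tag{227} \]


so that $\mathrm{div}(N_j^q \hat{q}^0_j) = 0$.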

Condition (226) can be always achieved by ‘enriching’ 
the temperature approximation, if necessary. However, 
we remark that a careless enlargement of the approxi- 
mated temperatures can compromise the fulfillment of 
the inf-sup condition (112), as shown in the following 
easy result. 


Proposition 3. Suppose that a given method satis- 
fies condition (226). Then the inf—sup condition (112) 
implies 


\[ \mathrm{Span}\{\mathrm{div}\, N_i^q;\ i = 1, \ldots, n\} = \mathrm{Span}\{N_r^\theta;\ r = 1, \ldots, m\} \tag{228} \]


that is, the divergences of all the approximated fluxes 
coincide with the space of the approximated tempera- 
tures. 


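Proof. By contradiction, suppose that $\mathrm{Span}\{\mathrm{div}\, N_i^q;\ i = 1, \ldots, n\}$
is strictly contained in $\mathrm{Span}\{N_r^\theta;\ r = 1, \ldots, m\}$.
It follows that there exists
$\hat{\theta}^\perp = (\hat{\theta}^\perp_r)_{r=1}^m \in \mathbb{R}^m \setminus \{0\}$ such that


\[ \hat{q}^T B^T \hat{\theta}^\perp = -\int_\Omega (N_r^\theta \hat{\theta}^\perp_r)\, \mathrm{div}(N_j^q \hat{q}_j) \, d\Omega = 0 \quad \forall \hat{q} \in \mathbb{R}^n \tag{229} \]

Therefore,


\[ \sup_{\hat{q} \in \mathbb{R}^n \setminus \{0\}} \frac{\hat{q}^T B^T \hat{\theta}^\perp}{\|\hat{q}\|_X} = 0 \tag{230} \]


and the inf-sup condition does not hold (cf. (119)). □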


We also remark that the converse of Proposition 3 
does not hold, that is, condition (228) is not sufficient 
for the fulfillment of inf—sup (although it does imply 
elker). 


From the considerations above, it should be clear that


• degrees of freedom associated with the normal component
of the approximated flux are needed to guarantee
its continuity across adjacent elements;

• the satisfaction of both the elker and the inf-sup
condition requires a careful and well-balanced choice
of the interpolating fields.


In the following, we are going to present several elements
designed according to the guidelines above, all satisfying
property (228).


4.1.1 Triangular elements 


Throughout this section, we will always suppose that the
domain $\Omega \subset \mathbb{R}^2$, on which the thermal problem is posed,
is decomposed by means of a triangular mesh $\mathcal{T}_h$ with
meshsize h. Moreover, we define $\mathcal{E}_h$ as the set of all the
edges of the triangles in $\mathcal{T}_h$.


• The $RT_0 - P_0$ element. We now introduce the simplest
triangular element proposed for thermal problems. For the
discretization of the thermal flux q, we take the so-called
lowest-order Raviart-Thomas element ($RT_0$ element), presented
in Raviart and Thomas (1977); accordingly, the
approximated flux $q^h$ is described as a piecewise linear
(vectorial) field such that


i. the normal component $q^h \cdot n$ is constant on each edge
e of $\mathcal{E}_h$;

ii. the normal component $q^h \cdot n$ is continuous across each
edge e of $\mathcal{E}_h$.


To approximate the temperature, we simply use piecewise
constant functions in each element ($P_0$ element).

On the generic triangle $T \in \mathcal{T}_h$, a set of element degrees
of freedom for $q^h$ is given by its 3 normal fluxes on the
edges of the triangle, that is,


\[ \int_e q^h \cdot n \, ds \qquad \forall e \ \text{edge of}\ T \tag{231} \]


Therefore, the space for the element approximation of q
has dimension 3 and a basis is obtained by considering the
(vectorial) shape functions


\[ N_k^q = N_k^q(x, y) = \frac{1}{2\,\mathrm{Area}(T)} \begin{Bmatrix} x - x_k \\ y - y_k \end{Bmatrix}, \qquad k = 1, 2, 3 \tag{232} \]

Above, $\{x_k, y_k\}^T$ denotes the position vector of the kth
vertex (local numbering) of the triangle T.
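We also remark that, because of (232), $q^h$ can be locally
described by


\[ q^h = \mathbf{p}_0 + p_0 \begin{Bmatrix} x \\ y \end{Bmatrix} = \begin{Bmatrix} a_0 + p_0 x \\ b_0 + p_0 y \end{Bmatrix} \tag{233} \]


where $a_0, b_0, p_0 \in \mathbb{R}$.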
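The duality between the shape functions (232) and the edge fluxes (231) can be verified directly. The sketch below is illustrative only: it assumes the normalization $1/(2\,\mathrm{Area}(T))$ in (232), numbers the edge opposite to vertex k as edge k, and uses hypothetical helper names (`rt0_basis`, `edge_flux`, `flux_matrix`). Under these assumptions the flux of $N_k^q$ through edge j should equal 1 if j = k and 0 otherwise.

```python
def rt0_basis(verts):
    """Return Area(T) and the three functions N_k(x, y) = (x - x_k, y - y_k) / (2 Area(T))."""
    (x1, y1), (x2, y2), (x3, y3) = verts
    area = 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

    def make(k):
        xk, yk = verts[k]
        return lambda x, y: ((x - xk) / (2 * area), (y - yk) / (2 * area))

    return area, [make(k) for k in range(3)]

def edge_flux(N, p, q, inside):
    """Integral of N.n over the straight edge p -> q, with n pointing away from `inside`.

    For fields of the form (233), N.n is constant along a straight edge,
    so evaluating at the edge midpoint is exact.
    """
    tx, ty = q[0] - p[0], q[1] - p[1]
    nx, ny = ty, -tx                       # normal scaled by the edge length
    mx, my = 0.5 * (p[0] + q[0]), 0.5 * (p[1] + q[1])
    if nx * (mx - inside[0]) + ny * (my - inside[1]) < 0:
        nx, ny = -nx, -ny                  # flip so that the normal is outward
    vx, vy = N(mx, my)
    return vx * nx + vy * ny

def flux_matrix(verts):
    """Entry [j][k] is the flux of N_k through the edge opposite to vertex j."""
    _, basis = rt0_basis(verts)
    centroid = (sum(v[0] for v in verts) / 3.0, sum(v[1] for v in verts) / 3.0)
    rows = []
    for j in range(3):
        p, q = verts[(j + 1) % 3], verts[(j + 2) % 3]
        rows.append([edge_flux(basis[k], p, q, centroid) for k in range(3)])
    return rows

if __name__ == "__main__":
    for row in flux_matrix([(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]):
        print(row)
```

The same sketch also makes it easy to see that each $N_k^q$ has constant divergence $1/\mathrm{Area}(T)$, in agreement with a piecewise constant temperature space.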
As far as the approximated temperature is concerned, an
element basis for $\theta^h$ is given by the shape function


\[ N^\theta = N^\theta(x, y) = 1 \tag{234} \]

Figure 1. Degrees of freedom for $RT_0 - P_0$ element.


The element degrees of freedom for both $q^h$ and $\theta^h$ are
schematically depicted in Figure 1.


• The $RT_k - P_k$ family. We now present the extension
to higher orders of the $RT_0 - P_0$ method just described
(cf. Raviart and Thomas, 1977). Given an integer $k \ge 1$
and using the definition introduced in Nedelec (1980), for
the flux $q^h$, we take a field such that ($RT_k$ element) on each
triangle T of $\mathcal{T}_h$ we have


\[ q^h = \mathbf{p}_k(x, y) + p_k(x, y) \begin{Bmatrix} x \\ y \end{Bmatrix} \tag{235} \]


where $\mathbf{p}_k(x, y)$ (resp. $p_k(x, y)$) is a vectorial (resp. scalar)
polynomial of degree at most k. Moreover, we require that
the normal component $q^h \cdot n$ is continuous across each edge
e of $\mathcal{E}_h$. This can be achieved by selecting the following
element degrees of freedom:


i. the moments of order up to k of $q^h \cdot n$ on the edges of
T;
ii. the moments of order up to k − 1 of $q^h$ on T.


For the discretized temperature $\theta^h$, we take piecewise
polynomials of degree at most k ($P_k$ element).

The element degrees of freedom for the choice k = 1 are
shown in Figure 2.
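As a quick consistency check of the degrees of freedom listed above, one can count them and compare with the dimension of the local $RT_k$ space. The sketch below assumes the standard count $\dim RT_k(T) = (k+1)(k+3)$ on a triangle; the function names are hypothetical.

```python
def dim_P(k, n_components=1):
    """Dimension of polynomials of degree <= k in two variables (per component)."""
    if k < 0:
        return 0
    return n_components * (k + 1) * (k + 2) // 2

def rt_dofs(k):
    """Number of edge and interior degrees of freedom of RT_k on a triangle."""
    edge = 3 * (k + 1)             # moments of order <= k of q.n on each of 3 edges
    interior = dim_P(k - 1, 2)     # moments of order <= k-1 of the vector field on T
    return edge, interior

def dim_RT(k):
    """Dimension of the local RT_k space (assumed standard formula)."""
    return (k + 1) * (k + 3)

if __name__ == "__main__":
    for k in range(4):
        edge, interior = rt_dofs(k)
        print(k, edge, interior, dim_RT(k), dim_P(k))
```

For k = 0 the count gives the 3 edge fluxes of (231); for k = 1 it gives 6 edge moments plus 2 interior moments, paired with 3 temperature unknowns per element.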


Figure 2. Degrees of freedom for $RT_1 - P_1$ element.


• The $BDM_1 - P_0$ element. Another method, widely used
to treat the thermal diffusion problem, arises from the
approximation of the flux q by means of the so-called
lowest-order Brezzi-Douglas-Marini element ($BDM_1$ element),
proposed and analyzed in Brezzi et al. (1985). It
consists in discretizing q by vectorial functions $q^h$ such
that


i. $q^h$ is linear in each triangle T of $\mathcal{T}_h$;
ii. the normal component $q^h \cdot n$ is continuous across each
edge e of $\mathcal{E}_h$.


For the approximated temperature $\theta^h$, we again use
piecewise constant functions on each triangle.

Focusing on the generic triangle $T \in \mathcal{T}_h$, we remark that
the approximation space for q has dimension 6, since full
linear polynomials are employed. A suitable set of element
degrees of freedom is provided by the moments up to order
1 of the normal fluxes $q^h \cdot n$ across each edge e of T,
explicitly given by the values


\[ \int_e q^h \cdot n \, ds, \qquad \int_e s \, q^h \cdot n \, ds \tag{236} \]


where s is a local coordinate on e ranging from −1 to 1.
The element degrees of freedom for the resulting method
are shown in Figure 3.


Figure 3. Degrees of freedom for $BDM_1 - P_0$ element.


• The $BDM_{k+1} - P_k$ family. As for the $RT_0 - P_0$ scheme,
the $BDM_1 - P_0$ finite element method is also the lowest-order
representative of a whole class. Indeed, given an integer
$k \ge 1$, we can select the approximations presented in Brezzi
et al. (1985).

For the discretized flux $q^h$, the normal component $q^h \cdot n$
is continuous across each edge e of $\mathcal{E}_h$. Moreover, $q^h$ is
a vectorial polynomial of degree at most k + 1 on each
triangle T of $\mathcal{T}_h$ ($BDM_{k+1}$ element). Also in this case, the
continuity of the normal component can be obtained by a
proper choice of the degrees of freedom.

For the approximated temperature $\theta^h$, we use the discontinuous
$P_k$ element. Figure 4 shows the element degrees of
freedom for the case k = 1.


Figure 4. Degrees of freedom for $BDM_2 - P_1$ element.


4.1.2 Quadrilateral elements 


We now briefly consider the extension of some of the
methods presented in the previous section to quadrilateral
meshes. In this case, we define our approximating spaces
on the reference element $\hat{K} = [-1, 1]^2$ equipped with local
coordinates $(\xi, \eta)$. As far as the flux is concerned, the
corresponding approximation space on each physical element
K must be obtained through the use of a suitable transformation
that preserves the normal component of vectorial
functions. This is accomplished by the following (contravariant)
Piola's transformation of vector fields. Suppose
that

\[ F : \hat{K} \to K; \qquad (x, y) = F(\xi, \eta) \]

is an invertible map from $\hat{K}$ onto K, with Jacobian matrix
$\mathbf{J}(\xi, \eta)$. Given a vector field $\hat{q} = \hat{q}(\xi, \eta)$ on $\hat{K}$, its Piola's
transform $P(\hat{q}) = P(\hat{q})(x, y)$ is the vector field on K,
defined by


\[ P(\hat{q})(x, y) := \frac{1}{J(\xi, \eta)}\, \mathbf{J}(\xi, \eta)\, \hat{q}(\xi, \eta); \qquad (x, y) = F(\xi, \eta) \]


where $J(\xi, \eta) = |\det \mathbf{J}(\xi, \eta)|$. Therefore, if

\[ Q(\hat{K}) = \mathrm{Span}\{\hat{q}_i;\ i = 1, \ldots, n_{el}\} \]

is an $n_{el}$-dimensional flux approximation space defined on
the reference element $\hat{K}$, the corresponding space on the
physical element K will be


\[ Q(K) = \mathrm{Span}\{P(\hat{q}_i);\ i = 1, \ldots, n_{el}\} \]
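The reason for using Piola's transformation can be illustrated numerically: the flux of $P(\hat{q})$ through the image of a reference edge equals the flux of $\hat{q}$ through the reference edge itself. The following minimal sketch restricts itself to an affine map with constant Jacobian matrix of positive determinant (an assumption made only to keep the example short) and to a reference field whose normal component is constant on each edge, so that midpoint evaluation is exact. All names are hypothetical.

```python
def piola(Jm, qhat):
    """Contravariant Piola transform of the vector qhat for a constant 2x2 Jacobian Jm."""
    (a, b), (c, d) = Jm
    det = abs(a * d - b * c)
    return ((a * qhat[0] + b * qhat[1]) / det, (c * qhat[0] + d * qhat[1]) / det)

def rot(v):
    """Rotate an edge vector by -90 degrees: outward normal (times edge length)
    for a counter-clockwise boundary."""
    return (v[1], -v[0])

def reference_edges():
    """Edges of [-1, 1]^2, listed counter-clockwise."""
    c = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
    return [(c[i], c[(i + 1) % 4]) for i in range(4)]

def edge_fluxes(Jm, qhat):
    """For each reference edge return (reference flux, flux through the mapped edge)."""
    out = []
    for p, q in reference_edges():
        t = (q[0] - p[0], q[1] - p[1])                    # reference edge vector
        mid = (0.5 * (p[0] + q[0]), 0.5 * (p[1] + q[1]))
        qm = qhat(*mid)
        n_ref = rot(t)
        t_phys = (Jm[0][0] * t[0] + Jm[0][1] * t[1],      # mapped edge vector J t
                  Jm[1][0] * t[0] + Jm[1][1] * t[1])
        n_phys = rot(t_phys)
        v = piola(Jm, qm)
        out.append((qm[0] * n_ref[0] + qm[1] * n_ref[1],
                    v[0] * n_phys[0] + v[1] * n_phys[1]))
    return out

if __name__ == "__main__":
    Jm = ((2.0, 1.0), (0.5, 3.0))                         # det = 5.5 > 0
    field = lambda xi, eta: (1.0 + 2.0 * xi, -0.5 + 0.25 * eta)
    for ref, phys in edge_fluxes(Jm, field):
        print(ref, phys)
```

For a genuinely bilinear map the Jacobian varies from point to point and the edge integrals would need a quadrature rule, but the flux-preservation property is the same.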


• The $RT_{[0]} - P_0$ element. In the reference element $\hat{K}$, we
prescribe the approximated flux $\hat{q}^h$ as ($RT_{[0]}$ element)


\[ \hat{q}^h = \begin{Bmatrix} a + b\xi \\ c + d\eta \end{Bmatrix}, \qquad a, b, c, d \in \mathbb{R} \tag{237} \]

Because of (237), it is easily seen that the four values

\[ \int_e \hat{q}^h \cdot n \, ds \qquad \forall e \ \text{edge of}\ \hat{K} \tag{238} \]


can be chosen as a set of degrees of freedom. Moreover,
$\mathrm{div}\, \hat{q}^h$ is constant in $\hat{K}$, suggesting the choice of a
constant approximated temperature $\hat{\theta}^h$ in $\hat{K}$ ($P_0$ element).
The degrees of freedom for both $\hat{q}^h$ and $\hat{\theta}^h$ are shown in
Figure 5.

Figure 5. Degrees of freedom for $RT_{[0]} - P_0$ element.


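• The $BDM_{[1]} - P_0$ element. For the discrete flux $\hat{q}^h$ on $\hat{K}$,
we take a field such that ($BDM_{[1]}$ element)


\[
\hat{q}^h = \mathbf{p}_1(\xi, \eta) + a \begin{Bmatrix} \xi^2 \\ -2\xi\eta \end{Bmatrix} + b \begin{Bmatrix} 2\xi\eta \\ -\eta^2 \end{Bmatrix}
          = \mathbf{p}_1(\xi, \eta) + a\,(\nabla(\xi^2\eta))^\perp + b\,(\nabla(\xi\eta^2))^\perp \tag{239}
\]


Above, $\mathbf{p}_1(\xi, \eta)$ is a vectorial linear polynomial, and a, b
are real numbers. This space is carefully designed in order
to have


i. $\hat{q}^h \cdot n$ linear on each edge e of $\partial\hat{K}$;
ii. $\mathrm{div}\, \hat{q}^h$ constant in $\hat{K}$.


Again, for the approximated temperature $\hat{\theta}^h$, we take
constant functions ($P_0$ element). The element degrees of
freedom for both $\hat{q}^h$ and $\hat{\theta}^h$ are shown in Figure 6.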


4.1.3. Three-dimensional elements 


All the elements presented above have their three-dimensional
counterpart. In this case, the normal component of
the approximated flux $q^h$ should not exhibit jumps across
faces of adjacent elements.

Figure 6. Degrees of freedom for $BDM_{[1]} - P_0$ element.

In Figure 7(a), we display the tetrahedral version of the
$RT_0 - P_0$ element (cf. Figure 1), consisting of a piecewise
constant approximation for the temperature θ and
of the following element approximating functions for q
(see Nedelec, 1980):

\[
q^h|_T = \mathbf{p}_0 + p_0 \begin{Bmatrix} x \\ y \\ z \end{Bmatrix}
       = \begin{Bmatrix} a_0 + p_0 x \\ b_0 + p_0 y \\ c_0 + p_0 z \end{Bmatrix},
\qquad a_0, b_0, c_0, p_0 \in \mathbb{R} \tag{240}
\]

Figure 7. 3-D elements for the thermal problem.

Therefore, in each tetrahedron T, the space for the approximated
flux has dimension 4 and the degrees of freedom are
precisely the values $\int_f q^h \cdot n \, d\sigma$ on each tetrahedron face f.

The three-dimensional version of the $BDM_1 - P_0$ element
(cf. Figure 3) is shown in Figure 7(b). The approximated
temperature is still piecewise constant, while the
discretized flux $q^h|_T$ is a fully linear vectorial function.

We also present the extension of the $RT_{[0]} - P_0$ to the
case of cubic geometry, as depicted in Figure 7(c).


Remark 8. We conclude our discussion on the thermal 
problem by noticing the obvious fact that the linear sys- 
tem (221) has an indefinite matrix, independent of the 
chosen approximation spaces. This is a serious source 
of trouble. For the discretizations considered above, we 
can however overcome this drawback. Following Fraeijs 
de Veubeke (1965), one can first work with fluxes that 
are totally discontinuous, forcing back the continuity of 
the normal components by means of suitable interelement 
Lagrange multipliers, whose physical meaning comes out 
to be ‘generalized temperatures’ (for instance, approxima- 
tions of the temperature mean value on each edge). As 
the fluxes are now discontinuous, it is possible to elimi- 
nate them by static condensation at the element level. This 
will give a system involving only the temperatures and the 
interelement multipliers. At this point, however, it becomes 
possible to eliminate the temperatures as well (always at the 
element level), leaving a final system that involves only the 
multipliers. This final system has a symmetric and positive
definite matrix, a very useful property from the computa- 
tional point of view. For a detailed discussion about these 
ideas, we refer to Arnold and Brezzi (1985), Marini (1985), 
and Brezzi et al. (1986, 1987, 1988). For another way
to eliminate the flux variables (although with some geo- 
metrical restrictions) see also Baranger, Maitre and Oudin 
(1996). For yet another procedure to reduce the number of 
unknowns in (221) and getting a symmetric positive definite 
matrix, see Alotto and Perugia (1999). 


4.2 Stokes equation 


As detailed in Section 2.2, the discretization of the Stokes 
problem leads to solving the following algebraic system: 


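\[
\begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix}
\begin{Bmatrix} \hat{u} \\ \hat{p} \end{Bmatrix}
=
\begin{Bmatrix} f \\ 0 \end{Bmatrix} \tag{241}
\]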


n= [N] dQ 


Above, $N_i^u$ and $N_r^p$ are the interpolation functions for the
velocity u and the pressure p respectively. Also, $\hat{u}$ and
$\hat{p}$ are the vectors containing the velocity and the pressure
unknowns. In the sequel, we will always consider the
case of homogeneous boundary conditions for the velocity
field along the whole boundary ∂Ω. As a consequence,
the pressure field is determined only up to a constant.
Uniqueness can, however, be recovered, for instance, by
insisting that the pressure has zero mean value over the
domain Ω or by fixing its value at a given point.

We also remark that, since there is no derivative of
$N^p$ in the definition of the matrix B, both continuous and
discontinuous pressure approximations can be chosen. On
the contrary, the symmetric gradients of $N^u$ entering in the
matrix A suggest that the approximated velocities should
be continuous across adjacent elements.

If we introduce the norms 


a = p [ [venta |? da (243) 


and 


\[ \|\hat{p}\|_Y^2 := \int_\Omega |N_r^p \hat{p}_r|^2 \, d\Omega \tag{244} \]


the continuity conditions (101) and the ellipticity condition
(125) of the previous section are clearly satisfied,
namely, with $M_a = 1$, $M_b = \sqrt{d/\mu}$, and $\alpha = 1$. Therefore,
a stable method is achieved provided only that the inf-sup
condition (112) is fulfilled.


4.2.1 Triangular elements with continuous pressure 
interpolation 


In this section, we describe some stable triangular elements
for which the pressure field is interpolated by means of
continuous functions.

• The MINI element. Given a triangular mesh $\mathcal{T}_h$ of Ω, for
the approximated velocity $u^h$ we require that (cf. Arnold,
Brezzi and Fortin, 1984)


i. for each $T \in \mathcal{T}_h$, the two components of $u^h$ are the
sum of a linear function plus a standard cubic bubble
function;

ii. the two components of $u^h$ are globally continuous
functions on Ω.


Concerning the discretized pressure $p^h$, we simply take
piecewise linear and globally continuous functions.

For the generic element $T \in \mathcal{T}_h$, the elemental degrees
of freedom for $u^h$ are its (vectorial) values at the triangle
vertexes and barycenter. A basis for the element approximation
space of each component of $u^h$ can be obtained by
considering the following four shape functions:


\[
\begin{cases}
N_k = N_k(x, y) = \lambda_k, & k = 1, 2, 3 \\
N_b = N_b(x, y) = 27\lambda_1\lambda_2\lambda_3
\end{cases} \tag{245}
\]

where $\{\lambda_k = \lambda_k(x, y),\ k = 1, 2, 3\}$ denote the usual area
coordinates on T.

Figure 8. Degrees of freedom for MINI element.
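The four velocity shape functions of the MINI element are easy to evaluate once the area coordinates are available. The sketch below is illustrative: it assumes the bubble is normalized as $27\lambda_1\lambda_2\lambda_3$ (so that it equals 1 at the barycenter), and the function names are hypothetical.

```python
def area_coordinates(verts, x, y):
    """Area (barycentric) coordinates (l1, l2, l3) of the point (x, y) in the triangle."""
    (x1, y1), (x2, y2), (x3, y3) = verts
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)   # twice the signed area
    l2 = ((x - x1) * (y3 - y1) - (x3 - x1) * (y - y1)) / det
    l3 = ((x2 - x1) * (y - y1) - (x - x1) * (y2 - y1)) / det
    return (1.0 - l2 - l3, l2, l3)

def mini_shape_functions(verts, x, y):
    """Values of the three linear functions and of the cubic bubble at (x, y)."""
    l1, l2, l3 = area_coordinates(verts, x, y)
    return [l1, l2, l3, 27.0 * l1 * l2 * l3]

if __name__ == "__main__":
    tri = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
    bary = (sum(v[0] for v in tri) / 3.0, sum(v[1] for v in tri) / 3.0)
    print(mini_shape_functions(tri, *bary))      # bubble value is 1 at the barycenter
    print(mini_shape_functions(tri, 1.0, 0.0))   # bubble vanishes on the edges
```

Since the bubble vanishes on the whole boundary of T, it does not interfere with the interelement continuity of the velocity.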

Furthermore, a set of elemental degrees of freedom for
$p^h$ is given by its values at the triangle vertexes, while the
three shape functions to be used are obviously


\[ N_k^p = \lambda_k, \qquad k = 1, 2, 3 \tag{246} \]


The element degrees of freedom for both $u^h$ and $p^h$
are schematically depicted in Figure 8. We finally remark
that the bubble functions for the velocity are internal
modes, so that they can be eliminated on the element level
by means of the so-called static condensation procedure
(cf. Hughes, 1987, for instance). As a consequence, these
additional degrees of freedom do not significantly increase
the computational costs.
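Static condensation of an internal mode amounts to a Schur complement on the element matrix. The following sketch is a generic illustration, not the MINI element matrices themselves: it assumes a small element system K u = f whose last unknown is internal, eliminates it, solves the condensed 2 x 2 system, and recovers the internal value. The names and the numerical values are hypothetical.

```python
def condense(K, f):
    """Eliminate the last unknown of K u = f; return the condensed matrix and load."""
    n = len(f) - 1
    kii = K[n][n]
    Ks = [[K[r][c] - K[r][n] * K[n][c] / kii for c in range(n)] for r in range(n)]
    fs = [f[r] - K[r][n] * f[n] / kii for r in range(n)]
    return Ks, fs

def solve2(K, f):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    return [(f[0] * K[1][1] - K[0][1] * f[1]) / det,
            (K[0][0] * f[1] - K[1][0] * f[0]) / det]

def recover_internal(K, f, ub):
    """Back-substitute the retained unknowns to get the eliminated internal one."""
    n = len(f) - 1
    return (f[n] - sum(K[n][c] * ub[c] for c in range(n))) / K[n][n]

def condensed_solve(K, f):
    """Solve the 3x3 system K u = f through condensation of its last unknown."""
    Ks, fs = condense(K, f)
    ub = solve2(Ks, fs)
    return ub + [recover_internal(K, f, ub)]

if __name__ == "__main__":
    K = [[4.0, 1.0, 1.0], [1.0, 3.0, 1.0], [1.0, 1.0, 2.0]]
    f = [1.0, 2.0, 3.0]
    print(condensed_solve(K, f))
```

In an actual finite element code the same elimination is carried out element by element before assembly, so that only the vertex unknowns enter the global system.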


• The Hood-Taylor elements. These elements arise from
the experimental evidence that using a velocity approximation
of one degree higher than the approximation for
pressure gave reliable results (cf. Hood and Taylor, 1973).
We are therefore led to consider, for each integer k with
$k \ge 1$, the following interpolation fields.

The approximated velocity $u^h$ is such that


i. for each $T \in \mathcal{T}_h$, the two components of $u^h$ are polynomials
of degree at most k + 1;

ii. the two components of $u^h$ are globally continuous
functions on Ω.


For the approximated pressure $p^h$, we ask that

i. for each $T \in \mathcal{T}_h$, $p^h$ is a polynomial of degree at
most k;
ii. $p^h$ is a globally continuous function on Ω.


Figure 9 shows the $u^h$ and $p^h$ element degrees of freedom,
for the lowest-order Hood-Taylor method (i.e. k = 1).

Figure 9. Degrees of freedom for the lowest-order Hood-Taylor
element.


Remark 9. A first theoretical analysis of the lowest-order
Hood-Taylor method (k = 1) was developed in Bercovier
and Pironneau (1977), later improved in Verfürth (1984).
The case k = 2 was treated in Brezzi and Falk (1991),
while an analysis covering every choice of k was presented
in Boffi (1994). We also remark that the discontinuous
pressure version of the Hood-Taylor element typically
results in an unstable method. However, stability can be
recovered by imposing certain restrictions on the mesh for
$k \ge 3$ (see Vogelius, 1983; Scott and Vogelius, 1985), or
by taking advantage of suitable stabilization procedures for
$k \ge 1$ (see Mansfield, 1982; Boffi, 1995).

• The ($P_1$-iso-$P_2$) − $P_1$ element. This is a ‘composite’ element
whose main advantage is the shape function simplicity.
We start by considering a triangular mesh $\mathcal{T}_h$ with
meshsize h. From $\mathcal{T}_h$, we build another finer mesh $\mathcal{T}_{h/2}$ by
splitting each triangle T of $\mathcal{T}_h$ into four triangles using the
edge midpoints of T, as sketched in Figure 10.
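The refinement step used to build the finer mesh can be sketched as follows. This is a minimal illustration for a single triangle given by its vertex coordinates; the function names are hypothetical and no mesh data structure is assumed.

```python
def midpoint(p, q):
    """Midpoint of the segment p-q."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

def split_into_four(tri):
    """Split a triangle (a, b, c) into four congruent triangles via its edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    # Three corner triangles plus the central one; the orientation of `tri` is preserved.
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def signed_area(tri):
    """Signed area of a triangle (positive for counter-clockwise vertices)."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return 0.5 * ((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

if __name__ == "__main__":
    coarse = ((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))
    for sub in split_into_four(coarse):
        print(sub, signed_area(sub))
```

Each sub-triangle has one quarter of the area of T, and the six distinct vertices of the four sub-triangles coincide with the vertexes and edge midpoints of T, which is why the velocity unknowns sit at the same points as for a quadratic element.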

The approximated velocity $u^h$ is now defined using the
finer mesh $\mathcal{T}_{h/2}$ according to the following prescriptions:


i. for each triangle of $\mathcal{T}_{h/2}$, the two components of $u^h$ are
linear functions;

ii. the two components of $u^h$ are globally continuous
functions on Ω.


On the other hand, the interpolated pressure $p^h$ is piecewise
linear in the coarser mesh $\mathcal{T}_h$, and globally continuous
on Ω.

For every triangle T′ of the finer mesh $\mathcal{T}_{h/2}$, the degrees
of freedom of $u^h$ are its values at the vertexes, while
an element basis is given by taking the shape functions
$N_k = \lambda_k$ (k = 1, 2, 3) relative to T′.


Figure 10. Splitting of a triangle $T \in \mathcal{T}_h$.

Figure 11. Degrees of freedom for ($P_1$-iso-$P_2$) − $P_1$ element.


Instead, by considering the generic triangle T of the
coarser mesh $\mathcal{T}_h$, the point values at the three vertexes
provide a set of degrees of freedom for $p^h$. Therefore, the
shape functions $N_k = \lambda_k$ (k = 1, 2, 3), relative to T, can be
chosen as a basis for the element pressure approximation.

The element degrees of freedom for both $u^h$ and $p^h$ are
schematically depicted in Figure 11.


Remark 10. A popular way to solve system (241) consists
in using a penalty method. More precisely, instead
of (241), one considers the perturbed system


\[
\begin{bmatrix} A & B^T \\ B & -\sigma C \end{bmatrix}
\begin{Bmatrix} \hat{u} \\ \hat{p} \end{Bmatrix}
=
\begin{Bmatrix} f \\ 0 \end{Bmatrix} \tag{247}
\]

where the ‘mass’ matrix C is defined by

\[ C|_{rs} = \int_\Omega N_r^p N_s^p \, d\Omega \tag{248} \]


and σ > 0 is a ‘small’ parameter. In the case of discontinuous
pressure approximations, the pressure unknowns can be
eliminated from (247) on the element level, leading therefore
to the following system for $\hat{u}$:


\[ (A + \sigma^{-1} B^T C^{-1} B)\hat{u} = f \tag{249} \]


with C ‘easy-to-invert’ (namely, block diagonal). When
continuous pressure approximations are considered, the
inverse of C is in general a full matrix, so that the elimination
of the pressure unknowns seems impossible on the
element level. We have, however, the right to choose a
different penalizing term in (247): for instance, we could
replace C by a diagonal matrix $\tilde{C}$, obtained from C by a
suitable mass lumping procedure (cf. Hughes, 1987). The
pressure elimination now becomes easy to perform, leading
to (cf. (249))


\[ (A + \sigma^{-1} B^T \tilde{C}^{-1} B)\hat{u} = f \tag{250} \]


A drawback of this approach, although not so serious
for low-order schemes, is the larger bandwidth of
the matrix $(A + \sigma^{-1} B^T \tilde{C}^{-1} B)$. For more details about this
strategy, we refer to Arnold, Brezzi and Fortin (1984).
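The effect of the penalty parameter can be seen on a toy problem. The sketch below is not a Stokes discretization: it assumes a 2 x 2 matrix A, a single constraint row B and a diagonal (lumped) matrix given by its diagonal entries, all chosen arbitrarily for illustration, and solves the condensed system of the form (250). The function name is hypothetical.

```python
def penalized_velocity(A, B, c_diag, f, sigma):
    """Solve (A + sigma^{-1} B^T C^{-1} B) u = f for a 2x2 matrix A and diagonal C."""
    n = len(f)
    K = [[A[i][j] + sum(B[r][i] * B[r][j] / c_diag[r] for r in range(len(B))) / sigma
          for j in range(n)] for i in range(n)]
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    return [(f[0] * K[1][1] - K[0][1] * f[1]) / det,
            (K[0][0] * f[1] - K[1][0] * f[0]) / det]

if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0]]
    B = [[1.0, 1.0]]            # one constraint: u_1 + u_2 = 0
    c_diag = [1.0]
    f = [1.0, 0.0]
    for sigma in (1.0, 1e-2, 1e-4, 1e-6):
        u = penalized_velocity(A, B, c_diag, f, sigma)
        print(sigma, u, u[0] + u[1])   # the constraint residual B u shrinks with sigma
```

For this toy problem the exactly constrained solution is (0.5, −0.5), and the penalized solution approaches it as σ tends to zero, with a constraint violation proportional to σ.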


4.2.2 Triangular elements with discontinuous
pressure interpolation


In this section, we describe some stable triangular elements
for which the pressure field is interpolated by means of 
discontinuous functions. It is worth noticing that all these 
elements have velocity degrees of freedom associated with 
the element edges. This feature is indeed of great help in 
proving the inf—sup condition for elements with discontin- 
uous pressure interpolation (cf. Remark 16). 


• The Crouzeix-Raviart element. Our first example of discontinuous
pressure elements is the one proposed and
analyzed in Crouzeix and Raviart (1973). It consists in
choosing the approximated velocity $u^h$ such that


i. for each $T \in \mathcal{T}_h$, the two components of $u^h$ are the sum
of a quadratic function plus a standard cubic bubble
function;

ii. the two components of $u^h$ are globally continuous
functions on Ω.


Moreover, for the discretized pressure $p^h$, we simply
take the piecewise linear functions, without requiring any
continuity between adjacent elements.

The elemental approximation of each component of $u^h$
can be described by means of the following seven shape
functions


\[
\begin{cases}
N_k = \lambda_k, & k = 1, 2, 3 \\
N_4 = 4\lambda_1\lambda_2, \quad N_5 = 4\lambda_2\lambda_3, \quad N_6 = 4\lambda_3\lambda_1 \\
N_7 = 27\lambda_1\lambda_2\lambda_3
\end{cases} \tag{251}
\]


The degrees of freedom are the values at the triangle
vertexes and edge midpoints, together with the value at
the barycenter.

Concerning the pressure approximation in the generic
triangle T, we take the three shape functions


\[
\begin{cases}
N_1 = 1 \\
N_2 = x \\
N_3 = y
\end{cases} \tag{252}
\]


and the degrees of freedom can be chosen as the values at
three internal and noncollinear points of the triangle.

Figure 12 displays the element degrees of freedom for
both $u^h$ and $p^h$.


Figure 12. Degrees of freedom for Crouzeix-Raviart element.


• The $P_{k+2} - P_k$ family. We now present a class of mixed
methods consisting in choosing, for any integer k with
$k \ge 0$, the following interpolation fields.

For the approximated velocity $u^h$, we require that


i. for each $T \in \mathcal{T}_h$, the two components of $u^h$ are polynomials
of degree at most k + 2;

ii. the two components of $u^h$ are globally continuous
functions on Ω.


Instead, the approximated pressure $p^h$ is a polynomial
of degree at most k for each $T \in \mathcal{T}_h$, with no continuity
imposed across the triangles.

Figure 13 shows the local degrees of freedom, for both
$u^h$ and $p^h$, of the lowest-order method (i.e. k = 0), which
has been proposed and mathematically analyzed in Fortin
(1975).


Remark 11. For the $P_{k+2} - P_k$ family, the discretization
error in energy norm is of order $h^{k+1}$ for both the velocity
and the pressure, even though the $P_{k+2}$-approximation
should suggest an order $h^{k+2}$ for the velocity field. This
‘suboptimality’ is indeed a consequence of the poor pressure
interpolation (polynomials of degree at most k). However,
taking advantage of a suitable augmented Lagrangian
formulation, the $P_{k+2} - P_k$ family can be improved to
obtain a convergence rate of order $h^{k+3/2}$ for the velocity,
without significantly increasing the computational costs. We
refer to Boffi and Lovadina (1997) for details on such an
approach.


• The ($P_1$-iso-$P_2$) − $P_0$ element. Another stable element
can be designed by taking the $P_1$-iso-$P_2$ element for
the approximated velocity, and a piecewise constant
approximation for the pressure. More precisely, as for
the ($P_1$-iso-$P_2$) − $P_1$ element previously described, we
consider a triangular mesh $\mathcal{T}_h$ with meshsize h. We then
build a finer mesh $\mathcal{T}_{h/2}$ according to the procedure sketched
in Figure 10.


Figure 13. Degrees of freedom for $P_2 - P_0$ element.

Figure 14. Degrees of freedom for ($P_1$-iso-$P_2$) − $P_0$ element.


We recall that the approximated velocity $u^h$ is given
using the finer mesh $\mathcal{T}_{h/2}$ and requiring that


i. for each triangle of $\mathcal{T}_{h/2}$, the two components of $u^h$ are
linear functions;

ii. the two components of $u^h$ are globally continuous
functions on Ω.


Instead, the pressure approximation is defined on the
coarser mesh $\mathcal{T}_h$ by selecting the piecewise constant functions.

The local degrees of freedom for both $u^h$ and $p^h$ are
shown in Figure 14.


• The non-conforming $P_1^{NC} - P_0$ element. We present an
element, attributable to Crouzeix and Raviart (1973), for
which the approximated velocity $u^h$ is obtained by requiring
that

i. for each triangle, the two components of $u^h$ are linear
functions;

ii. continuity of $u^h$ across adjacent elements is imposed
only at edge midpoints.

For the approximated pressure $p^h$, we simply take the
piecewise constant functions.

Given a triangle $T \in \mathcal{T}_h$, the degrees of freedom for
the approximating velocity $u^h$ are the values at the three
edge midpoints. Furthermore, for each component of $u^h$,
an element basis on triangle $T$ is provided by

$$N_k = 1 - 2\lambda_k, \qquad k = 1, 2, 3$$

where $\lambda_k$ is the area coordinate that vanishes on the $k$-th edge.
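The nodal property of this basis can be checked directly. The following is a minimal sketch, assuming the numbering convention that edge $k$ is the one opposite vertex $k$ (so $\lambda_k = 0$ on it); the names `MIDPOINTS`, `cr_basis`, and `nodal_matrix` are illustrative only.

```python
from fractions import Fraction

# Area (barycentric) coordinates of the three edge midpoints of a triangle.
# Assumed convention: edge k is opposite vertex k, so lambda_k = 0 on edge k.
HALF = Fraction(1, 2)
MIDPOINTS = [
    (Fraction(0), HALF, HALF),  # midpoint of edge 1
    (HALF, Fraction(0), HALF),  # midpoint of edge 2
    (HALF, HALF, Fraction(0)),  # midpoint of edge 3
]


def cr_basis(k, lam):
    """Value of N_k = 1 - 2*lambda_k at the point with area coordinates lam."""
    return 1 - 2 * lam[k]


def nodal_matrix():
    """Matrix [N_k(M_l)]: basis function k evaluated at edge midpoint l."""
    return [[cr_basis(k, m) for m in MIDPOINTS] for k in range(3)]
```

Each basis function equals one at its own edge midpoint and zero at the other two, which is what makes the midpoint values usable as degrees of freedom.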

The lack of continuity for the discrete velocity implies
that the differential operators (gradient and divergence)
acting on $u^h$ should be taken element-wise. For instance,
the matrix $B$ should be written as

$$B|_{rj} = -\sum_{T \in \mathcal{T}_h} \int_T \left[ N_r^p \,\mathrm{div}\,(N_j^u) \right] d\Omega \tag{253}$$

The degrees of freedom are displayed in Figure 15. We
remark that applicability of the $P_1^{NC} - P_0$ element is limited
to problems with Dirichlet boundary conditions for the
displacement field imposed on the whole $\partial\Omega$. For other
situations (e.g. a pure traction problem), the scheme exhibits
spurious mechanisms, because of its inability to control
the rigid body rotations (cf. Hughes, 1987). Two stable
modifications of the $P_1^{NC} - P_0$ element have been proposed
and analyzed in Falk (1991) and, recently, in Hansbo and
Larson (2003).

Figure 15. Degrees of freedom for $P_1^{NC} - P_0$ element.


4.2.3 Quadrilateral elements 


Many of the triangular elements presented above have their
quadrilateral counterpart. As an example, we here show
the so-called $Q_2 - Q_1^c$ element, which is the quadrilateral
version of the lowest-order Hood–Taylor element (cf.
Figure 9). Accordingly, the velocity is approximated by
biquadratic and continuous functions, while the pressure is
discretized by means of bilinear and continuous functions,
as depicted in Figure 16.

Another very popular scheme is the $Q_2 - P_1$ element,
based on the same approximated velocities as before.
Instead, the interpolating functions for the pressure are
piecewise linear, without requiring any continuity across
the elements. The local degrees of freedom are displayed
in Figure 17.


4.2.4 Three-dimensional elements 


Figure 16. Degrees of freedom for $Q_2 - Q_1^c$ element.

Figure 17. Degrees of freedom for $Q_2 - P_1$ element.

Several elements previously described extend to the
case of three-dimensional problems. In Figure 18(a), we
show a continuous pressure tetrahedral element, which is
nothing but the 3-D version of the MINI element (cf.
Figure 8). Also, the non-conforming $P_1^{NC} - P_0$ element
(cf. Figure 15) has its three-dimensional counterpart, as
depicted in Figure 18(b). We remark that the degrees of
freedom for the velocity are given by the values at the
barycenter of each tetrahedron face. Figure 18(c) shows an
example of cubic element, which is exactly the 3-D version
of the popular $Q_2 - P_1$ element (cf. Figure 17).

Finally, we refer to Stenberg (1987) for the analysis,
based on the so-called macroelement technique introduced
in Stenberg (1984), of the lowest-order 3-D Hood–Taylor
method, and to Boffi (1997) for the higher-order case.


4.2.5 Stabilized formulations 


Figure 18. 3-D Stokes elements.

From the above discussion, it should be clear that the
fulfillment of the inf-sup condition requires a careful choice of
the discretization spaces for the velocity and the pressure.
A first strategy to obtain stability has been to derive the
numerical scheme from a perturbation of functional (37),
by considering (see Brezzi and Pitkäranta, 1984)

$$\tilde{\mathcal{L}}(u, p) = \frac{1}{2}\mu \int_\Omega [\nabla u : \nabla u]\, d\Omega - \int_\Omega [b \cdot u]\, d\Omega - \int_\Omega [p \,\mathrm{div}\, u]\, d\Omega - \frac{\alpha}{2} \sum_{K \in \mathcal{T}_h} \int_K h_K^2 |\nabla p|^2\, d\Omega \tag{254}$$

where $\alpha$ is a positive parameter and $h_K$ is the diameter of
the element $K \in \mathcal{T}_h$. The ‘perturbation term’

$$\frac{\alpha}{2} \sum_{K \in \mathcal{T}_h} \int_K h_K^2 |\nabla p|^2\, d\Omega$$

has a stabilizing effect on the discretized Euler–Lagrange
equations emanating from (254). It however introduces a
consistency error, so that the convergence rate in energy
norm cannot be better than $O(h)$, even though higher-order
elements are used. Following the ideas in Hughes and
Franca (1987) and Hughes et al. (1986), this drawback may
be overcome by means of a suitable augmented Lagrangian
formulation. Instead of considering (37) or (254), one can
introduce the augmented functional


$$\mathcal{L}_{\mathrm{aug}}(u, p) = \frac{1}{2}\mu \int_\Omega [\nabla u : \nabla u]\, d\Omega - \int_\Omega [b \cdot u]\, d\Omega - \int_\Omega [p \,\mathrm{div}\, u]\, d\Omega - \frac{1}{2} \sum_{K \in \mathcal{T}_h} \int_K \alpha(K)\, |\mu \Delta u - \nabla p + b|^2\, d\Omega \tag{255}$$

where, for each element $K \in \mathcal{T}_h$, $\alpha(K)$ is a positive
parameter at our disposal. Because of the structure of
the ‘additional term’ in (255), both the functionals (37)
and (255) have the same critical point, that is, the solution
of the Stokes problem. Therefore, the discretized
Euler–Lagrange equations associated with (255) deliver a
consistent method, whenever conforming approximations
have been selected. As before, the augmented term may
have a stabilizing effect, allowing the choice of a wider
class of elements. For instance, if

$$\alpha(K) = \bar{\alpha}\, h_K^2$$

where $\bar{\alpha}$ is sufficiently ‘small’, any finite element
approximation of velocity and pressure (as far as the pressure is
discretized with continuous finite elements) leads to a stable
scheme, with respect to an appropriate norm (see Franca
and Hughes, 1988).
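As a small illustration of how the element-wise parameter could be evaluated in practice, the sketch below computes the diameter $h_K$ of a triangle (its longest edge) and the scaling $\alpha(K) = \bar{\alpha} h_K^2$. The function names and the value of `alpha_bar` are assumptions made for the example; the text only requires $\bar{\alpha}$ to be sufficiently small.

```python
import math


def element_diameter(vertices):
    """Diameter h_K of a triangle, i.e. the length of its longest edge."""
    a, b, c = vertices
    return max(math.dist(a, b), math.dist(b, c), math.dist(c, a))


def stabilization_parameter(vertices, alpha_bar):
    """alpha(K) = alpha_bar * h_K**2, with alpha_bar a user-chosen constant."""
    return alpha_bar * element_diameter(vertices) ** 2


# Hypothetical element and constant, for illustration only.
triangle = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
alpha_K = stabilization_parameter(triangle, alpha_bar=0.01)
```

Because the parameter scales with $h_K^2$, the added term shrinks under mesh refinement at the same rate on every element, regardless of local mesh size.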

This approach has several interesting variants. Indeed,
considering the Euler–Lagrange equations associated
with (255) we have

$$\mu \int_\Omega [\nabla u : \nabla v]\, d\Omega - \int_\Omega [b \cdot v]\, d\Omega - \int_\Omega [p \,\mathrm{div}\, v]\, d\Omega - \int_\Omega [q \,\mathrm{div}\, u]\, d\Omega - \sum_{K \in \mathcal{T}_h} \int_K \alpha(K)\, [\mu \Delta u - \nabla p + b] \cdot [\mu \Delta v - \nabla q]\, d\Omega = 0 \tag{256}$$

for all test functions $v$ and $q$. The last term
of (256) represents our consistent perturbation. A careful
analysis can show that its stabilizing effect still works if
we change it into

$$+ \sum_{K \in \mathcal{T}_h} \int_K \alpha(K)\, [\mu \Delta u - \nabla p + b] \cdot [\mu \Delta v + \nabla q]\, d\Omega \tag{257}$$

(that is, changing the sign of the whole term, but changing
also the sign of $\nabla q$ in the second factor) or simply into

$$+ \sum_{K \in \mathcal{T}_h} \int_K \alpha(K)\, [\mu \Delta u - \nabla p + b] \cdot \nabla q\, d\Omega \tag{258}$$

For a general analysis of these possible variants, we refer
to Baiocchi and Brezzi (1993). A deeper analysis shows
that, in particular, the formulation (257) can be interpreted
as changing the space of velocities with the addition of
suitable bubble functions and then eliminating them by static
condensation. This was pointed out first in Pierre (1989),
and then in a fully systematic way in Baiocchi, Brezzi and
Franca (1993).

Other possible stabilizations can be obtained by adding 
penalty terms that penalize the jumps in the pressure vari- 
able over suitable macroelements. See, for instance, Sil- 
vester and Kechar (1990). This as well can be seen as 
adding suitable bubbles on the macroelements and elimi- 
nating them by static condensation. 

For a more general survey of these and other types 
of stabilizations, see Brezzi and Fortin (2001) and the 
references therein. 

Another approach to get stable elements is based on the 
so-called Enhanced Strain Technique, introduced in Simo 
and Rifai (1990) in the context of elasticity problems. As 
already mentioned in Section 2.3, the basic idea consists 
in enriching the symmetric gradients $\nabla^S u^h$ with additional
local modes. An analysis of this strategy for displacement- 
based elements has been developed in Reddy and Simo 
(1995) and Braess (1998). Within the framework of the 
(u, x) formulation for incompressible elasticity problems 
(and therefore for the Stokes problem), the enhanced strain 
technique has been successfully used in Pantuso and Bathe 
(1995) (see also Lovadina, 1997 for a stability and convergence
analysis), and more recently in Lovadina and
Auricchio (2003) and Auricchio et al. (submitted). 


4.3 Elasticity 
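We now briefly consider the elasticity problem in the framework
of the Hellinger–Reissner variational principle (59).
We recall that after discretization we are led to solve a
problem of the following type (cf. (57), (58), and (61)):

$$\begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix} \begin{Bmatrix} \hat{\sigma} \\ \hat{u} \end{Bmatrix} = \begin{Bmatrix} 0 \\ g \end{Bmatrix} \tag{259}$$

where

$$A|_{ij} = \int_\Omega [N_i^\sigma : D^{-1} N_j^\sigma]\, d\Omega, \quad \hat{\sigma}|_j = \hat{\sigma}_j; \qquad B|_{rj} = \int_\Omega [N_r^u \cdot \mathrm{div}\,(N_j^\sigma)]\, d\Omega, \quad \hat{u}|_r = \hat{u}_r; \qquad g|_r = -\int_\Omega [N_r^u \cdot b]\, d\Omega \tag{260}$$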


Above, $N_i^\sigma$ and $N_r^u$ are the interpolation functions for the
stress $\sigma$ and the displacement $u$ respectively. Moreover, $\hat{\sigma}$
and $\hat{u}$ are the vectors of stress and displacement unknowns.
We note that since the divergence operator acts on the
shape functions $N_i^\sigma$ (see the $B$ matrix in (260)), the
approximated normal stress $\sigma^h n$ should be continuous across
adjacent elements. On the contrary, no derivative operator
acts on the shape functions $N_r^u$, so that we are allowed
to use discontinuous approximation for the displacement
field. Analogously to the thermal diffusion problem (see
Section 4.1), the proper norms for $\hat{\sigma}$ and $\hat{u}$ are as follows
(cf. (223) and (224)):

$$\|\hat{\sigma}\|_X^2 := \int_\Omega [(N_i^\sigma \hat{\sigma}_i) : D^{-1} (N_j^\sigma \hat{\sigma}_j)]\, d\Omega + \frac{\ell^2}{D_*} \int_\Omega |\mathrm{div}\,(N_i^\sigma \hat{\sigma}_i)|^2\, d\Omega \tag{261}$$

and

$$\|\hat{u}\|_Y^2 := \frac{D_*}{\ell^2} \int_\Omega |N_r^u \hat{u}_r|^2\, d\Omega \tag{262}$$

where $\ell$ is some characteristic length of the domain $\Omega$ and
$D_*$ is some characteristic value of the elastic tensor.
Despite the apparent similarity with the corresponding
system (221) to (222) of the thermal diffusion problem, finding
approximation spaces for (259) to (260), which satisfy both
the inf-sup and the elker conditions, is much more difficult
(see e.g. Brezzi and Fortin, 1991 for a discussion on such
a point). Here below, we present two triangular elements
proposed and analyzed in Johnson and Mercier (1978) and
Arnold and Winther (2002) respectively.

Figure 19. Splitting of a generic triangle for the Johnson–Mercier
element.

• The Johnson–Mercier element. This method takes advantage
of a ‘composite’ approximation for the stress field.
More precisely, we first split every triangle $T \in \mathcal{T}_h$ into
three subtriangles $T_i$ ($i = 1, 2, 3$) using the barycenter of
$T$ (see Figure 19).

For the approximated stress $\sigma^h$, we then require that

i. in each subtriangle $T_i$, the components of $\sigma^h$ are linear
functions;

ii. the normal stress $\sigma^h n$ is continuous across adjacent
triangles and across adjacent subtriangles.

Accordingly, the discrete stress $\sigma^h$ is not a polynomial
on $T$, but only on the subtriangles $T_i$. For the generic
element $T \in \mathcal{T}_h$, it can be shown (see Johnson and Mercier,
1978) that the elemental degrees of freedom are the following.

i. On the three edges of $T$: the moments of order 0 and
1 for the vector field $\sigma^h n$ (12 degrees of freedom);

ii. On $T$: the moments of order 0 for the symmetric tensor
field $\sigma^h$ (3 degrees of freedom).

Moreover, each component of the approximated displacement
$u^h$ is chosen as a piecewise constant function.

Figure 20 displays the element degrees of freedom for
both $\sigma^h$ and $u^h$.

Figure 20. Degrees of freedom for the Johnson–Mercier element.


• The Arnold–Winther element. This triangular element has
been recently proposed and analyzed in Arnold and Winther
(2002), where higher-order schemes are also considered.
For the approximated stress $\sigma^h$, we impose that

i. on each $T \in \mathcal{T}_h$, $\sigma^h$ is a symmetric tensor whose
components are cubic functions, but $\mathrm{div}\, \sigma^h$ is a linear
vector field;

ii. the normal stress $\sigma^h n$ is continuous across adjacent
triangles.

For each element $T \in \mathcal{T}_h$, the approximation space for the
stress field has dimension 24 and the elemental degrees of
freedom can be chosen as follows (see Arnold and Winther,
2002):

i. the values of the symmetric tensor field $\sigma^h$ at the
vertices of $T$ (9 degrees of freedom);

ii. the moments of order 0 and 1 for the vector field $\sigma^h n$
on each edge of $T$ (12 degrees of freedom);

iii. the moment of order 0 for $\sigma^h$ on $T$ (3 degrees of
freedom).
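The dimension 24 quoted above can be cross-checked by a short count. This is only a plausibility check, assuming that the conditions forcing the quadratic part of $\mathrm{div}\, \sigma^h$ to vanish are linearly independent (which is consistent with the stated dimension); $P_k(T)$ denotes polynomials of degree at most $k$ on $T$.

```latex
% Symmetric 2x2 tensors with cubic components: 3 independent entries,
% each in P_3(T), and dim P_3(T) = 10 in two dimensions.
\dim \big[ P_3(T) \big]^{2 \times 2}_{\mathrm{sym}} = 3 \cdot 10 = 30 .
% The divergence of a cubic tensor is a quadratic vector field; requiring it
% to be linear removes the homogeneous quadratic part of both components.
\dim \big[ P_2(T) \big]^{2} - \dim \big[ P_1(T) \big]^{2} = 12 - 6 = 6 .
% Remaining dimension, matching the three groups of degrees of freedom.
30 - 6 = 24 = 9 + 12 + 3 .
```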


Furthermore, the components of the approximated displacement
$u^h$ are piecewise linear functions, without requiring
any continuity across adjacent elements.

In Figure 21, the element degrees of freedom for both
$\sigma^h$ and $u^h$ are schematically depicted.

Figure 21. Degrees of freedom for the Arnold–Winther element.

Remark 12. Other methods exploiting ‘composite’
approximations as for the Johnson–Mercier element have
been proposed and analyzed in Arnold, Douglas and Gupta
(1984).

Following the ideas in Fraeijs de Veubeke (1975), a
different strategy to obtain reliable schemes for the elasticity
problem in the context of the Hellinger–Reissner
variational principle consists in the use of unsymmetric
approximated stresses. Symmetry is then enforced back in
a weak form by the introduction of a suitable Lagrange
multiplier. We refer to Amara and Thomas (1979), Arnold,
Brezzi and Douglas (1984), Brezzi et al. (1986), and Stenberg
(1988) for the details on such an approach.

5 TECHNIQUES FOR PROVING THE
INF-SUP CONDITION


In this section we give some hints on how to prove the 
inf-sup condition (118). We also show how the stability 
results detailed in Section 3 can be exploited to obtain 
error estimates. We focus on the Stokes problem, as a 
representative example, but analogous strategies can be 
applied to analyze most of the methods considered in 
Section 4. 

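We begin recalling (cf. (38)) that a weak form of the
Stokes problem with homogeneous boundary conditions for
the velocity consists in finding $(u, p)$ such that

$$\begin{cases} \mu \displaystyle\int_\Omega [(\nabla \delta u) : \nabla u]\, d\Omega - \int_\Omega [\mathrm{div}\,(\delta u)\, p]\, d\Omega = \int_\Omega [\delta u \cdot b]\, d\Omega \\[2mm] \displaystyle\int_\Omega [\delta p \,\mathrm{div}\, u]\, d\Omega = 0 \end{cases} \tag{263}$$

for any admissible velocity variation $\delta u$ and any admissible
pressure variation $\delta p$. On the other hand, as detailed in
Section 4.2, the discretized problem consists in solving

$$\begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix} \begin{Bmatrix} \hat{u} \\ \hat{p} \end{Bmatrix} = \begin{Bmatrix} f \\ 0 \end{Bmatrix} \tag{264}$$

where

$$A|_{ij} = \mu \int_\Omega [\nabla N_i^u : \nabla N_j^u]\, d\Omega, \quad \hat{u}|_i = \hat{u}_i; \qquad B|_{rj} = -\int_\Omega [N_r^p \,\mathrm{div}\,(N_j^u)]\, d\Omega, \quad \hat{p}|_r = \hat{p}_r; \qquad f|_i = \int_\Omega [N_i^u \cdot b]\, d\Omega \tag{265}$$

with $i, j = 1, \ldots, n$ and $r = 1, \ldots, m$.

With our notation for the Stokes problem, the inf-sup
condition in its equivalent form (119) consists in requiring
the existence of a positive constant $\beta$, independent of $h$,
such that

$$\forall \hat{q} \in Y \qquad \sup_{\hat{z} \in X \setminus \{0\}} \frac{\hat{z}^T B^T \hat{q}}{\|\hat{z}\|_X} \ge \beta \|\hat{q}\|_Y \tag{266}$$

where $X = \mathbb{R}^n$ and $Y = \mathbb{R}^m$.
Moreover, in what follows, we need to introduce the
space $\mathcal{X}$ for vectorial functions $v$, defined by

$$\mathcal{X} = \left\{ v : v|_{\partial\Omega} = 0, \ \|v\|_{\mathcal{X}}^2 := \mu \int_\Omega |\nabla v|^2\, d\Omega < +\infty \right\} \tag{267}$$

and the space $\mathcal{Y}$ for scalar functions $q$, defined by

$$\mathcal{Y} = \left\{ q : \|q\|_{\mathcal{Y}}^2 := \int_\Omega |q|^2\, d\Omega < +\infty \right\} \tag{268}$$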


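Remark 13. It is worth noticing that, whenever an
approximated velocity $u^h = N_i^u \hat{u}_i$ is considered, the following
holds (cf. (243))

$$\|\hat{u}\|_X = \left( \mu \int_\Omega |\nabla (N_i^u \hat{u}_i)|^2\, d\Omega \right)^{1/2} = \|u^h\|_{\mathcal{X}} \tag{269}$$

Therefore, the $X$-norm we have tailored for vector $\hat{u} \in X$
coincides with the $\mathcal{X}$-norm of the reconstructed function $u^h$.
Similarly (cf. (244)), if $p^h = N_r^p \hat{p}_r$, we have

$$\|\hat{p}\|_Y = \left( \int_\Omega |N_r^p \hat{p}_r|^2\, d\Omega \right)^{1/2} = \|p^h\|_{\mathcal{Y}} \tag{270}$$
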
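With the norms (269) and (270), the constant in (266) admits a purely algebraic characterization, sketched below as a worked step. Here $M$ is the pressure mass matrix, a symbol introduced only for this illustration, and the minimum is meant over the pressures for which the inf-sup condition is required (for instance, those with zero mean value, as in Theorem 4).

```latex
% Norms in matrix form, with M|_{rs} = \int_\Omega N_r^p N_s^p \, d\Omega:
\|\hat{z}\|_X^2 = \hat{z}^T A \hat{z}, \qquad
\|\hat{q}\|_Y^2 = \hat{q}^T M \hat{q} .
% A is symmetric positive definite, so the supremum is attained at
% \hat{z} = A^{-1} B^T \hat{q}:
\sup_{\hat{z} \neq 0} \frac{\hat{z}^T B^T \hat{q}}{(\hat{z}^T A \hat{z})^{1/2}}
  = \big( \hat{q}^T B A^{-1} B^T \hat{q} \big)^{1/2} .
% Hence the best constant in (266) satisfies
\beta^2 = \min_{\hat{q} \neq 0}
  \frac{\hat{q}^T B A^{-1} B^T \hat{q}}{\hat{q}^T M \hat{q}} ,
% i.e. \beta^2 is the smallest eigenvalue of the generalized problem
B A^{-1} B^T \hat{q} = \lambda \, M \hat{q} .
```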
5.1 Checking the inf—sup condition 


As already mentioned, a rigorous proof of the inf—sup 
condition is typically a difficult task, mainly because several 
technical mathematical problems have to be overcome. 
In this section, we present two of the most powerful
tools for proving the inf-sup property. The first technique
(Fortin’s trick) can be used, for instance, to study the
stability of the $P_1^{NC} - P_0$ element (cf. Figure 15) and the
Crouzeix–Raviart element (cf. Figure 12), as we are going
to detail below. The second one (Verfürth’s trick) can be
applied basically to all the approximations with continuous
pressure and it will be exemplified by considering the MINI
element (cf. Figure 8).

Although we are aware that the subsequent analysis is 
not completely satisfactory from the mathematical point 
of view, it nonetheless highlights some of the basic ideas 
behind the analysis of mixed finite element methods. 

We first need to recall the following important theorem 
of functional analysis (see Ladyzhenskaya, 1969; Temam, 
1977, for instance). 


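Theorem 4. There exists a constant $\beta_c > 0$ such that, for
every $q \in \mathcal{Y}$ with $\int_\Omega q\, d\Omega = 0$, it holds

$$\sup_{v \in \mathcal{X} \setminus \{0\}} \frac{-\int_\Omega q \,\mathrm{div}\, v\, d\Omega}{\|v\|_{\mathcal{X}}} \ge \beta_c \|q\|_{\mathcal{Y}} \tag{271}$$

Remark 14. We remark that estimate (271) is nothing but
the infinite-dimensional version of the inf-sup condition
written in its equivalent form (119).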


5.1.1 Fortin’s trick 


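The next result provides a criterion for proving the inf-sup
condition, called Fortin’s trick (see Fortin, 1977) or, more
precisely, Fortin’s trick applied to the Stokes problem.

Proposition 4. Suppose there exists a linear operator
$\hat{\Pi}_h : \mathcal{X} \to X = \mathbb{R}^n$ such that

$$\|\hat{\Pi}_h v\|_X \le C_\Pi \|v\|_{\mathcal{X}} \qquad \forall v \in \mathcal{X} \tag{272}$$

and

$$(\hat{\Pi}_h v)^T B^T \hat{q} = -\int_\Omega \mathrm{div}\, v\, (N_r^p \hat{q}_r)\, d\Omega \qquad \forall \hat{q} \in Y = \mathbb{R}^m \tag{273}$$

with $C_\Pi$ independent of $h$. Then it holds

$$\forall \hat{q} \in Y \qquad \sup_{\hat{z} \in X \setminus \{0\}} \frac{\hat{z}^T B^T \hat{q}}{\|\hat{z}\|_X} \ge \frac{\beta_c}{C_\Pi} \|\hat{q}\|_Y \tag{274}$$

that is, the inf-sup condition (266) is fulfilled with $\beta = \beta_c / C_\Pi$.

Proof. Take any $\hat{q} \in Y$. We notice that from Theorem 4
and Remark 13, we get

$$\sup_{v \in \mathcal{X} \setminus \{0\}} \frac{-\int_\Omega \mathrm{div}\, v\, (N_r^p \hat{q}_r)\, d\Omega}{\|v\|_{\mathcal{X}}} \ge \beta_c \|N_r^p \hat{q}_r\|_{\mathcal{Y}} = \beta_c \|\hat{q}\|_Y \tag{275}$$

Therefore, from (273), we have

$$\sup_{v \in \mathcal{X} \setminus \{0\}} \frac{(\hat{\Pi}_h v)^T B^T \hat{q}}{\|v\|_{\mathcal{X}}} \ge \beta_c \|\hat{q}\|_Y \tag{276}$$

Using (272), from (276) it follows

$$\sup_{v \in \mathcal{X},\ \hat{\Pi}_h v \ne 0} \frac{(\hat{\Pi}_h v)^T B^T \hat{q}}{\|\hat{\Pi}_h v\|_X} \ge \frac{1}{C_\Pi} \sup_{v \in \mathcal{X} \setminus \{0\}} \frac{(\hat{\Pi}_h v)^T B^T \hat{q}}{\|v\|_{\mathcal{X}}} \ge \frac{\beta_c}{C_\Pi} \|\hat{q}\|_Y \tag{277}$$

Since, obviously, $\{\hat{\Pi}_h v;\ v \in \mathcal{X}\} \subseteq X$, from (277) we
obtain

$$\sup_{\hat{z} \in X \setminus \{0\}} \frac{\hat{z}^T B^T \hat{q}}{\|\hat{z}\|_X} \ge \sup_{v \in \mathcal{X},\ \hat{\Pi}_h v \ne 0} \frac{(\hat{\Pi}_h v)^T B^T \hat{q}}{\|\hat{\Pi}_h v\|_X} \ge \frac{\beta_c}{C_\Pi} \|\hat{q}\|_Y \tag{278}$$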

We now apply Proposition 4 to the $P_1^{NC} - P_0$ element
and the Crouzeix–Raviart element, skipping, however, the
proof of (272). In both cases, the strategy for building the
operator $\hat{\Pi}_h$ is the following:

1. On each triangle $T \in \mathcal{T}_h$, we first define a suitable linear
operator $\Pi_{h,T} : v \mapsto \Pi_{h,T} v$, valued in the space of
velocity approximating functions on $T$, and satisfying

$$\int_T q^h \,\mathrm{div}\,(\Pi_{h,T} v)\, d\Omega = \int_T q^h \,\mathrm{div}\, v\, d\Omega \tag{279}$$

for every $v \in \mathcal{X}$ and every discrete pressure $q^h$. This
will be done by using, in particular, the element degrees
of freedom for the velocity approximation.

2. By assembling all the element contributions, we obtain
a global linear operator

$$\Pi_h : \mathcal{X} \to \mathrm{Span}\{N_i^u;\ i = 1, \ldots, n\}, \qquad v \mapsto \Pi_h v = \sum_{T \in \mathcal{T}_h} \Pi_{h,T} v = N_i^u \hat{v}_i \tag{280}$$

3. We finally define $\hat{\Pi}_h : \mathcal{X} \to \mathbb{R}^n$ by setting

$$\hat{\Pi}_h v = \hat{v} \quad \text{if} \quad \Pi_h v = N_i^u \hat{v}_i \tag{281}$$

that is, $\hat{\Pi}_h v$ returns the components of the function
$\Pi_h v$ with respect to the global velocity basis
$\{N_i^u;\ i = 1, \ldots, n\}$. From the definition of the matrix
$B$, property (279), (280), and (281), it follows that
condition (273) is satisfied.


• The $P_1^{NC} - P_0$ element. Fix $T \in \mathcal{T}_h$, and recall that any
approximated pressure $q^h$ is a constant function on $T$. We
wish to build $\Pi_{h,T}$ in such a way that

$$\int_T q^h \,\mathrm{div}\,(\Pi_{h,T} v)\, d\Omega = \int_T q^h \,\mathrm{div}\, v\, d\Omega \tag{282}$$

From the divergence theorem, (282) can be alternatively
written as

$$\int_{\partial T} q^h (\Pi_{h,T} v) \cdot n\, ds = \int_{\partial T} q^h v \cdot n\, ds \tag{283}$$

Denoting with $M_k$ ($k = 1, 2, 3$) the midpoint of the edge $e_k$,
we define $\Pi_{h,T} v$ as the unique (vectorial) linear function
such that

$$\Pi_{h,T} v (M_k) = \frac{1}{|e_k|} \int_{e_k} v\, ds \qquad k = 1, 2, 3 \tag{284}$$

From the divergence theorem and the midpoint rule, it
follows that

$$\int_T q^h \,\mathrm{div}\,(\Pi_{h,T} v)\, d\Omega = \int_{\partial T} q^h (\Pi_{h,T} v) \cdot n\, ds = \int_{\partial T} q^h v \cdot n\, ds = \int_T q^h \,\mathrm{div}\, v\, d\Omega \tag{285}$$

for every constant function $q^h$. It is now sufficient to define
the global linear operator $\Pi_h$ as

$$\Pi_h v = \sum_{T \in \mathcal{T}_h} \Pi_{h,T} v = N_i^u \hat{v}_i$$

and the corresponding operator $\hat{\Pi}_h$ satisfies condition (273)
(cf. also (253)).
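The flux-matching argument behind (284) and (285) can be reproduced numerically on a single element. The sketch below works on the reference triangle with a hypothetical quadratic field $v = (x^2, xy)$, chosen only for illustration; it compares the boundary flux of the linear interpolant defined by the edge averages with the integral of $\mathrm{div}\, v$, using exact rational arithmetic.

```python
from fractions import Fraction as F

# Reference triangle, vertices listed counterclockwise.
VERTICES = [(F(0), F(0)), (F(1), F(0)), (F(0), F(1))]
AREA = F(1, 2)


def v(x, y):
    """Hypothetical quadratic test field v = (x^2, x*y)."""
    return (x * x, x * y)


def div_v(x, y):
    """Divergence of the test field: d(x^2)/dx + d(x*y)/dy = 3x."""
    return 3 * x


def edge_average(a, b):
    """Mean of v over the edge [a, b] by Simpson's rule (exact for quadratics)."""
    m = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    va, vm, vb = v(*a), v(*m), v(*b)
    return tuple((va[i] + 4 * vm[i] + vb[i]) / 6 for i in range(2))


def flux_of_interpolant():
    """Boundary flux of the linear interpolant whose midpoint values are (284).

    The midpoint rule is exact for linear functions, so the flux through an
    edge equals (edge average of v) . (|e| * outward normal).
    """
    total = F(0)
    for k in range(3):
        a, b = VERTICES[k], VERTICES[(k + 1) % 3]
        scaled_normal = (b[1] - a[1], -(b[0] - a[0]))  # |e| * outward normal
        avg = edge_average(a, b)
        total += avg[0] * scaled_normal[0] + avg[1] * scaled_normal[1]
    return total


def integral_of_div_v():
    """Integral of div v over T; div v is linear, so the centroid rule is exact."""
    cx = sum(p[0] for p in VERTICES) / 3
    cy = sum(p[1] for p in VERTICES) / 3
    return AREA * div_v(cx, cy)
```

Both quantities agree, which is the content of (285) for a constant $q^h$: the interpolant reproduces the net flux of $v$ through $\partial T$ even though it does not reproduce $v$ itself.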


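• The Crouzeix–Raviart element. Fix $T \in \mathcal{T}_h$, and recall
that any approximated pressure $q^h$ is now a linear function
on $T$. Hence, $q^h$ can be uniquely decomposed as $q^h =
q_0 + q_1$, where $q_0$ is a constant (the mean value of $q^h$ on
$T$), and $q_1$ is a linear function having zero mean value. We
now construct a linear operator $\Pi_{1,T} : v \mapsto \Pi_{1,T} v$, where
$\Pi_{1,T} v$ is a quadratic vectorial polynomial such that

$$\int_T \mathrm{div}\,(\Pi_{1,T} v)\, d\Omega = \int_T \mathrm{div}\, v\, d\Omega \tag{286}$$

or, alternatively,

$$\int_{\partial T} (\Pi_{1,T} v) \cdot n\, ds = \int_{\partial T} v \cdot n\, ds \tag{287}$$

Denoting with $V_k$ (resp., $M_k$) the vertexes of $T$ (resp., the
midpoint of the edge $e_k$), the Cavalieri–Simpson rule shows
that condition (287) holds if we set

$$\begin{cases} \Pi_{1,T} v (V_k) = v(V_k) & k = 1, 2, 3 \\[2mm] \Pi_{1,T} v (M_k) = \dfrac{3}{2 |e_k|} \displaystyle\int_{e_k} v\, ds - \dfrac{v(V_{k_1}) + v(V_{k_2})}{4} & k = 1, 2, 3 \end{cases} \tag{288}$$

Above, we have denoted with $V_{k_1}$ and $V_{k_2}$ the endpoints
of side $e_k$. So far, we have not used the bubble functions
available for the approximated velocity. We now use these
two additional degrees of freedom by defining $v_{b,T}(v)$ as
the unique vectorial bubble function such that

$$\int_T q_1 \,\mathrm{div}\, v_{b,T}(v)\, d\Omega = \int_T q_1 \,\mathrm{div}\,(v - \Pi_{1,T} v)\, d\Omega \tag{289}$$

for every linear function $q_1$ having zero mean value on $T$.
We claim that if $\Pi_{h,T} v = v_{b,T}(v) + \Pi_{1,T} v$, then

$$\int_T q^h \,\mathrm{div}\,(\Pi_{h,T} v)\, d\Omega = \int_T q^h \,\mathrm{div}\, v\, d\Omega \tag{290}$$

for every linear polynomial $q^h = q_0 + q_1$. In fact, using
(286), (289), and the obvious fact that $\int_T \mathrm{div}\, v_{b,T}(v)\, d\Omega = 0$,
we have

$$\begin{aligned} \int_T q^h \,\mathrm{div}\,(\Pi_{h,T} v)\, d\Omega &= \int_T (q_0 + q_1) \,\mathrm{div}\,(v_{b,T}(v) + \Pi_{1,T} v)\, d\Omega \\ &= \int_T q_0 \,\mathrm{div}\,(v_{b,T}(v) + \Pi_{1,T} v)\, d\Omega + \int_T q_1 \,\mathrm{div}\,(v_{b,T}(v) + \Pi_{1,T} v)\, d\Omega \\ &= \int_T q_0 \,\mathrm{div}\,(\Pi_{1,T} v)\, d\Omega + \int_T q_1 \,\mathrm{div}\, v\, d\Omega \\ &= \int_T q_0 \,\mathrm{div}\, v\, d\Omega + \int_T q_1 \,\mathrm{div}\, v\, d\Omega = \int_T q^h \,\mathrm{div}\, v\, d\Omega \end{aligned} \tag{291}$$

Hence, the operator $\hat{\Pi}_h$ arising from the global linear
operator

$$\Pi_h v = \sum_{T \in \mathcal{T}_h} \big( v_{b,T}(v) + \Pi_{1,T} v \big) = v_b(v) + \Pi_1 v = N_i^u \hat{v}_i$$

fulfills condition (273).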
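The role of the Cavalieri–Simpson rule in (288) can be illustrated along a single edge. In the sketch below, one scalar velocity component is represented by a hypothetical trace $f(s) = s^4$ on an edge of unit length (chosen so that Simpson's rule is not exact for $f$ itself); the midpoint value prescribed by (288) is then the one for which the quadratic interpolant has the same edge integral as $f$.

```python
from fractions import Fraction as F


def simpson(f0, fm, f1, length):
    """Cavalieri-Simpson rule on an edge; exact for quadratic polynomials."""
    return length * (f0 + 4 * fm + f1) / 6


def midpoint_value(edge_integral, f0, f1, length):
    """Midpoint value prescribed by (288) for one scalar component."""
    return F(3, 2) * edge_integral / length - (f0 + f1) / 4


# Hypothetical trace f(s) = s**4 on an edge parametrized by s in [0, 1].
length = F(1)
f0, f1 = F(0), F(1)          # endpoint values f(0), f(1)
edge_integral = F(1, 5)      # exact integral of s**4 over [0, 1]

fm = midpoint_value(edge_integral, f0, f1, length)
quadratic_integral = simpson(f0, fm, f1, length)
```

The prescribed midpoint value differs from the true value $f(1/2) = 1/16$, yet the edge integral of the quadratic interpolant matches that of $f$ exactly, which is all that (287) requires.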


Remark 15. Conditions (288) reveal that the operator $\Pi_1$
(built by means of the local contributions $\Pi_{1,T}$) exploits,
in particular, the point values of $v$ at all the vertexes of
the triangles in $\mathcal{T}_h$. However, this definition makes no
sense for an arbitrary $v \in \mathcal{X}$, since functions in $\mathcal{X}$ are
not necessarily continuous. To overcome this problem,
one should define a more sophisticated operator $\Pi_1$, for
instance, taking advantage of an averaging procedure. More
precisely, one could define the function $\Pi_1 v$ as the unique
piecewise quadratic polynomial such that $\Pi_1 v|_{\partial\Omega} = 0$ and

$$\begin{cases} \Pi_1 v (V) = \dfrac{1}{\mathrm{Area}(D(V))} \displaystyle\int_{D(V)} v\, d\Omega \\[3mm] \Pi_1 v (M) = \dfrac{3}{2 |e_M|} \displaystyle\int_{e_M} v\, ds - \dfrac{\Pi_1 v (V_{M_1}) + \Pi_1 v (V_{M_2})}{4} \end{cases} \tag{292}$$

Above, $V$ is any internal vertex of triangles in $\mathcal{T}_h$ and $D(V)$
is the union of the triangles having $V$ as a vertex. Moreover,
$e_M$ is any internal edge having $M$ as midpoint and $V_{M_1}$, $V_{M_2}$
as endpoints. In this case, it is possible to prove that for the
resulting operator, the very important property (272) holds with
$C_\Pi$ independent of $h$.


Remark 16. It is interesting to observe that conditions
(283) and (287) suggest the following important fact
about the discretization of the Stokes problem. Any
reasonable discontinuous pressure approximation contains at
least all the piecewise constant functions: relations (283)
and (287) show that having some velocity degrees of freedom
associated with the triangle edges greatly helps in
proving the inf-sup condition.


5.1.2 Verfürth’s trick


We now describe another technique for proving the inf-sup
condition, which can be profitably used when elements with
continuous pressure interpolation are considered: the
so-called Verfürth’s trick (see Verfürth, 1984). We begin by
noting that, because of the pressure continuity, it holds

$$\hat{z}^T B^T \hat{q} = -\int_\Omega (N_r^p \hat{q}_r) \,\mathrm{div}\,(N_i^u \hat{z}_i)\, d\Omega = \int_\Omega \nabla (N_r^p \hat{q}_r) \cdot (N_i^u \hat{z}_i)\, d\Omega \tag{293}$$

for every $\hat{z} \in X$ and $\hat{q} \in Y$. In some cases, it is much easier
to use the form (293) and prove a modified version of the
inf-sup condition (266) with a norm for $Y$ different from
the one defined in (270) and involving the pressure gradients;
see Bercovier and Pironneau (1977) and Glowinski
and Pironneau (1979). More precisely, given a mesh $\mathcal{T}_h$, we
introduce in $Y$ the norm

$$\|\hat{q}\|_{Y_*} := \left( \sum_{K \in \mathcal{T}_h} h_K^2 \int_K |\nabla (N_r^p \hat{q}_r)|^2\, d\Omega \right)^{1/2} \tag{294}$$

where $h_K$ denotes the diameter of the generic element $K$.

The key point of Verfürth’s trick is a smart use of the
properties of interpolation operators in order to prove that
the inf-sup condition with the norm $Y_*$ implies the usual
one. Indeed, we have the following result.


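Proposition 5. Suppose that

(H1) for every velocity $v \in \mathcal{X}$ there exists a discrete velocity
$v_I = N_i^u \hat{v}_i^I$ such that

$$\left( \sum_{K \in \mathcal{T}_h} h_K^{-2} \int_K |N_i^u \hat{v}_i^I - v|^2\, d\Omega \right)^{1/2} \le c_0 \|v\|_{\mathcal{X}} \tag{295}$$

$$\|\hat{v}_I\|_X \le c_1 \|v\|_{\mathcal{X}} \tag{296}$$

with $c_0$, $c_1$ independent of $h$ and $v$;

(H2) there exists a constant $\beta_* > 0$ independent of $h$ such
that

$$\forall \hat{q} \in Y \qquad \sup_{\hat{z} \in X \setminus \{0\}} \frac{\hat{z}^T B^T \hat{q}}{\|\hat{z}\|_X} \ge \beta_* \|\hat{q}\|_{Y_*} \tag{297}$$

(i.e. the inf-sup condition holds with the modified
$Y$-norm (294)).

Then the inf-sup condition (with respect to the original
norm (270))

$$\forall \hat{q} \in Y \qquad \sup_{\hat{z} \in X \setminus \{0\}} \frac{\hat{z}^T B^T \hat{q}}{\|\hat{z}\|_X} \ge \beta \|\hat{q}\|_Y \tag{298}$$

is satisfied with $\beta$ independent of $h$.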


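Proof. Given $\hat{q} \in Y$, we observe that using (296) and (293)
it holds

$$\sup_{\hat{z} \in X \setminus \{0\}} \frac{\hat{z}^T B^T \hat{q}}{\|\hat{z}\|_X} \ge \sup_{v \in \mathcal{X} \setminus \{0\}} \frac{\hat{v}_I^T B^T \hat{q}}{\|\hat{v}_I\|_X} \ge \sup_{v \in \mathcal{X} \setminus \{0\}} \frac{\hat{v}_I^T B^T \hat{q}}{c_1 \|v\|_{\mathcal{X}}} = \sup_{v \in \mathcal{X} \setminus \{0\}} \frac{\int_\Omega \nabla (N_r^p \hat{q}_r) \cdot (N_i^u \hat{v}_i^I)\, d\Omega}{c_1 \|v\|_{\mathcal{X}}} \tag{299}$$

Furthermore, from Theorem 4, there exists $w \in \mathcal{X}$ such
that

$$\frac{-\int_\Omega \mathrm{div}\, w\, (N_r^p \hat{q}_r)\, d\Omega}{\|w\|_{\mathcal{X}}} = \frac{\int_\Omega \nabla (N_r^p \hat{q}_r) \cdot w\, d\Omega}{\|w\|_{\mathcal{X}}} \ge \beta_c \|\hat{q}\|_Y \tag{300}$$

For such a velocity $w$ and the corresponding discrete
velocity $w_I = N_i^u \hat{w}_i^I$, we obviously have

$$\sup_{v \in \mathcal{X} \setminus \{0\}} \frac{\int_\Omega \nabla (N_r^p \hat{q}_r) \cdot (N_i^u \hat{v}_i^I)\, d\Omega}{c_1 \|v\|_{\mathcal{X}}} \ge \frac{\int_\Omega \nabla (N_r^p \hat{q}_r) \cdot (N_i^u \hat{w}_i^I)\, d\Omega}{c_1 \|w\|_{\mathcal{X}}} \tag{301}$$

Subtracting and adding $w$, we obtain

$$\frac{\int_\Omega \nabla (N_r^p \hat{q}_r) \cdot (N_i^u \hat{w}_i^I)\, d\Omega}{c_1 \|w\|_{\mathcal{X}}} = \frac{\int_\Omega \nabla (N_r^p \hat{q}_r) \cdot (N_i^u \hat{w}_i^I - w)\, d\Omega}{c_1 \|w\|_{\mathcal{X}}} + \frac{\int_\Omega \nabla (N_r^p \hat{q}_r) \cdot w\, d\Omega}{c_1 \|w\|_{\mathcal{X}}} \tag{302}$$

To treat the first term in the right-hand side of (302), we
observe that using the Cauchy–Schwarz inequality, (295), and
recalling (294), we have

$$\begin{aligned} \left| \int_\Omega \nabla (N_r^p \hat{q}_r) \cdot (N_i^u \hat{w}_i^I - w)\, d\Omega \right| &= \left| \sum_{K \in \mathcal{T}_h} \int_K h_K \nabla (N_r^p \hat{q}_r) \cdot h_K^{-1} (N_i^u \hat{w}_i^I - w)\, d\Omega \right| \\ &\le \left( \sum_{K \in \mathcal{T}_h} h_K^2 \int_K |\nabla (N_r^p \hat{q}_r)|^2\, d\Omega \right)^{1/2} \left( \sum_{K \in \mathcal{T}_h} h_K^{-2} \int_K |N_i^u \hat{w}_i^I - w|^2\, d\Omega \right)^{1/2} \\ &\le c_0 \|\hat{q}\|_{Y_*} \|w\|_{\mathcal{X}} \end{aligned} \tag{303}$$

which gives

$$\int_\Omega \nabla (N_r^p \hat{q}_r) \cdot (N_i^u \hat{w}_i^I - w)\, d\Omega \ge -c_0 \|\hat{q}\|_{Y_*} \|w\|_{\mathcal{X}} \tag{304}$$

Therefore, we get

$$\frac{\int_\Omega \nabla (N_r^p \hat{q}_r) \cdot (N_i^u \hat{w}_i^I - w)\, d\Omega}{c_1 \|w\|_{\mathcal{X}}} \ge -\frac{c_0}{c_1} \|\hat{q}\|_{Y_*} \tag{305}$$

For the second term in the right-hand side of (302), we
notice that (cf. (300))

$$\frac{\int_\Omega \nabla (N_r^p \hat{q}_r) \cdot w\, d\Omega}{c_1 \|w\|_{\mathcal{X}}} \ge \frac{\beta_c}{c_1} \|\hat{q}\|_Y \tag{306}$$

Therefore, from (299), (301), (302), (305), and (306), we
obtain

$$\sup_{\hat{z} \in X \setminus \{0\}} \frac{\hat{z}^T B^T \hat{q}}{\|\hat{z}\|_X} \ge \frac{\beta_c}{c_1} \|\hat{q}\|_Y - \frac{c_0}{c_1} \|\hat{q}\|_{Y_*} \tag{307}$$

We now multiply the modified inf-sup condition (297)
by $c_0 / (\beta_* c_1)$ to get

$$\frac{c_0}{\beta_* c_1} \sup_{\hat{z} \in X \setminus \{0\}} \frac{\hat{z}^T B^T \hat{q}}{\|\hat{z}\|_X} \ge \frac{c_0}{c_1} \|\hat{q}\|_{Y_*} \tag{308}$$

By adding (307) and (308), we finally have

$$\left( 1 + \frac{c_0}{\beta_* c_1} \right) \sup_{\hat{z} \in X \setminus \{0\}} \frac{\hat{z}^T B^T \hat{q}}{\|\hat{z}\|_X} \ge \frac{\beta_c}{c_1} \|\hat{q}\|_Y \tag{309}$$

that is, the inf-sup condition (298) holds with $\beta = (\beta_c / c_1)
(1 + c_0 / (\beta_* c_1))^{-1}$. □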
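The final constant is easy to tabulate once the four ingredients are known. The helper below simply evaluates the expression for $\beta$ obtained at the end of the proof; the function name and the sample values are illustrative assumptions, not quantities given in the text.

```python
def verfuerth_beta(beta_c, beta_star, c0, c1):
    """Inf-sup constant from Proposition 5.

    beta = (beta_c / c1) * (1 + c0 / (beta_star * c1))**(-1)
    """
    return (beta_c / c1) / (1.0 + c0 / (beta_star * c1))


# Illustrative values only: every constant set to one.
beta_example = verfuerth_beta(beta_c=1.0, beta_star=1.0, c0=1.0, c1=1.0)
```

The expression stays positive for any positive inputs and tends to zero as $\beta_*$ tends to zero, so a mesh-independent $\beta_*$ in (H2) is exactly what keeps $\beta$ bounded away from zero.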


Remark 17. We notice that hypothesis (H1) of Proposition 5
is not very restrictive. Indeed, given a velocity $v \in \mathcal{X}$,
the corresponding $v_I$ can be chosen as a suitable discrete
velocity interpolating $v$, and (295) and (296) are both
satisfied basically for every element of practical interest (see
e.g. Brezzi and Fortin, 1991 and Ciarlet, 1978 for more
details).

The Verfürth trick was originally applied to the Hood–Taylor
element depicted in Figure 9 (see Verfürth, 1984),
but it was soon recognized as a valuable instrument for
analyzing all continuous pressure elements. Here we show
how to use it for the analysis of the MINI element (whose
original proof was given using Fortin’s trick in Arnold,
Brezzi and Fortin, 1984).

• The MINI element. We now give a hint on how to verify
hypothesis (H2) of Proposition 5 for the MINI element (cf.
Figure 8).

For a generic $\hat{q} \in Y$, we take its reconstructed discrete
pressure $q^h = N_r^p \hat{q}_r$. Since $q^h$ is a piecewise linear and
continuous function, it follows that $\nabla q^h = \nabla (N_r^p \hat{q}_r)$ is a
well-defined piecewise constant vector field. We now construct
a discrete (bubble-type) velocity $v^h = N_i^u \hat{v}_i$, defined
on each triangle $T \in \mathcal{T}_h$ as

$$v^h = h_T^2\, b_T\, \nabla q^h \tag{310}$$

where $b_T$ is the usual cubic bubble (i.e. in area coordinates,
$b_T = 27 \lambda_1 \lambda_2 \lambda_3$). Recalling (293) and using (310), we then
obtain

$$\hat{v}^T B^T \hat{q} = \int_\Omega \nabla (N_r^p \hat{q}_r) \cdot (N_i^u \hat{v}_i)\, d\Omega = \int_\Omega \nabla q^h \cdot v^h\, d\Omega = \sum_{T \in \mathcal{T}_h} h_T^2 \int_T |\nabla (N_r^p \hat{q}_r)|^2\, b_T\, d\Omega \tag{311}$$

It is easy to show that for regular meshes (roughly: for
meshes that do not contain ‘too thin’ elements, see e.g.
Ciarlet, 1978 for a precise definition), there exists a constant
$C_1 > 0$, independent of $h$, such that

$$\forall T \in \mathcal{T}_h \qquad \int_T |\nabla (N_r^p \hat{q}_r)|^2\, b_T\, d\Omega \ge C_1 \int_T |\nabla (N_r^p \hat{q}_r)|^2\, d\Omega \tag{312}$$
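Since $\nabla q^h$ is constant on each triangle here, the ratio between the two integrals in (312) reduces to the mean value of the bubble $b_T$ over $T$. The sketch below evaluates it with the classical formula for integrals of monomials in area coordinates; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def barycentric_monomial_integral(a, b, c, area):
    """Integral over a triangle of lambda1**a * lambda2**b * lambda3**c.

    Uses the classical formula 2*|T| * a! * b! * c! / (a + b + c + 2)!.
    """
    num = 2 * factorial(a) * factorial(b) * factorial(c)
    return area * Fraction(num, factorial(a + b + c + 2))


def bubble_mean():
    """Mean value over T of the cubic bubble b_T = 27 * l1 * l2 * l3."""
    area = Fraction(1)  # the mean value does not depend on the area
    return 27 * barycentric_monomial_integral(1, 1, 1, area) / area
```

The mean value is 9/20 for every triangle, so for a piecewise constant pressure gradient the bound (312) holds with $C_1 = 9/20$.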
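Therefore, from (311), (312), and (294) we get

$$\hat{v}^T B^T \hat{q} \ge C_1 \sum_{T \in \mathcal{T}_h} h_T^2 \int_T |\nabla (N_r^p \hat{q}_r)|^2\, d\Omega = C_1 \|\hat{q}\|_{Y_*}^2 \tag{313}$$

Furthermore, using standard scaling arguments (cf. Brezzi
and Fortin, 1991), it is possible to prove that there exists
$C_2 > 0$ independent of $h$ such that

$$\|\hat{v}\|_X = \|v^h\|_{\mathcal{X}} \le C_2 \left( \sum_{T \in \mathcal{T}_h} h_T^2 \int_T |\nabla (N_r^p \hat{q}_r)|^2\, d\Omega \right)^{1/2} = C_2 \|\hat{q}\|_{Y_*} \tag{314}$$

Hence, estimates (313) and (314) imply

$$\frac{\hat{v}^T B^T \hat{q}}{\|\hat{v}\|_X} \ge \frac{C_1}{C_2} \|\hat{q}\|_{Y_*} \tag{315}$$

and condition (297) then follows with $\beta_* = C_1 / C_2$ since

$$\sup_{\hat{z} \in X \setminus \{0\}} \frac{\hat{z}^T B^T \hat{q}}{\|\hat{z}\|_X} \ge \frac{\hat{v}^T B^T \hat{q}}{\|\hat{v}\|_X} \tag{316}$$
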

5.2 Appendix — error estimates 


In this brief Appendix, we present the guidelines to obtain
error estimates, once the stability conditions have been
established. We only consider the easiest case of conforming
schemes (i.e. when the velocity is approximated by
means of continuous functions). We refer to Brezzi (1974),
Brezzi and Fortin (1991), and Braess (1997) for more
details, as well as for the analysis of more complicated
situations involving non-conforming approximations (such
as the $P_1^{NC} - P_0$ element (cf. Figure 15)).

Before proceeding, we recall that for the Stokes problem
with our choices of norms, we have $M_a = 1$, $M_b =
\sqrt{d/\mu}$, and $\alpha = 1$, no matter what the approximations
of velocity and pressure are. However, in the subsequent
discussion, we will not substitute these values into the
estimates, in order to facilitate the extension of the analysis
to other problems. We also notice that, on the contrary,
the relevant constant $\beta$ does depend on the choice of the
interpolating functions. We have the following result.


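Theorem 5. Let $(u, p)$ be the solution of problem (263)
and suppose there exist discrete velocity and pressure

$$u_I = N_i^u \hat{u}_i^I, \qquad p_I = N_r^p \hat{p}_r^I \tag{317}$$

such that

$$\|u - u_I\|_{\mathcal{X}} \le C h^{k_u}, \qquad k_u > 0 \tag{318}$$

$$\|p - p_I\|_{\mathcal{Y}} \le C h^{k_p}, \qquad k_p > 0 \tag{319}$$

If $(\hat{u}, \hat{p})$ is the solution of the discrete problem (264), then,
setting $u^h = N_i^u \hat{u}_i$ and $p^h = N_r^p \hat{p}_r$, it holds

$$\|u - u^h\|_{\mathcal{X}} + \|p - p^h\|_{\mathcal{Y}} \le C h^k \tag{320}$$

with $k = \min\{k_u, k_p\}$.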


Proof, Foru, and p; as in (317), we set i, = (@/)/_, EX 
and p, = (p/)”_, € Y. Taking into account that 


is the solution of the discretized problem (264), we obtain 


that 
{5 a 
a a 
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solves 
A B']{ai—i,) _ {f-Ad, —Bp, 
Fe ed ea 


Choosing as (admissible) velocity and pressure variations 
the interpolating shape functions, from (263), we have 


=u f vNt: Vaag- f div expan 
Q Q 

TSA, A (322) 
and 


f npaivuae =o s=1,...,m (323) 
Q 


Hence, system (321) may be written as 
A A E fa 
a at hw 324 
E o ||b-B, g ea 


fi, =p | VN: V (uu) da 
QR 


where 


- f avon (p-p) dQ i=1,...,n 
QR 


(325) 


and 


ai, =- | NPdiv(u—upan r=1,....m (326) 
Q 


Applying Theorem 1, we thus obtain 


1/2 


one Le Mi? 

Wi Gylly < Sle + pag llo (327) 
Wace i MP\ = Mn 

IB — Êz ily < G + oa) llfllp + p Ble (328) 


We proceed by estimating the dual norms ||] p and [[llg. 
Since for every ®= (0;)7_,, 


“i= uf V (Nta,) : V (u — u;) d2 
Q 


= Í div (NPS) (p — pr) a2 < MalÑlixla — uylly 


+ Mlz P — Pily (329) 
we obtain 
oF 
i < M,|lu—uylly+ MiP — Pilly (330) 
x 


which gives (cf. the dual norm definition (104)) 
fle < Malu — uzle + Mlp — Pilly (331) 


Analogously, for every q = (9,)",, we get 


ae =- | (wPd,) div (u - u) aQ < My hlly a ~ Ul 


(332) 
and therefore we have 
Ile < Myllu—uylly (333) 
From (327), (328), (331), and (333) we have 
ages M, M2”, 
lû — û;llz < ( a + ag lu- uzil y 
M, 
+7 llp — Pr ly (334) 
NEA M, Ma? | MM, 
IP — Brlly S p tae p2 lu — ulw 


1, My" 


+M, (; + | lp — Pilly (335) 


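Observing that, by the triangle inequality and Remark 13, it
holds

$$\|u - u^h\|_V \le \|u - u_I\|_V + \|u_I - N^u \hat{u}\|_V = \|u - u_I\|_V + \|\hat{u} - \hat{u}_I\|_X \tag{336}$$

and

$$\|p - p^h\|_Q \le \|p - p_I\|_Q + \|p_I - N^p \hat{p}\|_Q = \|p - p_I\|_Q + \|\hat{p} - \hat{p}_I\|_Y \tag{337}$$

from (334) and (335), we get the error estimates

$$\|u - u^h\|_V \le \left( 1 + \frac{M_a}{\alpha} + \frac{M_a^{1/2} M_b}{\alpha^{1/2}\beta} \right) \|u - u_I\|_V + \frac{M_b}{\alpha} \|p - p_I\|_Q \tag{338}$$

$$\|p - p^h\|_Q \le \left( \frac{M_a}{\beta} + \frac{M_a^{3/2}}{\alpha^{1/2}\beta} + \frac{M_a M_b}{\beta^2} \right) \|u - u_I\|_V + \left( 1 + \frac{M_b}{\beta} + \frac{M_a^{1/2} M_b}{\alpha^{1/2}\beta} \right) \|p - p_I\|_Q \tag{339}$$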


We notice that the constant $M_b$, which did not appear
in the stability estimates, has now come into play. Furthermore,
using (318) and (319), from (338) and (339),
we infer

$$\|u - u^h\|_V + \|p - p^h\|_Q \le C h^k \tag{340}$$

with $k = \min\{k_u, k_p\}$ and $C = C(\alpha, \beta, M_a, M_b)$ independent
of $h$. □


Remark 18. A crucial step in obtaining error estimate
(340) is to prove the bounds (cf. (331) and (333))

$$\|\tilde{f}\|_F \le M_a \|u - u_I\|_V + M_b \|p - p_I\|_Q \tag{341}$$

$$\|\tilde{g}\|_G \le M_b \|u - u_I\|_V \tag{342}$$

where $\tilde{f}$ and $\tilde{g}$ are defined by (325) and (326) respectively.
The estimates above result from a suitable choice of the
norms for $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$, $F = \mathbb{R}^n$, and $G = \mathbb{R}^m$. In
fact, by choosing for X the norm (269) and for F the
corresponding dual norm, we can get (341), as highlighted
by (329). Similarly, by choosing for Y the norm (270) and
for G the corresponding dual norm, we can obtain (342)
(cf. (332)).


Remark 19. The discrete functions $u_I$ and $p_I$ in (317)
are typically chosen as follows:

• $u_I$ is the nodal interpolant of $u$. Therefore,

$$u_I = N^u \hat{u}_I \tag{343}$$

where $\hat{u}_I = (\hat{u}_I^j)_{j=1}^n$ is the vector containing the nodal
values of $u$.

• $p_I$ is the projection of $p$ over the pressure approximation
space. Therefore,

$$p_I = N^p \hat{p}_I \tag{344}$$

where the vector $\hat{p}_I = (\hat{p}_I^j)_{j=1}^m$ is uniquely determined by
the following set of m equations

$$\int_\Omega N_s^p \, (N^p \hat{p}_I) \, d\Omega = \int_\Omega N_s^p \, p \, d\Omega, \qquad s = 1, \dots, m \tag{345}$$


For a regular solution (u, p), standard approximation results
(see e.g. Ciarlet, 1978) allow one to determine the exponents
$k_u$ and $k_p$ entering estimates (318) and (319) in
terms of the selected approximation spaces for the velocity
and the pressure fields. For instance, when considering
the Crouzeix-Raviart element (cf. Figure 12), we have
$k_u = k_p = 2$. Hence, Theorem 5 shows that the discretization
error is $O(h^2)$ (see Chapter 4, this Volume).


6 RELATED CHAPTERS 


(See also Chapter 4, Chapter 15 of this Volume; Chapter 2,
Volume 3).


1 INTRODUCTION


As the range of phenomena that need to be simulated in 
engineering practice broadens, the limitations of conven- 
tional computational methods, such as finite elements, finite 
volumes, or finite difference methods, have become appar- 
ent. There are many problems of industrial and academic 
interest that cannot be easily treated with these classical 
mesh-based methods: for example, the simulation of man- 
ufacturing processes such as extrusion and molding, where 
it is necessary to deal with extremely large deformations of 
the mesh, or simulations of failure, where the simulation of 
the propagation of cracks with arbitrary and complex paths 
is needed. 

The underlying structure of the classical mesh-based 
methods is not well suited to the treatment of discontinuities
that do not coincide with the original mesh edges. With a
mesh-based method, a common strategy for dealing with 
moving discontinuities is to remesh whenever it is nec- 
essary. The remeshing process is costly and not trivial 
in 3D (if reasonable meshes are desired), and projection 
of quantities of interest between successive meshes usu- 
ally leads to degradation of accuracy and often results 
in an excessive computational cost. Although some recent 
developments (Moes, Dolbow and Belytschko, 1999; Wells, 
Borst and Sluys, 2002) partially overcome these difficulties, 
the implementation of discontinuities is not as simple as in 
meshfree methods. 

The objective of meshfree methods is to eliminate at 
least part of this mesh dependence by constructing the 
approximation entirely in terms of nodes (usually called 
particles in the context of meshfree methods). Moving 
discontinuities or interfaces can usually be treated with- 
out remeshing with minor costs and accuracy degradation 
(see, for instance, Belytschko and Organ, 1997). Thus the 
range of problems that can be addressed by meshfree 
methods is much wider than that of mesh-based methods.
Moreover, large deformations can be handled more robustly
with meshfree methods because the approximation is not 
based on elements whose distortion may degrade the accu- 
racy. This is useful in both fluid and solid computa- 
tions. 

Another major drawback of mesh-based methods is the
difficulty in ensuring, for any real geometry, a smooth, painless,
and seamless integration with computer aided engineering
(CAE), industrial computer aided design (CAD),
and computer aided manufacturing (CAM) tools. Meshfree
methods have the potential to circumvent these difficulties.
The elimination of mesh generation is the key issue.
The advantages of meshfree methods for 3D computations
become particularly apparent.

Meshfree methods also present obvious advantages in 
adaptive processes. There are a priori error estimates for 
most of the meshfree methods. This allows the defini- 
tion of adaptive refinement processes as in finite element 
computations: an a posteriori error estimate is computed 
and the solution is improved by adding nodes/particles 
where needed or increasing the order of the approxima- 
tion until the error becomes acceptable (see e.g. Babuska 
and Melenk, 1995; Melenk and Babuska, 1996; Duarte and 
Oden, 1996a; Babuska, Banerjee and Osborn, 2002a). 

Meshfree methods originated over 25 years ago,
but only in recent years have they received substantial
attention. The approach that seems to have the longest
continuous history is the SPH method by Lucy (1977) and 
Gingold and Monaghan (1977) (see Section 2.1). It was first 
developed for modelling astrophysical phenomena without 
boundaries, such as exploding stars and dust clouds. Com- 
pared to other numerical methods, the rate of publications 
in this field was very modest for many years; progress is 
reflected in the review papers by Monaghan (1982, 1988). 

Recently, there has been substantial improvement in
these methods. For instance, Dyka (1994) and Swegle,
Hicks and Attaway (1995) study the instabilities of SPH, Johnson
and Beissel (1996) propose a method for improving strain
calculations, and Liu, Jun and Zhang (1995b) present a 
correction function for kernels in both the discrete and 
continuous case. 

In fact, this approach can be seen as a variant of MLS 
approximations (see Section 2.2). A detailed description 
of MLS approximants can be found in Lancaster (1981). 
Nayroles, Touzot and Villon (1992) were evidently the first 
to use moving least square approximations in a Galerkin 
weak form and called it the diffuse element method (DEM). 
Belytschko, Lu and Gu (1994) refined the method,
extended it to discontinuous approximations, and called it
element-free Galerkin (EFG). Duarte and Oden (1996b) and
Babuska and Melenk (1995) recognized that the methods are
specific instances of partitions of unity (PU), a method first
proposed in Babuska, Caloz and Osborn (1994). Duarte and 
Oden (1996a) and Liu, Li and Belytschko (1997a) were 
also among the first to prove convergence of meshfree 
methods. 

This class of methods (EFG, DEM, PU, among others) is
consistent and, in the forms proposed, stable, although substantially
more expensive than SPH because of the need
for very accurate integration. Zhu and Atluri (1998) propose
a Petrov-Galerkin weak form in order to facilitate the
computation of the integrals, although it usually leads to nonsymmetric
systems of equations. De and Bathe (2000) use this
approach for a particular choice of the approximation space
and the Petrov-Galerkin weak form and call it the method
of finite spheres.

On a parallel path, Vila (1999) has introduced a different 
meshfree approximation specially suited for conservation 
laws: the renormalized meshless derivative (RMD), which
turns out to give accurate approximation of derivatives
in the framework of collocation approaches. Two other
paths in the evolution of meshfree methods have been 
the development of generalized finite difference methods, 
which can deal with arbitrary arrangements of nodes, and 
particle-in-cell methods. Among the early contributors to
the former were Perrone and Kao (1975), but Liszka and
Orkisz (1980) proposed a more robust method. Recently,
these methods have taken a character that closely resembles 
the moving least squares methods. 

In recent papers, the possibilities of meshfree methods 
have become apparent. The special issue Liu, Belytschko, 
and Oden (1996a) shows the ability of meshfree methods 
to handle complex simulations, such as impact, cracking, or 
fluid dynamics. Bouillard and Suleau (1998) apply a mesh- 
free formulation to acoustic problems with good results. 
Bonet and Lok (1999) introduce a gradient correction in 
order to preserve the linear and angular momentum with 
applications to fluid dynamics. Bonet and Kulasegaram
(2000) propose the introduction of an integration correction
that improves accuracy, with applications to metal
forming simulation. Oñate and Idelsohn (1998) propose
a meshfree method, the finite point method, based on a
weighted least-squares approximation with point collocation,
with applications to convective transport and fluid flow.
Recently several authors have proposed mixed approxi- 
mations combining finite elements and meshfree methods, 
in order to exploit the advantages of each method (see 
Belytschko, Organ and Krongauz, 1995; Hegen, 1996; Liu, 
Uras and Chen, 1997b; Huerta and Fernández-Méndez,
2000a; Hao, Liu and Belytschko, 2004). Several review 
papers and books have been published on meshfree methods 
(see Belytschko et al., 1996b; Li and Liu, 2002; Babuska, 
Banerjee and Osborn, 2003; Liu et al., 1995a). Two recent 
books are Atluri and Shen (2002) and Liu (2002). 


2 APPROXIMATION IN MESHFREE 
METHODS 


This section describes the most common approximants in
meshfree methods. We will employ the name 'approximants'
rather than 'interpolants', a term that is often mistakenly
used in the meshfree literature, because, as shown later,
these approximants usually do not pass through the data,
so they are not interpolants. Meshfree approximants can
be classified into two families: those based on SPH and
those based on MLS. As will be noted in Section 3, the
SPH approximants are usually combined with collocation
or point integration techniques, while the MLS approximants
are customarily applied with Galerkin formulations,
though collocation techniques are growing in popularity.


2.1 Smooth particle hydrodynamics


2.1.1 The early SPH 


The earliest meshfree method is the SPH method (see Lucy,
1977). The basic idea is to approximate a function $u(x)$ by
a convolution

$$u(x) \simeq \tilde{u}^\rho(x) := \int C_\rho \, \phi\!\left( \frac{y - x}{\rho} \right) u(y) \, dy \tag{1}$$

where $\phi$ is a compactly supported function, usually called
a window function or weight function, and $\rho$ is the so-called
dilation parameter. The support of the function is
sometimes called the domain of influence. The dilation
parameter characterizes the size of the support of $\phi(x/\rho)$,
usually by its radius. $C_\rho$ is a normalization constant such
that

$$\int C_\rho \, \phi\!\left( \frac{y}{\rho} \right) dy = 1 \tag{2}$$


One way to develop a discrete approximation from (1) is
to use numerical quadrature

$$u(x) \simeq \tilde{u}^\rho(x) \simeq u^\rho(x) := \sum_I C_\rho \, \phi\!\left( \frac{x_I - x}{\rho} \right) u(x_I) \, \omega_I$$

where $x_I$ and $\omega_I$ are the points and weights of the numerical
quadrature. The quadrature points are usually called
particles. The previous equation can also be written as

$$u(x) \simeq \tilde{u}^\rho(x) \simeq u^\rho(x) = \sum_I \omega(x_I, x) \, u(x_I) \tag{3}$$

where the discrete window function is defined as

$$\omega(x_I, x) = C_\rho \, \phi\!\left( \frac{x_I - x}{\rho} \right) \omega_I$$

Thus, the SPH meshfree approximation can be defined as

$$u(x) \simeq u^\rho(x) = \sum_I N_I(x) \, u(x_I)$$

with the approximation basis $N_I(x) = \omega(x_I, x)$.


Remark 1. Note that, in general, $u^\rho(x_I) \ne u(x_I)$. That is,
the shape functions are not interpolants; they do not
verify the Kronecker delta property:

$$N_J(x_I) \ne \delta_{IJ}$$

This is common for all particle methods (see Figure 5
for the MLS approximant) and thus special techniques
are needed to impose essential boundary conditions (see
Section 3).


Remark 2. The dilation parameter $\rho$ characterizes the
support of the approximants $N_I(x)$.


Remark 3. In contrast to finite elements, the neighbor 
particles (particles belonging to a given support) have to 
be identified during the course of the computation. This is 
of special importance if the domain of support changes in 
time and requires fast neighbor search algorithms, a crucial 
feature for the effectiveness of a meshfree method (see e.g. 
Schweitzer, 2003). 
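
As a minimal illustration of the neighbor identification mentioned in Remark 3, the sketch below finds the particles lying within a distance $\rho$ of a point in 1D. It assumes the particle positions are kept sorted, so that two binary searches suffice; the function name `neighbors_1d` is hypothetical, and in 2D or 3D a bucket grid or a tree structure would typically take the place of the sorted list.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def neighbors_1d(sorted_particles: Sequence[float], x: float, rho: float) -> List[int]:
    """Return the indices I with |x_I - x| <= rho.

    Assumes the particle positions are sorted in increasing order, so two
    binary searches replace a scan over all particles.
    """
    first = bisect_left(sorted_particles, x - rho)
    last = bisect_right(sorted_particles, x + rho)
    return list(range(first, last))


# Uniform particles with spacing h = 0.5 on [-1, 1]
particles = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(neighbors_1d(particles, 0.1, 0.5))
```

If the particles move, the sorted order has to be maintained or rebuilt, which is where the cost of the search concentrates.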


Remark 4. There is an optimal value for the ratio
between the dilation parameter $\rho$ and the distance between
particles $h$. Figure 1 shows that for a fixed distribution
of particles, $h$ constant, the dilation parameter must be
large enough to avoid aliasing (spurious short waves in
the approximated solution). It also shows that an excessively
large value for $\rho$ will lead to excessive smoothing.
For this reason, it is usual to maintain a constant ratio
between the dilation parameter $\rho$ and the distance between
particles $h$.
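
The effect described in Remark 4 can be reproduced with a short sketch of the discrete SPH approximation (3) in 1D. It is only an illustration under stated assumptions: uniform particles $x_I = I h$ extending well beyond the evaluation point (an 'unbounded' domain), quadrature weights $\omega_I = h$, the cubic spline window of Section 2.1.2, and $C_\rho = 1/\rho$, which satisfies (2) in 1D because that window has unit integral. The names `sph_approx` and `n_side` are hypothetical.

```python
from typing import Callable


def cubic_spline(x: float) -> float:
    """Cubic spline window with support |x| <= 1 and unit integral."""
    r = abs(x)
    if r <= 0.5:
        return 2.0 * (4.0 * (r - 1.0) * r * r + 2.0 / 3.0)
    if r <= 1.0:
        return 2.0 * (4.0 * (1.0 - r) ** 3 / 3.0)
    return 0.0


def sph_approx(u: Callable[[float], float], x: float, h: float,
               ratio: float, n_side: int = 40) -> float:
    """Discrete SPH approximation (3) at x for particles x_I = I*h.

    ratio is rho/h; the weights are omega_I = h and C_rho = 1/rho (assumed).
    """
    rho = ratio * h
    total = 0.0
    for i in range(-n_side, n_side + 1):
        x_i = i * h
        total += cubic_spline((x_i - x) / rho) / rho * u(x_i) * h
    return total


def one(x: float) -> float:
    return 1.0


def parabola(x: float) -> float:
    return 1.0 - x * x


for ratio in (1, 2, 4):
    print(ratio,
          sph_approx(one, 0.0, 0.5, ratio),
          sph_approx(one, 0.25, 0.5, ratio),
          sph_approx(parabola, 0.0, 0.5, ratio))
```

With $\rho/h = 1$ the approximation of the constant function differs between a particle and the midpoint of two particles, which is the aliasing mentioned above; with $\rho/h = 4$ the constant is reproduced away from boundaries but the parabola is visibly smoothed.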


2.1.2 Window functions 


The window function plays an important role in meshfree 
methods. Other names for the window function are kernel 
and weight function. The window function may be defined 
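in various manners. For 1D the most common choices are

Cubic spline:

$$\phi_{1D}(x) = 2 \begin{cases} 4(|x| - 1)x^2 + 2/3 & |x| \le 0.5 \\ 4(1 - |x|)^3/3 & 0.5 \le |x| \le 1 \\ 0 & 1 \le |x| \end{cases} \tag{4}$$

Gaussian:

$$\phi_{1D}(x) = \begin{cases} \dfrac{\exp(-9x^2) - \exp(-9)}{1 - \exp(-9)} & |x| \le 1 \\ 0 & 1 \le |x| \end{cases} \tag{5}$$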
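
The two window functions (4) and (5) translate directly into code. The following sketch is a plain transcription; the helper `midpoint_integral` is a hypothetical utility added only to check numerically that the cubic spline, with its leading factor 2, has unit integral on $[-1, 1]$.

```python
import math
from typing import Callable


def cubic_spline(x: float) -> float:
    """Cubic spline window (4), supported on |x| <= 1."""
    r = abs(x)
    if r <= 0.5:
        return 2.0 * (4.0 * (r - 1.0) * r * r + 2.0 / 3.0)
    if r <= 1.0:
        return 2.0 * (4.0 * (1.0 - r) ** 3 / 3.0)
    return 0.0


def gaussian(x: float) -> float:
    """Truncated Gaussian window (5), supported on |x| <= 1."""
    if abs(x) > 1.0:
        return 0.0
    return (math.exp(-9.0 * x * x) - math.exp(-9.0)) / (1.0 - math.exp(-9.0))


def midpoint_integral(f: Callable[[float], float], a: float, b: float,
                      n: int = 20000) -> float:
    """Composite midpoint rule on [a, b] with n equal subintervals."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))


print(cubic_spline(0.0), gaussian(0.0))
print(midpoint_integral(cubic_spline, -1.0, 1.0))
```

Both functions vanish at $|x| = 1$, so they are continuous across the edge of the support.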


The above window functions can easily be extended to
higher dimensions. For example, in 2D the most common
extensions are

Window function with spherical (cylindrical) support:

$$\phi(x) = \phi_{1D}(\|x\|)$$

Window function with rectangular support (tensor product):

$$\phi(x) = \phi_{1D}(|x_1|) \, \phi_{1D}(|x_2|)$$

where, as usual, $x = (x_1, x_2)$ and $\|x\| = \sqrt{x_1^2 + x_2^2}$.

Figure 1. SPH approximation functions and approximation of $u(x) = 1 - x^2$ with cubic spline window function, distance between particles $h = 0.5$, and quadrature weights $\omega_I = h$, for $\rho/h = 1, 2, 4$.


2.1.3 Design of window functions


In the continuous SPH approximation (1), a window
function $\phi$ can easily be modified to exactly reproduce a
polynomial space $P_m$ in $\mathbb{R}$ of degree $\le m$, that is,

$$p(x) = \int C_\rho \, \phi\!\left( \frac{y - x}{\rho} \right) p(y) \, dy \qquad \forall p \in P_m \tag{6}$$

If the following conditions are satisfied

$$\int \phi(y) \, dy = 1, \qquad \int y^j \phi(y) \, dy = 0 \quad \text{for } 0 < j \le m$$

the window function is able to reproduce the polynomial
space $P_m$. Note that the first condition coincides with (2)
and defines the normalization constant, that is, it imposes
the reproducibility of constant functions. The ability to
reproduce functions of order m is often referred to as mth
order consistency.

Figure 2. Cubic spline and corrected window function for polynomials of degree 2.

For example, the window function

$$\tilde{\phi}(x) = \left( \frac{27}{17} - \frac{120}{17} x^2 \right) \phi(x) \tag{7}$$

where $\phi(x)$ is the cubic spline, reproduces the second
degree polynomial basis $\{1, x, x^2\}$. Figure 2 shows the
cubic spline defined in (4) and the corrected window
function (7) (see Liu et al., 1996b for details).
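
The moment conditions behind (7) can be checked numerically. The sketch below assumes the cubic spline (4) with unit dilation parameter and approximates the moments $\int y^j \phi(y)\,dy$ with a midpoint rule; `corrected_spline` and `moment` are hypothetical names. For the plain cubic spline the second moment is $1/12$, whereas for the corrected window it vanishes (up to quadrature error) while the zeroth moment remains 1.

```python
from typing import Callable


def cubic_spline(x: float) -> float:
    """Cubic spline window (4), supported on |x| <= 1."""
    r = abs(x)
    if r <= 0.5:
        return 2.0 * (4.0 * (r - 1.0) * r * r + 2.0 / 3.0)
    if r <= 1.0:
        return 2.0 * (4.0 * (1.0 - r) ** 3 / 3.0)
    return 0.0


def corrected_spline(x: float) -> float:
    """Corrected window (7): the cubic spline times (27/17 - (120/17) x^2)."""
    return (27.0 / 17.0 - 120.0 / 17.0 * x * x) * cubic_spline(x)


def moment(window: Callable[[float], float], j: int, n: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of y**j * window(y) on [-1, 1]."""
    step = 2.0 / n
    total = 0.0
    for i in range(n):
        y = -1.0 + (i + 0.5) * step
        total += y ** j * window(y)
    return step * total


for j in range(3):
    print(j, moment(cubic_spline, j), moment(corrected_spline, j))
```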

However, the design of the window function is not
trivial in the presence of boundaries or with nonuniform
distributions of particles (see Section 2.2). Figure 3 shows
the corrected cubic spline window functions associated
with a uniform distribution of particles, with distance $h = 0.5$
between particles and $\rho = 2h$, and the discrete SPH
approximation described by (3) for $u(x) = x$ with uniform
weights $\omega_I = h$. The particles outside the interval $[-1, 1]$
are considered in the approximation, as in an unbounded
domain (the corresponding translated window functions are
depicted with a dashed line). The linear monomial $u(x) = x$
is exactly reproduced. However, in a bounded domain, the
approximation is not exact near the boundaries when only
the particles in the domain $[-1, 1]$ are considered in this
example (see Figure 4).


Remark 5 (Consistency) If the approximation reproduces
exactly a basis of the polynomials of degree less than or equal
to m, then the approximation is said to have m-order
consistency.


2.1.4 Correcting the SPH method 


The SPH approximation is used in the solution of PDEs,
usually through a collocation technique or point integration
approaches (see Monaghan, 1982; Vila, 1999; Bonet
and Lok, 1999; and Section 3.1). Thus, it is necessary to
compute accurate approximations of the derivatives of the
dependent variables. The derivatives provided by the original
SPH method can be quite inaccurate, and thus, it is necessary
to improve the approximation, or its derivatives, in
some manner.

Figure 3. Modified cubic splines and particles, $h = 0.5$, and SPH discrete approximation for $u(x) = x$ with $\rho/h = 2$ in an 'unbounded domain'.

Figure 4. Modified cubic splines and particles, $h = 0.5$, and SPH discrete approximation for $u(x) = x$ with $\rho/h = 2$ in a bounded domain.

Randles and Libersky (1996), Krongauz and Belytschko
(1998a), and Vila (1999) proposed a correction of the gradient:
the RMD. It is an extension of the partial correction
proposed by Johnson and Beissel (1996). Let the derivatives
of a function u be approximated as the derivatives of
the SPH approximation defined in (3),

$$\nabla u(x) \simeq \nabla u^\rho(x) = \sum_J \nabla \omega(x_J, x) \, u(x_J)$$

The basic idea of the RMD approximation is to define a
corrected derivative

$$D^\rho u(x) := \sum_J B(x) \, \nabla \omega(x_J, x) \, u(x_J) \tag{8}$$

where the correction matrix $B(x)$ is chosen such that
$\nabla u(x) = D^\rho u(x)$ for all linear polynomials. In Vila (1999),
a symmetrized approximation for the derivatives is defined

$$D_S^\rho u(x) := D^\rho u(x) - D^\rho 1(x) \, u(x) \tag{9}$$

where, by definition (8),

$$D^\rho 1(x) = \sum_J B(x) \, \nabla \omega(x_J, x)$$

Note that (9) exactly interpolates the derivatives when $u(x)$
is constant. The consistency condition $\nabla u(x) = D_S^\rho u(x)$
must be imposed only for linear monomials

$$B(x) = \left[ \sum_J \nabla \omega(x_J, x) \, (x_J - x)^T \right]^{-1}$$

If the ratio between the dilation parameter $\rho$ and the
distance between particles remains constant, there are a
priori error bounds for the RMD, $D_S^\rho u$, similar to the linear
finite element ones, where $\rho$ plays the role of the element
size in finite elements (see Vila, 1999).
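
A 1D sketch may help to see how the symmetrized derivative (9) works. In one dimension $B(x)$ reduces to the scalar $\left[\sum_J \omega'(x_J, x)(x_J - x)\right]^{-1}$, so that $D_S^\rho u(x) = B(x) \sum_J \omega'(x_J, x)\,[u(x_J) - u(x)]$. The code assumes the cubic spline window (4), $C_\rho = 1/\rho$, and arbitrary positive quadrature weights; the function names and the particle data are hypothetical. For a linear function the derivative is recovered up to round-off even with irregular particles and near the boundary.

```python
from typing import Callable, Sequence


def cubic_spline_derivative(t: float) -> float:
    """Derivative of the cubic spline window (4) with respect to its argument."""
    r = abs(t)
    sign = 1.0 if t >= 0.0 else -1.0
    if r <= 0.5:
        return 2.0 * (12.0 * t * r - 8.0 * t)
    if r <= 1.0:
        return -2.0 * 4.0 * (1.0 - r) ** 2 * sign
    return 0.0


def rmd_derivative(u: Callable[[float], float], x: float,
                   particles: Sequence[float], weights: Sequence[float],
                   rho: float) -> float:
    """Symmetrized RMD derivative (9) at x in 1D."""
    grads = []
    for x_j, w_j in zip(particles, weights):
        t = (x_j - x) / rho
        # x-derivative of omega(x_J, x) = (1/rho) * phi((x_J - x)/rho) * omega_J
        grads.append(-cubic_spline_derivative(t) * w_j / rho ** 2)
    # Scalar counterpart of the bracket whose inverse defines B(x)
    bracket = sum(g * (x_j - x) for g, x_j in zip(grads, particles))
    if bracket == 0.0:
        raise ValueError("no neighboring particle contributes at x")
    b = 1.0 / bracket
    return b * sum(g * (u(x_j) - u(x)) for g, x_j in zip(grads, particles))


def linear(x: float) -> float:
    return 3.0 * x + 2.0


# Irregular particles and non-uniform weights (illustrative data)
particles = [-1.0, -0.6, -0.1, 0.3, 0.55, 1.0]
weights = [0.3, 0.45, 0.45, 0.3, 0.35, 0.4]
for x in (-1.0, 0.3, 0.55):
    print(x, rmd_derivative(linear, x, particles, weights, 0.9))
```

Note that the quadrature weights cancel partially but not entirely in the ratio; exactness for linear functions holds for any positive weights, which is the point of the correction.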

In SPH, the interparticle forces coincide with the vector 
joining them, so conservation of linear and angular momen- 
tum are met for each point pair. In other words, linear and 
angular momentum are conserved locally (see Dilts, 1999). 

When the kernels are corrected to reproduce linear poly- 
nomials or the derivatives of linear polynomials, these local 
conservation properties are lost. However, global linear and 
translational momentum are conserved if the approximation 
reproduces linear functions (see Krongauz and Belytschko, 
1998a and Bonet and Lok, 1999).

Although the capability to reproduce a polynomial of 
a certain order is an ingredient in many convergence 
proofs of solutions for PDEs, it does not always suffice 
to pass the patch test. Krongauz and Belytschko (1998b) 
found that corrected gradient methods do not satisfy the 
patch test and exhibit poor convergence when the cor- 
rected gradient is used for the test function. They showed 
that a Petrov-Galerkin method with Shepard test functions 
satisfies the patch test. 

There are other ways of correcting the SPH method. For
example, Bonet and Lok (1999) combine a correction of
the window function, as in the reproducing kernel particle
method (RKPM) (see Section 2.2), and a correction
of the gradient to preserve angular momentum. In fact, there
are many similarities between the corrected SPH and
the renormalized meshless derivative. The most important
difference between the RMD approach, where the 0-order
consistency is obtained by the definition of the symmetrized
gradient (9), and the corrected gradient $\nabla u^\rho$ is that in
the latter case 0-order consistency is obtained with the Shepard
function.

With a similar rationale, Bonet and Kulasegaram (2000)
present a correction of the window function and an integration
corrected vector for the gradients (in the context of
metal forming simulations). The corrected approximation
is used in a weak form with numerical integration at the
particles (see Section 3.2). Thus, the gradient must be evaluated
only at the particles. However, usually the particle
integration is not accurate enough and the approximation
fails to pass the patch test. In order to obtain a consistent
approximation, a corrected gradient is defined. At every
particle $x_I$, the corrected gradient is computed as

$$\tilde{\nabla} u^\rho(x_I) = \nabla u^\rho(x_I) + \gamma_I \, [u]_I$$

where $\gamma_I$ is the correction vector (one component for each
spatial dimension) at particle $x_I$ and where the bracket
$[u]_I$ is defined as $[u]_I = u(x_I) - u^\rho(x_I)$. These extra
parameters, $\gamma_I$, are determined by requiring that the patch test
be passed. A global linear system of equations must be
solved to compute the correction vector and to define the
derivatives of the approximation; then, the approximation
of u and its derivatives are used to solve the boundary value
problem.


2.2 Moving least-squares approximants 


2.2.1 Continuous moving least-squares 


The objective of the MLS approach is to obtain an approximation
similar to an SPH approximant (1), with high accuracy
even in a bounded domain. Let us consider a bounded
or unbounded domain $\Omega$. The basic idea of the MLS
approach is to approximate $u(x)$, at a given point $x$, through
a polynomial least-squares fit of $u$ in a neighborhood of $x$.
That is, for fixed $x \in \Omega$, and $z$ near $x$, $u(z)$ is approximated
with a polynomial expression

$$u(z) \simeq \tilde{u}^\rho(z, x) = P^T(z) \, c(x) \tag{10}$$

where the coefficients $c(x) = \{c_0(x), c_1(x), \dots, c_l(x)\}^T$ are
not constant, they depend on point $x$, and $P(z) = \{p_0(z),
p_1(z), \dots, p_l(z)\}^T$ includes a complete basis of the subspace
of polynomials of degree m. It can also include exact
features of a solution, such as cracktip fields, as described
in Fleming et al. (1997). The vector $c(x)$ is obtained by a
least-squares fit, with the scalar product

$$\langle f, g \rangle_x = \int \phi\!\left( \frac{y - x}{\rho} \right) f(y) \, g(y) \, dy \tag{11}$$

That is, the coefficients $c$ are obtained by minimization of
the functional $J_x(c)$ centered in $x$ and defined by

$$J_x(c) = \int \phi\!\left( \frac{y - x}{\rho} \right) \left[ u(y) - P^T(y) \, c(x) \right]^2 dy \tag{12}$$

where $\phi((y - x)/\rho)$ is the compactly supported weighting
function. The same weighting/window functions as for
SPH, given in Section 2.1.2, are used.


Remark 6. Thus, the scalar product is centered at the
point $x$ and scaled with the dilation parameter $\rho$. In fact,
the integration is constructed in a neighborhood of radius
$\rho$ centered at $x$, that is, in the support of $\phi((\cdot - x)/\rho)$.


Remark 7 (Polynomial space) In one dimension, we can
let $p_i(x)$ be the monomials $x^i$ and, in this particular case,
$l = m$. For larger spatial dimensions, two types of polynomial
spaces are usually chosen: the set of polynomials
$P_m$ of total degree $\le m$, and the set of polynomials $Q_m$ of
degree $\le m$ in each variable. Both include a complete basis
of the subspace of polynomials of degree m. This, in fact,
characterizes the a priori convergence rate (see Liu, Li and
Belytschko, 1997a or Fernández-Méndez, Diez and Huerta,
2003).


The vector $c(x)$ is the solution of the normal equations,
that is, the linear system of equations

$$M(x) \, c(x) = \langle P, u \rangle_x \tag{13}$$

where $M(x)$ is the Gram matrix (sometimes called a
moment matrix),

$$M(x) = \langle P, P^T \rangle_x \tag{14}$$

From (14) and (10), the least-squares approximation of $u$
in a neighborhood of $x$ is

$$u(z) \simeq \tilde{u}^\rho(z, x) = P^T(z) \, M^{-1}(x) \, \langle P, u \rangle_x \tag{15}$$

Since the weighting function $\phi$ usually favors the central
point $x$, it seems reasonable to assume that such an approximation
is more accurate precisely at $z = x$ and thus the
approximation (15) is particularized at $x$, that is,

$$u(x) \simeq \tilde{u}^\rho(x) := \tilde{u}^\rho(x, x)$$

with

$$\tilde{u}^\rho(x, x) = \int \phi\!\left( \frac{y - x}{\rho} \right) P^T(x) \, M^{-1}(x) \, P(y) \, u(y) \, dy \tag{16}$$

where the definition of the scalar product, equation (11),
has been explicitly used. Equation (16) can be rewritten as

$$u(x) \simeq \tilde{u}^\rho(x) = \int C_\rho(y, x) \, \phi\!\left( \frac{y - x}{\rho} \right) u(y) \, dy$$

which is similar to the SPH approximation (see equation
(1)) and with the scalar correction term $C_\rho(y, x)$ defined as

$$C_\rho(y, x) := P^T(x) \, M^{-1}(x) \, P(y)$$

The function defined by the product of the correction and
the window function $\phi$,

$$\Phi(y, x) := C_\rho(y, x) \, \phi\!\left( \frac{y - x}{\rho} \right)$$

is usually called the kernel function. The new correction term
depends on the point $x$ and the integration variable $y$; it
provides an accurate approximation even in the presence
of boundaries (see Liu et al., 1996b for more details). In
fact, the approximation verifies the following consistency
property (see also Wendland, 2001).


Proposition 1 (Consistency/reproducibility property)
The MLS approximation exactly reproduces all the
polynomials in $P$.


Proof. The MLS approximation of the polynomials $p_i(x)$
is

$$\tilde{p}_i^\rho(x) = P^T(x) \, M^{-1}(x) \, \langle P, p_i \rangle_x \qquad \text{for } i = 0, \dots, l$$

or, equivalently, in vector form

$$(\tilde{P}^\rho)^T(x) = P^T(x) \, M^{-1}(x) \underbrace{\int \phi\!\left( \frac{y - x}{\rho} \right) P(y) \, P^T(y) \, dy}_{M(x)}$$

Therefore, using the definition of $M$ (see (14)), it is trivial
to verify that $\tilde{P}^\rho(x) = P(x)$.


2.2.2 Reproducing kernel particle method 
approximation 


Application of a numerical quadrature in (16) leads to the
RKPM approximation

$$u(x) \simeq \tilde{u}^\rho(x) \simeq u^\rho(x) := \sum_{I \in S_x^\rho} \omega(x_I, x) \, P^T(x) \, M^{-1}(x) \, P(x_I) \, u(x_I)$$

where $\omega(x_I, x) = \omega_I \, \phi((x_I - x)/\rho)$ and $x_I$ and $\omega_I$ are
integration points (particles) and weights respectively. The
particles cover the computational domain $\Omega$, $\Omega \subset \mathbb{R}^{n_{sd}}$. Let
$S_x^\rho$ be the index set of particles whose support includes
the point $x$ (see Remark 9). This approximation can be
written as

$$u(x) \simeq u^\rho(x) = \sum_{I \in S_x^\rho} N_I(x) \, u(x_I) \tag{17}$$

where the approximation functions are defined by

$$N_I(x) := \omega(x_I, x) \, P^T(x) \, M^{-1}(x) \, P(x_I) \tag{18}$$


Remark 8. In order to preserve the consistency/reproducibility
property, the matrix $M$, defined in (14), must be
evaluated with the same quadrature used in the discretization
of (16) (see Chen et al., 1996 for details). That is,
matrix $M(x)$ must be computed as

$$M(x) := \sum_{I \in S_x^\rho} \omega(x_I, x) \, P(x_I) \, P^T(x_I) \tag{19}$$


Remark 9. The sums in (17) and (19) only involve the
indices $I$ such that $\omega(x_I, x) \ne 0$, that is, particles $x_I$ in a
neighborhood of $x$. Thus, the set of neighboring particles
is defined by the set of indices

$$S_x^\rho := \{ J \text{ such that } \|x_J - x\| \le \rho \}$$


Remark 10 (Conditions on particle distribution) The
matrix $M(x)$ in (19) must be regular at every point $x$ in
the domain. Liu, Li and Belytschko (1997a) discuss the
necessary conditions. In fact, this matrix can be viewed
(see Huerta and Fernández-Méndez, 2000a or Fernández-Méndez
and Huerta, 2002), as a Gram matrix defined with
a discrete scalar product

$$\langle f, g \rangle_x = \sum_{I \in S_x^\rho} \omega(x_I, x) \, f(x_I) \, g(x_I) \tag{20}$$

If this scalar product is degenerate, $M(x)$ is singular.
Regularity of $M(x)$ is ensured by having enough particles in
the neighborhood of every point $x$ and avoiding degenerate
patterns, that is

(i) $\operatorname{card} S_x^\rho \ge l + 1$.

(ii) $\nexists F \in \operatorname{span}\{p_0, p_1, \dots, p_l\} \setminus \{0\}$ such that $F(x_I) = 0 \;\; \forall I \in S_x^\rho$.

Condition (ii) is easily verified. For instance, for $m = 1$
(linear interpolation), the particles cannot lie in the same
straight line or plane for, respectively, 2D and 3D. In 1D,
for any value of m, it suffices that different particles do not
have the same position. Under these conditions, one can
compute the vector $P^T(x) M^{-1}(x)$ at each point and thus
determine, from (18), the shape functions, $N_I(x)$.
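
The construction of the shape functions from (18) and (19) can be sketched in 1D with linear consistency, $P(x) = \{1, x\}^T$, for which $M(x)$ is a $2 \times 2$ matrix that can be inverted by hand. The sketch assumes the cubic spline window (4) and unit quadrature weights $\omega_I = 1$; the function name `mls_shape_functions` is hypothetical. The condition on the number of neighbors from Remark 10 appears as an explicit check.

```python
from typing import List, Sequence


def cubic_spline(x: float) -> float:
    """Cubic spline window (4), supported on |x| <= 1."""
    r = abs(x)
    if r <= 0.5:
        return 2.0 * (4.0 * (r - 1.0) * r * r + 2.0 / 3.0)
    if r <= 1.0:
        return 2.0 * (4.0 * (1.0 - r) ** 3 / 3.0)
    return 0.0


def mls_shape_functions(particles: Sequence[float], x: float, rho: float) -> List[float]:
    """Shape functions N_I(x) of (18) in 1D with P = {1, x}^T and omega_I = 1."""
    w = [cubic_spline((x_i - x) / rho) for x_i in particles]
    # Condition (i) of Remark 10 with l = 1: at least two particles in the support
    if sum(1 for w_i in w if w_i > 0.0) < 2:
        raise ValueError("M(x) is singular: fewer than two neighboring particles")
    # Entries of the symmetric 2x2 moment matrix M(x) of (19)
    m00 = sum(w)
    m01 = sum(w_i * x_i for w_i, x_i in zip(w, particles))
    m11 = sum(w_i * x_i * x_i for w_i, x_i in zip(w, particles))
    det = m00 * m11 - m01 * m01
    # a = M^{-1}(x) P(x); since M is symmetric, N_I = w_I * P^T(x_I) a
    a0 = (m11 - m01 * x) / det
    a1 = (m00 * x - m01) / det
    return [w_i * (a0 + a1 * x_i) for w_i, x_i in zip(w, particles)]


particles = [-1.0, -0.5, 0.0, 0.5, 1.0]
for x in (-1.0, 0.2, 0.9):
    n = mls_shape_functions(particles, x, 1.3)
    print(x, sum(n), sum(n_i * x_i for n_i, x_i in zip(n, particles)))
```

The printed sums illustrate the reproduction of constants and of the linear monomial, also at the boundary point $x = -1$. The value of the central shape function at its own particle is below one, in line with Remark 1.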


2.2.3 Discrete MLS: element-free Galerkin 
approximation 


The MLS development, already presented in Section 2.2.1
for the continuous case (i.e. using integrals), can be developed
directly from a discrete formulation. As in the continuous
case, the idea is to approximate $u(x)$, at a given point
$x$, by a polynomial least-squares fit of $u$ in a neighborhood
of $x$. That is, the same expression presented in (10) can
be used, namely, for fixed $x \in \Omega$, and $z$ near $x$, $u(z)$ is
approximated with the polynomial expression

$$u(z) \simeq u^\rho(z, x) = P^T(z) \, c(x) \tag{21}$$

In the framework of the EFG method, the vector $c(x)$ is
also obtained by a least-squares fit, with the discrete scalar
product defined in (20), where $\omega(x_I, x)$ is the discrete
weighting function, which is equivalent to the window
function

$$\omega(x_I, x) = \phi\!\left( \frac{x_I - x}{\rho} \right) \tag{22}$$

and $S_x^\rho$ is the set of indices of neighboring particles defined
in Remark 9. That is, the coefficients $c$ are obtained by
minimization of the discrete functional $J_x(c)$ centered in $x$
and defined by

$$J_x(c) = \sum_{I \in S_x^\rho} \omega(x_I, x) \left[ u(x_I) - P^T(x_I) \, c(x) \right]^2 \tag{23}$$

The normal equations are defined in a similar manner,

$$M(x) \, c(x) = \langle P, u \rangle_x \tag{24}$$

and the Gram matrix is directly obtained from the discrete
scalar product (see equation (19)). After substitution of the
solution of (24) in (21), the least-squares approximation of
$u$ in a neighborhood of $x$ is obtained

$$u(z) \simeq u^\rho(z, x) = P^T(z) \, M^{-1}(x) \sum_{I \in S_x^\rho} \omega(x_I, x) \, P(x_I) \, u(x_I) \tag{25}$$

Particularization of (25) at $z = x$ leads to the discrete MLS
approximation of $u(x)$

$$u(x) \simeq u^\rho(x) := u^\rho(x, x)$$

with

$$u^\rho(x, x) = \sum_{I \in S_x^\rho} \omega(x_I, x) \, P^T(x) \, M^{-1}(x) \, P(x_I) \, u(x_I) \tag{26}$$

This EFG approximation coincides with the RKPM approximation
described in equations (18) and (19).
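
The least-squares viewpoint of (21)-(26) can also be sketched directly: solve the normal equations (24) for the local coefficients $c(x)$ and evaluate the fit at $z = x$. The code assumes 1D, the linear basis $P(z) = \{1, z\}^T$, and the cubic spline window (4); `local_fit` and `efg_value` are hypothetical names, and the singular case of too few neighbors is not handled here.

```python
from typing import Sequence, Tuple


def cubic_spline(x: float) -> float:
    """Cubic spline window (4), supported on |x| <= 1."""
    r = abs(x)
    if r <= 0.5:
        return 2.0 * (4.0 * (r - 1.0) * r * r + 2.0 / 3.0)
    if r <= 1.0:
        return 2.0 * (4.0 * (1.0 - r) ** 3 / 3.0)
    return 0.0


def local_fit(particles: Sequence[float], values: Sequence[float],
              x: float, rho: float) -> Tuple[float, float]:
    """Coefficients c(x) = (c0, c1) minimizing the discrete functional (23)."""
    w = [cubic_spline((x_i - x) / rho) for x_i in particles]
    # Gram matrix M(x), equation (19), for P = {1, z}^T
    m00 = sum(w)
    m01 = sum(w_i * x_i for w_i, x_i in zip(w, particles))
    m11 = sum(w_i * x_i * x_i for w_i, x_i in zip(w, particles))
    # Right-hand side <P, u>_x of the normal equations (24)
    r0 = sum(w_i * u_i for w_i, u_i in zip(w, values))
    r1 = sum(w_i * x_i * u_i for w_i, x_i, u_i in zip(w, particles, values))
    det = m00 * m11 - m01 * m01
    return (m11 * r0 - m01 * r1) / det, (m00 * r1 - m01 * r0) / det


def efg_value(particles: Sequence[float], values: Sequence[float],
              x: float, rho: float) -> float:
    """Discrete MLS approximation (26): the local fit evaluated at z = x."""
    c0, c1 = local_fit(particles, values, x, rho)
    return c0 + c1 * x


particles = [-1.0, -0.5, 0.0, 0.5, 1.0]
linear_data = [2.0 + 3.0 * x_i for x_i in particles]
quadratic_data = [x_i * x_i for x_i in particles]
print(local_fit(particles, linear_data, 0.2, 1.3))
print(efg_value(particles, quadratic_data, 0.0, 1.3))
```

Linear data are recovered by the local fit, whereas for the quadratic data the value at $x = 0$ differs from $u(0) = 0$, since only linear consistency is imposed and the approximant does not interpolate.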


Remark 11 (Convergence) Liu, Li and Belytschko
(1997a) showed convergence of the RKPM and EFG. The
a priori error bound is very similar to the bound in finite
elements. The parameter $\rho$ plays the role of $h$ and m
(the order of consistency) plays the role of the degree
of the approximation polynomials in the finite element
mesh. Convergence properties depend on m and $\rho$. They do
depend on the distance between particles because usually
this distance is proportional to $\rho$, that is, the ratio between
the particle distance over the dilation parameter is of order
one (see Liu, Li and Belytschko, 1997a).


Remark 12. The approximation is characterized by the
order of consistency required, that is, the complete basis of
polynomials employed in $P$, and by the ratio between the
dilation parameter and the particle distance, $\rho/h$. In fact, the
bandwidth of the stiffness matrix increases with the ratio
$\rho/h$ (more particles lie inside the circle of radius $\rho$) (see for
instance Figure 5). Note that, for linear consistency, when
$\rho/h$ goes to 1, the linear finite element shape functions are
recovered.


Remark 13 (Continuity) If the weight function $\phi$ is $C^k$,
then the EFG/MLS shape functions and the RKPM shape
functions are $C^k$ (see Liu, Li and Belytschko, 1997a). Thus,
if the window function is a cubic spline, as shown in
Figures 6 and 7, the first and second derivatives of the
shape functions are well defined throughout the domain,
even with linear consistency.


2.2.4 Reproducibility of the MLS approximation 


The MLS shape functions can be also obtained by imposing 
a priori the reproducibility properties of the approximation. 
Consider a set of particles x, and a complete polynomial 
base P(x). Let us assume an approximation of the form 


u(x) = $ Nr) ulr) (27) 


Ies? 


with approximation functions defined as 


N, (x) = ox, x) Pœ) a) (28) 


Figure 5. Interpolation functions with p/h ~ 1 (similar to finite elements) and p/h = 2.6, with cubic spline and linear consistency. 
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Figure 6. Shape function and derivatives for linear finite elements and the EFG approximation. A color version of this image is available 


at http://www.mrw.interscience.wiley.com/ecm 


The unknown vector $\boldsymbol{\alpha}(x)$ in $\mathbb{R}^{l+1}$ is determined by imposing 
the reproducibility condition, which requires that the 
approximation proposed in (27) be exact for all the polynomials 
in $\mathbf{P}$, namely, 


$$\mathbf{P}(x) = \sum_{j \in S_x^\rho} N_j(x)\, \mathbf{P}(x_j) \tag{29}$$


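After substitution of (28) in (29), the linear system of 
equations that determines $\boldsymbol{\alpha}(x)$ is obtained: 


$$\mathbf{M}(x)\, \boldsymbol{\alpha}(x) = \mathbf{P}(x) \tag{30}$$

That is, 

$$\boldsymbol{\alpha}(x) = \mathbf{M}^{-1}(x)\, \mathbf{P}(x) \tag{31}$$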


where $\mathbf{M}(x)$ is the same matrix defined in (19). Finally, the 
approximation functions $N_j(x)$ are defined by substituting 
in (28) the vector $\boldsymbol{\alpha}$ (see (31)). Note that, with this 
substitution, the expression (18) for the MLS approximation 
functions is recovered and consistency is ensured by 
construction. 
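
The construction in (27) to (31) can be checked numerically. The following Python sketch is only an illustration under stated assumptions, not the implementation used in the chapter: it takes the 1D linear basis P(x) = (1, x)^T, a cubic spline window (one common choice), and equally spaced particles. The helper names `cubic_spline` and `mls_shape_functions` are hypothetical.

```python
def cubic_spline(s):
    """Cubic spline window phi(s), supported on |s| <= 1 (assumed window)."""
    r = abs(s)
    if r <= 0.5:
        return 2.0 / 3.0 - 4.0 * r**2 + 4.0 * r**3
    if r <= 1.0:
        return 4.0 / 3.0 - 4.0 * r + 4.0 * r**2 - 4.0 / 3.0 * r**3
    return 0.0


def mls_shape_functions(x, particles, rho):
    """Return the list [N_j(x)] for the basis P = (1, x)^T, following (28)-(31)."""
    # Weights omega(x_j, x) = phi((x_j - x) / rho).
    w = [cubic_spline((xj - x) / rho) for xj in particles]
    # Moment matrix M(x) = sum_j w_j P(x_j) P^T(x_j), a symmetric 2x2 matrix.
    m00 = sum(w)
    m01 = sum(wj * xj for wj, xj in zip(w, particles))
    m11 = sum(wj * xj * xj for wj, xj in zip(w, particles))
    det = m00 * m11 - m01 * m01
    # Solve M(x) alpha(x) = P(x) with P(x) = (1, x)^T, equation (30).
    a0 = (m11 - m01 * x) / det
    a1 = (m00 * x - m01) / det
    # N_j(x) = w_j P^T(x_j) alpha(x), equation (28).
    return [wj * (a0 + a1 * xj) for wj, xj in zip(w, particles)]


particles = [i / 10 for i in range(11)]   # particle distance h = 0.1
rho = 0.25                                # dilation parameter, rho/h = 2.5
x = 0.37
N = mls_shape_functions(x, particles, rho)
print(sum(N))                                       # reproduces 1 up to round-off
print(sum(n * xj for n, xj in zip(N, particles)))   # reproduces x up to round-off
```

Both sums are the two rows of condition (29) for the linear basis, so they return 1 and x up to round-off whenever M(x) is regular, that is, whenever at least two particles fall inside the support.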

Section 2.2.7 is devoted to some implementation details 
of the EFG method. In particular, it is shown that the 
derivatives of $N_j(x)$ can be computed without excessive 
overhead. 


2.2.5 MLS centered and scaled approach 


For computational purposes, it is usual and preferable to 
center at $x$ and scale with $\rho$ the polynomials involved in 
the definition of the meshfree approximation functions (see 
Liu, Li and Belytschko, 1997a or Huerta and Fernández- 
Méndez, 2000a). Thus, another expression for the EFG 
shape functions is employed 


$$N_j(x) = \omega(x_j, x)\, \mathbf{P}^T\!\left(\frac{x_j - x}{\rho}\right) \boldsymbol{\alpha}(x) \tag{32}$$


which is similar to (28). Recall also that typical expressions 
for the window function are of the type $\omega(y, x) = \phi((y - x)/\rho)$. 
The consistency condition becomes in this case 


$$\mathbf{P}(0) = \sum_{j \in S_x^\rho} N_j(x)\, \mathbf{P}\!\left(\frac{x_j - x}{\rho}\right) \tag{33}$$


which is equivalent to condition (29) when $\rho$ is constant 
everywhere (see Remark 14 for nonconstant $\rho$). After 
substitution of (32) in (33), the linear system of equations that 
determines $\boldsymbol{\alpha}(x)$ is obtained as follows: 


$$\mathbf{M}(x)\, \boldsymbol{\alpha}(x) = \mathbf{P}(0) \tag{34}$$

where 

$$\mathbf{M}(x) = \sum_{j \in S_x^\rho} \omega(x_j, x)\, \mathbf{P}\!\left(\frac{x_j - x}{\rho}\right) \mathbf{P}^T\!\left(\frac{x_j - x}{\rho}\right) \tag{35}$$


Remark 14. The consistency conditions (29) and (33) are 
equivalent if the dilation parameter $\rho$ is constant. When 
the dilation parameter varies at each particle, the same 
expression for $N_j(x)$ is used, namely equation (28), but 
the varying dilation parameter, $\rho_j$, associated to particle $x_j$, 
is embedded in the definition of the weighting function; that 
is, equation (22) is modified as follows: 


$$\omega(x_j, x) = \phi\!\left(\frac{x_j - x}{\rho_j}\right)$$


Note that a constant $\rho$ is employed in the scaling of the 
polynomials $\mathbf{P}$. The constant value $\rho$ is typically chosen as 
the mean value of all the $\rho_j$. The consistency condition in 
this case is also (33). It also imposes the reproducibility of 
the polynomials in $\mathbf{P}$. 


Figure 7. Distribution of particles, EFG approximation function and derivatives, with $\rho/h = 2.2$, with circular supported cubic spline and linear consistency. A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm 


This centered expression for the EFG shape functions can 
also be obtained with a discrete MLS development with the 
discrete centered scalar product 


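$$\langle f, g \rangle_x = \sum_{j \in S_x^\rho} \omega(x_j, x)\, f\!\left(\frac{x_j - x}{\rho}\right) g\!\left(\frac{x_j - x}{\rho}\right) \tag{36}$$


The MLS development in this case is as follows: for fixed 
$x$, and for $z$ near $x$, $u$ is approximated as 


$$u(z) \simeq u^\rho(z, x) = \mathbf{P}^T\!\left(\frac{z - x}{\rho}\right) \mathbf{c}(x) \tag{37}$$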


where c is obtained, as usual, through a least-squares fitting 
with the discrete centered scalar product (36). 


2.2.6 The diffuse derivative 


The centered MLS allows, with a proper definition of the 
polynomial basis P, the reinterpretation of the coefficients 
in c(x) as approximations of u and its derivatives at the 
fixed point $x$. 

The approximation of the derivative of $u$ in each spatial 
direction is the corresponding derivative of $u^\rho$. This requires 
taking the derivative of (21), that is 


$$\frac{\partial u}{\partial x} \simeq \frac{\partial u^\rho}{\partial x} = \frac{\partial \mathbf{P}^T}{\partial x}\, \mathbf{c} + \mathbf{P}^T \frac{\partial \mathbf{c}}{\partial x} \tag{38}$$


On the one hand, the second term on the r.h.s. is not trivial. 
Derivatives of the coefficients $\mathbf{c}$ require the solution of 
a linear system of equations with the same matrix $\mathbf{M}$. 
As noted by Belytschko et al. (1996a), this is not an 
expensive task (see also Section 2.2.7). However, it requires 
the knowledge of the cloud of particles surrounding each 
point x and, thus, it depends on the point where derivatives 
are evaluated. 

On the other hand, the first term is easily evaluated. 
The derivative of the polynomials in P is trivial and can 
be evaluated a priori, without knowledge of the cloud of 
particles surrounding each point x. 

Villon (1991) and Nayroles, Touzot and Villon (1992) 
propose the concept of diffuse derivative, which consists in 
approximating the derivative only with the first term on the 
r.h.s. of (38), namely, 


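$$\frac{\delta u^\rho}{\delta x} = \left. \frac{\partial u^\rho}{\partial z} \right|_{z=x} = \left. \frac{\partial \mathbf{P}^T}{\partial z} \right|_{z=x} \mathbf{c}(x)$$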


From a computational cost point of view, this is an interest- 
ing alternative to (38). Moreover, the following proposition 
ensures convergence at the optimal rate of the diffuse 
derivative. 


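Proposition 2. If $u^\rho$ is an approximation to $u$ with an order 
of consistency $m$ (i.e. $\mathbf{P}$ includes a complete basis of the 
subspace of polynomials of degree $m$) and $\rho/h$ is constant, 
then 


$$\left\| \frac{\partial^{|k|} u}{\partial x^k} - \frac{\delta^{|k|} u^\rho}{\delta x^k} \right\|_\infty \le C(x)\, \frac{\rho^{m+1-|k|}}{(m+1)!} \qquad \forall\, |k| = 0, \ldots, m \tag{39}$$

where $k$ is a multiindex, $k = (k_1, k_2, \ldots, k_{n_{sd}})$ and $|k| = k_1 + k_2 + \cdots + k_{n_{sd}}$. 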


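The proof can be found in Villon (1991) for 1D or Huerta, 
Vidal and Villon (2004b) for higher spatial dimensions. To 
clearly identify the coefficients of $\mathbf{c}$ with the approximations 
of $u$ and its derivatives, the basis in $\mathbf{P}$ should be written 
appropriately. That is, each component of $\mathbf{P}$ is $P_\alpha(\xi) = \xi^\alpha/\alpha!$ 
for $|\alpha| = 0, \ldots, m$, where a standard multiindex 
notation is employed, 


$$h^\alpha = h_1^{\alpha_1} h_2^{\alpha_2} \cdots h_{n_{sd}}^{\alpha_{n_{sd}}}; \qquad \alpha! = \alpha_1!\, \alpha_2! \cdots \alpha_{n_{sd}}!; \qquad |\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_{n_{sd}}$$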


Finally, it is important to emphasize the requirements 
that M is regular and bounded (see Huerta, Fernández- 
Méndez and Díez, 2002; Fernández-Méndez and Huerta, 
2002; Fernández-Méndez, Díez and Huerta, 2003). 


Remark 15. For example, in 1D with consistency of order 
two, that is, $\mathbf{P}(x) = \{1, \rho x, (\rho x)^2/2\}$, the expression (37) 
can be written as 


$$u(z) \simeq u^\rho(z, x) = c_0(x) + c_1(x)\,(z - x) + c_2(x)\, \frac{(z - x)^2}{2} \tag{40}$$

Thus, taking the derivative with respect to $z$ and imposing 
$z = x$, 


$$u(x) \simeq c_0(x), \quad u'(x) \simeq c_1(x) \quad \text{and} \quad u''(x) \simeq c_2(x)$$


In fact, this strategy is the basis of the diffuse element 
method: the diffuse derivative is used in the weak form of 
the problem (see Nayroles, Touzot and Villon, 1992; Breitkopf, 
Rassineux and Villon, 2001). Moreover, the generalized 
finite difference interpolation or meshless finite 
difference method (see Orkisz, 1998) also coincides with 
this MLS development. The only difference between the 
generalized finite difference approximants and the EFG centered 
approximants is the definition of the set of neighboring 
particles $S_x^\rho$. 
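
The identification of the coefficients with u and its derivatives can be illustrated with a small 1D computation. The Python sketch below is a hedged example, not the authors' code: it assumes the centered quadratic basis {1, (z - x), (z - x)^2/2}, a cubic spline window and equally spaced particles, and it solves the weighted least-squares (normal) equations with a small hand-written Gaussian elimination. The names `solve` and `diffuse_coefficients` are hypothetical.

```python
def cubic_spline(s):
    """Cubic spline window phi(s), supported on |s| <= 1 (assumed window)."""
    r = abs(s)
    if r <= 0.5:
        return 2.0 / 3.0 - 4.0 * r**2 + 4.0 * r**3
    if r <= 1.0:
        return 4.0 / 3.0 - 4.0 * r + 4.0 * r**2 - 4.0 / 3.0 * r**3
    return 0.0


def solve(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n + 1):
                a[r][c] -= factor * a[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (a[k][n] - sum(a[k][c] * x[c] for c in range(k + 1, n))) / a[k][k]
    return x


def diffuse_coefficients(x, particles, values, rho):
    """Centered MLS fit at x; returns c = (c0, c1, c2), read as (u, u', u'') at x."""
    m = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for xj, uj in zip(particles, values):
        d = xj - x
        w = cubic_spline(d / rho)
        p = (1.0, d, 0.5 * d * d)   # centered basis evaluated at particle x_j
        for r in range(3):
            rhs[r] += w * p[r] * uj
            for c in range(3):
                m[r][c] += w * p[r] * p[c]
    return solve(m, rhs)


particles = [i / 10 for i in range(11)]
values = [1.0 + 2.0 * xj + 3.0 * xj**2 for xj in particles]   # u(x) = 1 + 2x + 3x^2
rho = 0.35
c = diffuse_coefficients(0.37, particles, values, rho)
print(c)   # c1 is the diffuse derivative, c2 the diffuse second derivative
```

Because the sampled function is a quadratic and the basis has consistency of order two, the three coefficients match u, u' and u'' at the evaluation point up to round-off; for a general function they are only approximations, with the accuracy stated in Proposition 2.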


2.2.7 Direct evaluation of the derivatives 


Another alternative to computing the derivatives is to fully 
differentiate the expression for the shape function (28) (see (38)), 
taking into account the dependencies given by equations 
(30) and (19). The details can be found in the references 
by Belytschko et al. (1996a,b). 

For clarity, the 1D case is developed, $x \in \mathbb{R}$. For higher 
dimensions, the process is repeated for each component 
of $x$. The derivative of the shape function (28) can be 
written as 


$$\frac{dN_j}{dx}(x) = \omega(x_j, x)\, \mathbf{P}^T(x_j)\, \frac{d\boldsymbol{\alpha}}{dx}(x) + \frac{d\omega(x_j, x)}{dx}\, \mathbf{P}^T(x_j)\, \boldsymbol{\alpha}(x) \tag{41}$$


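The derivative of the weighting function is easily obtained 
because $\omega(x_j, x)$ usually has known analytical expressions 
(see (22) and Section 2.1.2). The a priori nontrivial part is 
the evaluation of $d\boldsymbol{\alpha}/dx$, but an expression for this vector 
can be obtained by implicit differentiation of (30), 


$$\mathbf{M}_x(x)\, \boldsymbol{\alpha}(x) + \mathbf{M}(x)\, \frac{d\boldsymbol{\alpha}}{dx}(x) = \frac{d\mathbf{P}}{dx}(x)$$


where matrix $\mathbf{M}_x$ is the derivative of matrix $\mathbf{M}$, 


$$\mathbf{M}_x(x) = \sum_j \frac{d\omega(x_j, x)}{dx}\, \mathbf{P}(x_j)\, \mathbf{P}^T(x_j)$$


which is trivial to compute. Thus, $d\boldsymbol{\alpha}/dx$ is the solution of 
the linear system of equations 


$$\mathbf{M}(x)\, \frac{d\boldsymbol{\alpha}}{dx}(x) = \frac{d\mathbf{P}}{dx}(x) - \mathbf{M}_x(x)\, \boldsymbol{\alpha}(x)$$


which represents another linear system of equations with 
the same matrix as in (30) (the factorization of $\mathbf{M}$ can be 
reused) and a new independent term. Moreover, the product 
$\mathbf{M}_x(x)\, \boldsymbol{\alpha}(x)$ can be computed in an efficient way as 


$$\mathbf{M}_x(x)\, \boldsymbol{\alpha}(x) = \sum_j \frac{d\omega(x_j, x)}{dx}\, \mathbf{P}(x_j) \left[ \mathbf{P}^T(x_j)\, \boldsymbol{\alpha}(x) \right]$$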


involving only vector operations. 

In summary, the evaluation of the derivatives of the shape 
functions (see (41)) requires little extra computational cost and, 
moreover, higher-order derivatives can also be computed 
by repeating the same process; this only depends on the regularity 
of $\omega(x_j, x)$. Obviously, the same development can be 
done for the centered and scaled approach defined in 
Section 2.2.5. 
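
The steps of this section can be sketched in a few lines. The following Python example is an assumption-laden illustration (1D, linear basis P = (1, x)^T, cubic spline window, hypothetical function names), not the chapter's implementation. It evaluates (41) by solving a second system with the same matrix M and compares the result with a central finite difference.

```python
def cubic_spline(s):
    """Cubic spline window phi(s), supported on |s| <= 1 (assumed window)."""
    r = abs(s)
    if r <= 0.5:
        return 2.0 / 3.0 - 4.0 * r**2 + 4.0 * r**3
    if r <= 1.0:
        return 4.0 / 3.0 - 4.0 * r + 4.0 * r**2 - 4.0 / 3.0 * r**3
    return 0.0


def cubic_spline_derivative(s):
    """Derivative d(phi)/ds of the cubic spline window."""
    r = abs(s)
    sign = 1.0 if s >= 0.0 else -1.0
    if r <= 0.5:
        return sign * (-8.0 * r + 12.0 * r**2)
    if r <= 1.0:
        return sign * (-4.0 + 8.0 * r - 4.0 * r**2)
    return 0.0


def solve2(m, b):
    """Solve a 2x2 system m a = b by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * b[0] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det]


def shape_functions_and_derivatives(x, particles, rho):
    """Return ([N_j(x)], [dN_j/dx(x)]) for the linear basis P = (1, x)^T."""
    w = [cubic_spline((xj - x) / rho) for xj in particles]
    # Chain rule: s = (x_j - x)/rho, hence ds/dx = -1/rho.
    dw = [-cubic_spline_derivative((xj - x) / rho) / rho for xj in particles]
    m = [[0.0, 0.0], [0.0, 0.0]]
    for wj, xj in zip(w, particles):
        p = (1.0, xj)
        for r in range(2):
            for c in range(2):
                m[r][c] += wj * p[r] * p[c]
    alpha = solve2(m, [1.0, x])                       # M alpha = P(x), equation (30)
    # M_x alpha accumulated as sum_j dw_j P(x_j) [P^T(x_j) alpha]: vector operations only.
    mx_alpha = [0.0, 0.0]
    for dwj, xj in zip(dw, particles):
        pa = alpha[0] + alpha[1] * xj
        mx_alpha[0] += dwj * pa
        mx_alpha[1] += dwj * xj * pa
    # Same matrix M, new right-hand side dP/dx - M_x alpha, with dP/dx = (0, 1)^T.
    dalpha = solve2(m, [0.0 - mx_alpha[0], 1.0 - mx_alpha[1]])
    n = [wj * (alpha[0] + alpha[1] * xj) for wj, xj in zip(w, particles)]
    dn = [dwj * (alpha[0] + alpha[1] * xj) + wj * (dalpha[0] + dalpha[1] * xj)
          for wj, dwj, xj in zip(w, dw, particles)]   # equation (41)
    return n, dn


particles = [i / 10 for i in range(11)]
rho, x, eps = 0.25, 0.37, 1e-6
n, dn = shape_functions_and_derivatives(x, particles, rho)
n_plus, _ = shape_functions_and_derivatives(x + eps, particles, rho)
n_minus, _ = shape_functions_and_derivatives(x - eps, particles, rho)
fd = [(a - b) / (2.0 * eps) for a, b in zip(n_plus, n_minus)]
max_diff = max(abs(d - g) for d, g in zip(dn, fd))
print(max_diff)   # small: the direct derivative agrees with the finite difference
```

Differentiating the reproducibility conditions also gives two exact checks that do not rely on finite differences: the derivatives sum to zero, and the sum of dN_j/dx times x_j equals one.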


2.2.8 Partition of the unity methods 


The set of MLS approximation functions can be viewed as 
a partition of unity: the approximation verifies, at least, 
the 0-order consistency condition (reproducibility of the 
constant polynomial $p(x) = 1$) 


$$\sum_i N_i(x) \cdot 1 = 1$$


This viewpoint leads to new approximations for meshfree 
methods. Based on the partition of the unity finite element 
method (PUFEM) proposed by Babuska and co-workers (see 
Babuska, Banerjee and Osborn, 2003; Babuska and Melenk, 1995), Duarte 
and Oden (1996b) use the concept of partition of unity to 
construct approximations with consistency of order $k > 1$. 
They call their method h-p clouds. The approximation is 


$$u(x) \simeq u^\rho(x) = \sum_i N_i(x)\, u_i + \sum_i N_i(x) \left[ \sum_{j=1}^{n_i} b_{ij}\, q_{ij}(x) \right]$$


where $N_i(x)$ are the MLS approximation functions, $q_{ij}$ 
are $n_i$ polynomials of degree greater than $k$ associated to 
each particle $x_i$, and $u_i$, $b_{ij}$ are coefficients to determine. 
Note that the polynomials $q_{ij}(x)$ increase the order of the 
approximation space. These polynomials can be different 
from particle to particle, thus facilitating the hp-adaptivity. 


Remark 16. As commented in Belytschko et al. (1996b), 
the concept of an extrinsic basis, $q_{ij}(x)$, is essential for 
obtaining p-adaptivity. In MLS approximations, the intrinsic 
basis $\mathbf{P}$ cannot vary from particle to particle without 
introducing a discontinuity. 
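
The effect of an extrinsic basis can be seen in a minimal 1D experiment. The Python sketch below is an illustration under explicit assumptions: the partition of unity is built from Shepard functions (consistency of order 0) with a cubic spline window, a single extrinsic polynomial q_i(x) = x - x_i is attached to each particle, and the particle positions are made up. It is not the h-p cloud implementation of Duarte and Oden.

```python
def cubic_spline(s):
    """Cubic spline window phi(s), supported on |s| <= 1 (assumed window)."""
    r = abs(s)
    if r <= 0.5:
        return 2.0 / 3.0 - 4.0 * r**2 + 4.0 * r**3
    if r <= 1.0:
        return 4.0 / 3.0 - 4.0 * r + 4.0 * r**2 - 4.0 / 3.0 * r**3
    return 0.0


def shepard_functions(x, particles, rho):
    """Zero-order consistent partition of unity: N_i = w_i / sum_k w_k."""
    w = [cubic_spline((xi - x) / rho) for xi in particles]
    total = sum(w)
    return [wi / total for wi in w]


def plain_approximation(x, particles, rho, u_coeffs):
    """u(x) ~ sum_i N_i(x) u_i, no extrinsic enrichment."""
    n = shepard_functions(x, particles, rho)
    return sum(ni * ui for ni, ui in zip(n, u_coeffs))


def enriched_approximation(x, particles, rho, u_coeffs, b_coeffs):
    """u(x) ~ sum_i N_i(x) [u_i + b_i q_i(x)] with q_i(x) = x - x_i."""
    n = shepard_functions(x, particles, rho)
    return sum(ni * (ui + bi * (x - xi))
               for ni, ui, bi, xi in zip(n, u_coeffs, b_coeffs, particles))


def u(x):
    return 2.0 + 3.0 * x                              # linear test function


particles = [0.0, 0.1, 0.25, 0.3, 0.5]                # irregular, made-up positions
rho, x = 0.3, 0.2
u_coeffs = [u(xi) for xi in particles]
b_coeffs = [3.0] * len(particles)                     # slope of the test function

plain_error = abs(plain_approximation(x, particles, rho, u_coeffs) - u(x))
enriched_error = abs(enriched_approximation(x, particles, rho, u_coeffs, b_coeffs) - u(x))
print(plain_error, enriched_error)
```

With these coefficients every bracket u_i + b_i (x - x_i) equals u(x), so the enriched approximation reproduces the linear function through the partition of unity alone, whereas the plain Shepard approximation does not on this irregular particle set.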


3 DISCRETIZATION OF PARTIAL 
DIFFERENTIAL EQUATIONS 


All of the approximation functions described in the previous 
section can be used in the solution of a partial differen- 
tial equation (PDE) boundary value problem. Usually SPH 
methods are combined with a collocation or point integra- 
tion technique, while the approximations based on MLS are 
usually combined with a Galerkin discretization. 

A large number of published methods with minor differ- 
ences exist. Probably, the best known, among those using 
MLS approximation, are the meshless (generalized) finite 
difference method (MFDM) developed by Liszka and Ork- 
isz (1980), the DEM by Nayroles, Touzot and Villon (1992), 
the element-free Galerkin (EFG) method by Belytschko, Lu 
and Gu (1994), the RKPM by Liu, Jun and Zhang (1995b), 
the meshless local Petrov-Galerkin (MLPG) by Zhu and 
Atluri (1998), the corrected smooth particle hydrodynam- 
ics (CSPH) by Bonet and Lok (1999), and the finite point 
method (FPM) by Oñate and Idelsohn (1998). Table 1 clas- 
sifies these methods depending on the evaluation of the 
derivatives (see Sections 2.2.6 and 2.2.7) and how the PDE 
is solved (Galerkin, Petrov-Galerkin, point collocation). 

Partition of unity methods can be implemented with 
Galerkin and Petrov-Galerkin methods. For instance, the 
h-p clouds by Duarte and Oden (1996b) uses a Galerkin 
weak form with accurate integration, while the finite spheres 
by De and Bathe (2000) uses Shepard functions (with 
circular/spherical support) enriched with polynomials and a 
Galerkin weak form with specific quadratures for spherical 
domains (and particular/adapted quadratures near the 
boundary); it is almost identical to h-p clouds (see Duarte and 
Oden, 1996b). 


Table 1. Classification of MLS-based meshfree methods for PDEs. 

                                   Evaluation of derivatives 
                                   Direct         Diffuse 
Galerkin, Gauss quadrature         EFG, RKPM      MFDM, DEM 
Galerkin, particle quadrature      CSPH* 
Petrov-Galerkin                    MLPG 
Point collocation                  FPM 

*Direct and global evaluation of derivatives. 

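In order to discuss some of these methods in more detail, 
the model boundary value problem 


$$\Delta u - u = -f \quad \text{in } \Omega \tag{42a}$$
$$u = u_D \quad \text{on } \Gamma_D \tag{42b}$$
$$\frac{\partial u}{\partial n} = g_N \quad \text{on } \Gamma_N \tag{42c}$$


is considered, where $\Delta$ is the Laplace operator in 2D, 
$\Delta \cdot = \partial^2 \cdot / \partial x^2 + \partial^2 \cdot / \partial y^2$, $n$ is the unit outward normal 
vector on $\partial\Omega$, $\partial/\partial n$ is the normal derivative, 
$\partial \cdot / \partial n = n_1\, \partial \cdot / \partial x + n_2\, \partial \cdot / \partial y$, $\Gamma_D \cup \Gamma_N = \partial\Omega$, and $f$, $u_D$, and $g_N$ 
are known. 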


Remark 17. A major issue in the implementation of 
meshfree methods is the identification (localization) of 
the neighboring particles. That is, given a point x in the 
domain, identify which particles have a nonzero shape 
function at this point, that is, determine the $x_j$ such that 
$\phi((x_j - x)/\rho) \neq 0$ to define $S_x^\rho$. 

There are several options. The usual ones consist in 
determining the closest particles to the point of interest (see 
Randles and Libersky, 2000) or in using a nonconforming 
regular mesh of squares or cubes (cells) parallel to the 
Cartesian axes. For every cell, the indices of the particles 
inside the cell are stored. The regular structure of the cell 
mesh allows one to find, given a point x, the cell where 
x is located and, thus, determine the neighboring particles 
just looking in the neighboring cells. Fast algorithms for 
finding neighboring particles can be found in the book by 
Schweitzer (2003). 
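
The cell-based localization described in this remark can be sketched as follows. This Python example is a minimal illustration, not one of the fast algorithms referenced above: it assumes a 2D particle set, square cells of side equal to the dilation parameter (a convenient but arbitrary choice), and a circular support of radius rho. The function names are hypothetical.

```python
import math
from collections import defaultdict


def build_cells(particles, cell_size):
    """Store, for every cell (i, j), the indices of the particles inside it."""
    cells = defaultdict(list)
    for idx, (px, py) in enumerate(particles):
        key = (math.floor(px / cell_size), math.floor(py / cell_size))
        cells[key].append(idx)
    return cells


def neighbors(point, particles, cells, cell_size, rho):
    """Indices of the particles closer than rho to the point, using the cell structure."""
    cx = math.floor(point[0] / cell_size)
    cy = math.floor(point[1] / cell_size)
    reach = math.ceil(rho / cell_size)    # number of cell layers to inspect
    found = []
    for i in range(cx - reach, cx + reach + 1):
        for j in range(cy - reach, cy + reach + 1):
            for idx in cells.get((i, j), ()):
                px, py = particles[idx]
                if math.hypot(px - point[0], py - point[1]) < rho:
                    found.append(idx)
    return sorted(found)


def brute_force_neighbors(point, particles, rho):
    """Reference search over all particles, used only to check the cell search."""
    return [idx for idx, (px, py) in enumerate(particles)
            if math.hypot(px - point[0], py - point[1]) < rho]


h = 1.0 / 6.0
particles = [(i * h, j * h) for i in range(7) for j in range(7)]   # 7 x 7 particles
rho = 0.4
cells = build_cells(particles, cell_size=rho)
point = (0.37, 0.52)
print(neighbors(point, particles, cells, rho, rho))
```

Only the cells adjacent to the cell containing the point are visited, so the cost of a query depends on the local particle density rather than on the total number of particles.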


3.1 Collocation methods 


Consider an approximation, based on a set of particles $\{x_j\}$, 
of the form 


$$u(x) \simeq u^\rho(x) = \sum_j u_j\, N_j(x)$$


The shape functions $N_j(x)$ can be SPH shape functions 
(Section 2.1.1) or MLS shape functions (Section 2.2), and 
$u_j$ are coefficients to be determined. 

In collocation methods (see Oñate and Idelsohn, 1998 or 
Aluru, 2000), the PDE (42a) is imposed at each particle 
in the interior of the domain $\Omega$, and the boundary conditions 
(42b) and (42c) are imposed at each particle of the corresponding 
boundary. In the case of the model problem, 
this leads to the linear system of equations for the coefficients 
$u_j$: 


$$\sum_j u_j \left[ \Delta N_j(x_i) - N_j(x_i) \right] = -f(x_i) \qquad \forall x_i \in \Omega$$
$$\sum_j u_j\, N_j(x_i) = u_D(x_i) \qquad \forall x_i \in \Gamma_D$$
$$\sum_j u_j\, \frac{\partial N_j}{\partial n}(x_i) = g_N(x_i) \qquad \forall x_i \in \Gamma_N$$


Note that the shape functions must be $C^2$, and thus, a $C^2$ 
window function must be used. In this case, the solution at 
particle $x_i$ is approximated by 


$$u(x_i) \simeq u^\rho(x_i) = \sum_j u_j\, N_j(x_i)$$


which in general differs from the coefficient $u_i$ (see 
Remark 1). 

In the context of the renormalized meshless derivative (RMD) 
(see Vila, 1999), the coefficient $u_i$ is considered as the 
approximation at the particle $x_i$ and only the derivative 
of the solution is approximated through the RMD (see (9)). 
Thus, the linear system to be solved becomes 


$$\sum_j u_j\, \Delta N_j(x_i) - u_i = -f(x_i) \qquad \forall x_i \in \Omega$$
$$u_i = u_D(x_i) \qquad \forall x_i \in \Gamma_D$$
$$\sum_j u_j\, \frac{\partial N_j}{\partial n}(x_i) = g_N(x_i) \qquad \forall x_i \in \Gamma_N$$


Both possibilities are slightly different from the SPH 
method by Monaghan (1988) or from SPH methods 
based on particle integration techniques (see Bonet and 
Kulasegaram, 2000). 


3.2 Methods based on a Galerkin weak form 


The meshfree shape functions can also be used in the 
discretization of the weak integral form of the boundary 
value problem. For the model problem (42), the 
Bubnov-Galerkin weak form (also used in the finite element 
method (FEM)) is 


$$\int_\Omega \nabla v \cdot \nabla u \, d\Omega + \int_\Omega v\, u \, d\Omega = \int_\Omega v\, f \, d\Omega + \int_{\Gamma_N} v\, g_N \, d\Gamma \qquad \forall v$$

where $v$ vanishes at $\Gamma_D$ and $u = u_D$ at $\Gamma_D$. However, this 
weak form cannot be directly discretized with a standard 
meshfree approximation. The shape functions do not verify 
the Kronecker delta property (see Remark 1) and, thus, it 
is difficult to select $v$ such that $v = 0$ at $\Gamma_D$ and to impose 
that $u^\rho = u_D$ at $\Gamma_D$. Specific techniques are needed in 
order to impose Dirichlet boundary conditions. Section 3.3 
describes the treatment of essential boundary conditions in 
meshfree methods. 

An important issue in the implementation of a meshfree 
method with a weak form is the evaluation of integrals. 

Several possibilities can be considered to evaluate integrals 
in the weak form: (1) the particles can be the quadrature 
points (particle integration), (2) a regular cell structure 
(for instance, the same used for the localization of particles) 
can be used with numerical quadrature in each cell 
(cell integration), or (3) a background mesh, not necessarily 
regular, can be used to compute integrals. The first 
possibility (particle integration) is the fastest, but as in 
collocation methods, it can result in rank deficiency. Bonet and 
Kulasegaram (2000) propose a global correction to obtain 
accurate and stable results with particle integration. The 
other two possibilities present the disadvantage that, since 
shape functions and their derivatives are not polynomials, 
a large number of integration points is needed, which leads to 
high computational costs. Nevertheless, the cell structure or the coarse 
background mesh, which do not need to be conforming, 
are easily generated and these techniques ensure an accurate 
approximation of the integrals (see Chapter 4, this 
Volume). 


3.3 Essential boundary conditions 


Many specific techniques have been developed in 
recent years in order to impose essential boundary conditions in 
meshfree methods. Some possibilities are (1) Lagrange 
multipliers (Belytschko, Lu and Gu, 1994), (2) modified 
variational principles (Belytschko, Lu and Gu, 1994), (3) penalty 
methods (Zhu and Atluri, 1998; Bonet and Kulasegaram, 
2000), (4) perturbed Lagrangian (Chu and Moran, 1995), 
(5) coupling to finite elements (Belytschko, Organ and 
Krongauz, 1995; Huerta and Fernández-Méndez, 2000a; 
Wagner and Liu, 2001), or (6) modified shape functions (Gosz 
and Liu, 1996; Günter and Liu, 1998; Wagner and Liu, 
2000) among others. 

The first attempts to define shape functions with the 
‘delta property’ along the boundary (see Gosz and Liu, 
1996), namely, $N_j(x_i) = \delta_{ij}$ for all $x_i$ in $\Gamma_D$, have serious 
difficulties for complex domains and for the integration of 
the weak forms. 

In recent years, mixed interpolations that combine 
finite elements with meshfree methods have been 
developed. Mixed interpolations can be quite effective for 
imposing essential boundary conditions. The idea is to use 
one or two layers of finite elements next to the Dirichlet 
boundary and use a meshfree approximation in the 
rest of the domain. Thus, the essential boundary conditions 
can be imposed as in standard finite elements. In 
Belytschko, Organ and Krongauz (1995), a mixed interpolation 
is defined in the transition area (from the finite elements 
region to the particles region). This mixed interpolation 
requires the substitution of finite element nodes by particles 
and the definition of ramp functions. Thus, the transition is 
of the size of one element and the interpolation is linear. 
Following this idea, Huerta and Fernández-Méndez (2000a) 
propose a more general mixed interpolation, for any order 
of interpolation with no need for ramp functions and no 
substitution of nodes by particles. This is done preserving 
consistency and continuity of the solution. Figure 8 shows 
an example of this mixed interpolation in 1D: two finite ele- 
ment nodes are considered at the boundary of the domain, 
with their corresponding shape functions with a dashed line, 
and the meshfree shape functions are modified in order to 
preserve consistency, with a solid line. The details of this 
method are presented in Section 6. 


3.3.1 Methods based on a modification of the weak 
form 


For the sake of clarity, the following model problem is 
considered 


$$\begin{aligned} -\Delta u &= f && \text{in } \Omega \\ u &= u_D && \text{on } \Gamma_D \\ \nabla u \cdot n &= g_N && \text{on } \Gamma_N \end{aligned} \tag{43}$$


where $\Gamma_D \cup \Gamma_N = \partial\Omega$, $\Gamma_D \cap \Gamma_N = \emptyset$ and $n$ is the outward 
unit normal on $\partial\Omega$. The generalization of the following 
developments to other PDEs is straightforward. 


Figure 8. Mixed interpolation with linear finite element nodes 
near the boundary and particles in the interior of the domain, with 
$\rho/h = 3.2$, cubic spline and linear consistency in all the domain. 


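The weak form of problem (43) is ‘find $u \in H^1(\Omega)$ such 
that $u = u_D$ on $\Gamma_D$ and 


$$\int_\Omega \nabla v \cdot \nabla u \, d\Omega - \int_{\Gamma_D} v\, \nabla u \cdot n \, d\Gamma = \int_\Omega v\, f \, d\Omega + \int_{\Gamma_N} v\, g_N \, d\Gamma \tag{44}$$


for all $v \in H^1(\Omega)$’. In the finite element method, the interpolation 
of $u$ can easily be forced to verify the essential 
boundary condition and the test functions $v$ can be chosen 
such that $v = 0$ on $\Gamma_D$ (see Remark 18), leading to the following 
weak form: ‘find $u \in H^1(\Omega)$ such that $u = u_D$ on 
$\Gamma_D$ and 


$$\int_\Omega \nabla v \cdot \nabla u \, d\Omega = \int_\Omega v\, f \, d\Omega + \int_{\Gamma_N} v\, g_N \, d\Gamma \tag{45}$$


for all $v \in H_0^1(\Omega)$’, where $H_0^1(\Omega) = \{ v \in H^1(\Omega) \mid v = 0 \text{ on } \Gamma_D \}$. 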


Remark 18. In the FEM, or in the context of the continuous 
blending method discussed in Section 6, the approximation 
can be written as 


$$u(x) \simeq \sum_{j \notin B} u_j\, N_j(x) + \psi(x) \tag{46}$$


where $N_j(x)$ denotes the shape functions, 
$\psi(x) = \sum_{j \in B} u_D(x_j)\, N_j(x)$, and $B$ is the set of indexes of all 
nodes on the essential boundary. Note that, because of the 
Kronecker delta property of the shape functions for $i \in B$ 
and the fact that $N_i \in H_0^1(\Omega)$ for $i \notin B$, the approximation 
defined by (46) verifies $u = u_D$ at the nodes on the essential 
boundary. Therefore, approximation (46) and $v = N_i$, for 
$i \notin B$, can be considered for the discretization of the 
weak form (45). Under these circumstances, the system of 
equations becomes 


$$\mathbf{K} \mathbf{u} = \mathbf{f} \tag{47}$$

where 

$$K_{ij} = \int_\Omega \nabla N_i \cdot \nabla N_j \, d\Omega, \qquad f_i = \int_\Omega N_i\, f \, d\Omega - \int_\Omega \nabla N_i \cdot \nabla \psi \, d\Omega + \int_{\Gamma_N} N_i\, g_N \, d\Gamma \tag{48}$$


and $\mathbf{u}$ is the vector of coefficients $u_i$. 


However, for standard meshfree approximation, the shape 
functions do not verify the Kronecker delta property and 
$N_i \notin H_0^1(\Omega)$ for $i \notin B$. Therefore, imposing $u = u_D$ and 
$v = 0$ on $\Gamma_D$ is not as straightforward as in finite elements 
or as in the blending method (Belytschko, Organ and 
Krongauz, 1995), and the weak form defined by (45) cannot 
be used. The most popular methods that modify the weak 
form to overcome this problem are the Lagrange multiplier 
method, the penalty method, and Nitsche’s method. 


3.3.2 Lagrange multiplier method 


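The solution of problem (43) can also be obtained as 
the solution of a minimization problem with constraints: 
‘$u$ minimizes the energy functional 


$$\Pi(v) = \frac{1}{2} \int_\Omega \nabla v \cdot \nabla v \, d\Omega - \int_\Omega v\, f \, d\Omega - \int_{\Gamma_N} v\, g_N \, d\Gamma \tag{49}$$

and verifies the essential boundary conditions.’ That is, 


$$u = \arg \min_{\substack{v \in H^1(\Omega) \\ v = u_D \text{ on } \Gamma_D}} \Pi(v) \tag{50}$$


With the use of a Lagrange multiplier, $\lambda(x)$, this 
minimization problem can also be written as 


$$(u, \lambda) = \arg \min_{v \in H^1(\Omega)} \; \max_{\gamma \in H^{-1/2}(\Gamma_D)} \; \Pi(v) + \int_{\Gamma_D} \gamma\, (v - u_D) \, d\Gamma$$


This min-max problem leads to the following weak 
form with Lagrange multiplier, ‘find $u \in H^1(\Omega)$ and 
$\lambda \in H^{-1/2}(\Gamma_D)$ such that 


$$\int_\Omega \nabla v \cdot \nabla u \, d\Omega + \int_{\Gamma_D} v\, \lambda \, d\Gamma = \int_\Omega v\, f \, d\Omega + \int_{\Gamma_N} v\, g_N \, d\Gamma \qquad \forall v \in H^1(\Omega) \tag{51a}$$
$$\int_{\Gamma_D} \gamma\, (u - u_D) \, d\Gamma = 0 \qquad \forall \gamma \in H^{-1/2}(\Gamma_D) \tag{51b}$$
’ 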
Tp 


Remark 19. Equation (51b) imposes the essential boundary 
condition, $u = u_D$ on $\Gamma_D$, in weak form. 


Remark 20. The physical interpretation of the Lagrange 
multiplier can be seen by simple comparison of equations 
(51a) and (44): the Lagrange multiplier corresponds to the 
flux (traction in a mechanical problem) along the essential 
boundary, $\lambda = -\nabla u \cdot n$. 


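Considering now the approximation $u(x) \simeq \sum_i N_i(x)\, u_i$ 
with meshfree shape functions $N_i$ and an interpolation for 
$\lambda$ with a set of boundary functions $\{N_i^L(x)\}_{i=1}^{\ell}$, 


$$\lambda(x) \simeq \sum_{i=1}^{\ell} \lambda_i\, N_i^L(x) \quad \text{for } x \in \Gamma_D \tag{52}$$


the discretization of (51) leads to the system of equations 


$$\begin{pmatrix} \mathbf{K} & \mathbf{A}^T \\ \mathbf{A} & \mathbf{0} \end{pmatrix} \begin{pmatrix} \mathbf{u} \\ \boldsymbol{\lambda} \end{pmatrix} = \begin{pmatrix} \mathbf{f} \\ \mathbf{b} \end{pmatrix} \tag{53}$$

where $\mathbf{K}$ and $\mathbf{f}$ are already defined in (48) (use $\psi = 0$), $\boldsymbol{\lambda}$ 
is the vector of coefficients $\lambda_i$, and 


$$A_{ij} = \int_{\Gamma_D} N_i^L\, N_j \, d\Gamma, \qquad b_i = \int_{\Gamma_D} N_i^L\, u_D \, d\Gamma$$


There are several possibilities for the choice of the interpolation 
space for the Lagrange multiplier $\lambda$. Some of 
them are (1) a finite element interpolation on the essential 
boundary, (2) a meshfree approximation on the essential 
boundary, or (3) the same shape functions used in the interpolation 
of $u$ restricted along $\Gamma_D$, that is, $N_i^L = N_i$ for $i$ 
such that $N_i|_{\Gamma_D} \neq 0$. However, the most popular choice is 
the point collocation method. This method corresponds to 
$N_i^L(x) = \delta(x - x_i^L)$, where $\{x_i^L\}_{i=1}^{\ell}$ is a set of points along 
$\Gamma_D$ and $\delta$ is the Dirac delta function. In that case, by substitution 
of $\gamma(x) = \delta(x - x_i^L)$, equation (51b) corresponds to 


$$u(x_i^L) = u_D(x_i^L), \quad \text{for } i = 1, \ldots, \ell$$


That is, $A_{ij} = N_j(x_i^L)$, $b_i = u_D(x_i^L)$, and each equation 
of $\mathbf{A}\mathbf{u} = \mathbf{b}$ in (53) corresponds to the enforcement of the 
prescribed value at one collocation point, namely, $x_i^L$. 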


Remark 21. The system of equations (53) can also be 
derived from the minimization in $\mathbb{R}^{n_{dof}}$ of the discrete version 
of the energy functional (49) subject to the constraints 
corresponding to the essential boundary conditions, $\mathbf{A}\mathbf{u} = \mathbf{b}$. 
In fact, there is no need to know the weak form with 
Lagrange multiplier; it is sufficient to define the discrete 
energy functional and the restrictions due to the boundary 
conditions in order to determine the system of equations. 


Therefore, the Lagrange multiplier method is, in principle, 
general and easily applicable to all kinds of problems. 
However, the main disadvantages of the Lagrange multiplier 
method are 


1. The dimension of the resulting system of equations is 
increased. 

2. Even for $\mathbf{K}$ symmetric and positive semidefinite, the 
global matrix in (53) is symmetric but it is no longer 
positive definite. Therefore, standard linear solvers for 
symmetric and positive definite matrices cannot be 
used. 

3. More crucial is the fact that the system (53) and the 
weak problem (51) induce a saddle-point problem, 
which precludes an arbitrary choice of the interpolation 
space for $u$ and $\lambda$. The resolution of the multiplier 
$\lambda$ field must be fine enough in order to obtain an 
acceptable solution, but the system of equations will 
be singular if the resolution of the Lagrange multiplier 
$\lambda$ field is too fine. In fact, the interpolation spaces 
for the Lagrange multiplier $\lambda$ and for the principal 
unknown $u$ must verify an inf-sup condition, known 
as the Babuska-Brezzi stability condition, in order 
to ensure the convergence of the approximation (see 
Babuska, 1973a or Brezzi, 1974 for details). 


The first two disadvantages can be neglected in view of 
the versatility and simplicity of the method. However, 
while in the FEM, it is trivial to choose the approxima- 
tion for the Lagrange multiplier in order to verify the 
Babuska-Brezzi stability condition and to impose accurate 
essential boundary conditions, this choice is not trivial for 
meshfree methods. In fact, in meshfree methods, the choice 
of an appropriate interpolation for the Lagrange multiplier 
can be a serious problem in particular situations. 
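
The structure of system (53) with point collocation can be illustrated on a very small example. The Python sketch below uses made-up data and is not taken from the chapter: a three-particle matrix K (a 1D Laplacian with natural boundary conditions, symmetric and positive semidefinite), one collocation point whose shape function values (0.7, 0.3, 0.0) are hypothetical and do not satisfy the Kronecker delta property, and a hand-written Gaussian elimination.

```python
def solve(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n + 1):
                a[r][c] -= factor * a[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (a[k][n] - sum(a[k][c] * x[c] for c in range(k + 1, n))) / a[k][k]
    return x


def saddle_point_system(K, A, f, b):
    """Assemble the block matrix [[K, A^T], [A, 0]] and the right-hand side (f, b)."""
    n, m = len(K), len(A)
    S = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            S[i][j] = K[i][j]
    for r in range(m):
        for j in range(n):
            S[n + r][j] = A[r][j]     # block A
            S[j][n + r] = A[r][j]     # block A^T
    return S, f + b


def quadratic_form(S, z):
    """Evaluate z^T S z."""
    return sum(z[i] * S[i][j] * z[j] for i in range(len(z)) for j in range(len(z)))


# Hypothetical data (illustrative numbers only).
K = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]
f = [0.25, 0.5, 0.25]
A = [[0.7, 0.3, 0.0]]     # A_1j = N_j(x_1^L): shape function values at the collocation point
b = [0.0]                 # prescribed value u_D(x_1^L)

S, rhs = saddle_point_system(K, A, f, b)
sol = solve(S, rhs)
u, lam = sol[:3], sol[3:]
print(u, lam)
```

The prescribed value is met exactly at the collocation point although K alone is singular, and the global matrix is indefinite: the quadratic form takes both signs. In this example the shape function values sum to one, so adding the first three equations shows that the multiplier equals the total load, in line with its interpretation as a boundary flux.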

These properties are observed in the solution of the 
2D linear elasticity problem represented in Figure 9 where 
the solution obtained with a regular mesh of 30 × 30 
biquadratic finite elements is also shown. The distance 
between particles is h = 1/6 and a finer mesh is used for 
the representation of the solution. 

Figure 10 shows the solution obtained for the Lagrange 
multiplier method. The prescribed displacement is imposed 
at some collocation points at the essential boundary (marked 
with black squares). Three possible distributions for the 
collocation points are considered. In the first one, the 
collocation points correspond to the particles located at the 
essential boundary. The prescribed displacement is exactly 
imposed at the collocation points, but not along the rest 
of the essential boundary. Note that the displacement field 
is not accurate because of the smoothness of the meshfree 
approximation. But if the number of collocation points 
is too large, the inf-sup condition is no longer verified 
and the system stiffness matrix is singular. This is the 
case of discretization (c), which corresponds to double the 
density of collocation points along the essential boundary. 
In this example, the choice of a proper interpolation for 
the Lagrange multiplier is not trivial. Option (b) represents 
a distribution of collocation points that imposes the prescribed 
displacements in a correct manner and, at the same 
time, leads to a regular matrix. Similar results are obtained 
if the Lagrange multiplier is interpolated with boundary 
linear finite elements (see Fernández-Méndez and Huerta, 
2004). 


Figure 9. Problem statement and solution with 30 × 30 biquadratic 
finite elements (61 × 61 nodes). 

Figure 10. Solution with Lagrange multipliers for three possible distributions of collocation points (■) and 7 × 7 particles: panels (a), (b) and (c), where (c) leads to a singular matrix. 

Therefore, although imposing boundary constraints is 
straightforward with the Lagrange multiplier method, the 
applicability of this method in particular cases is impaired 
due to the difficulty in the selection of a proper interpolation 
space for the Lagrange multiplier. It is important to note 
that the choice of the interpolation space can be even more 
complicated for an irregular distribution of particles (see 
Chapter 9, this Volume). 


3.3.3 Penalty method 


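The minimization problem with constraints defined by (50) 
can also be solved with the use of a penalty parameter. 
That is, 


$$u = \arg \min_{v \in H^1(\Omega)} \; \Pi(v) + \frac{1}{2}\, \beta \int_{\Gamma_D} (v - u_D)^2 \, d\Gamma \tag{54}$$

The penalty parameter $\beta$ is a positive scalar constant that 
must be large enough to accurately impose the essential 
boundary condition. The minimization problem (54) leads 
to the following weak form: ‘find $u \in H^1(\Omega)$ such that 


$$\int_\Omega \nabla v \cdot \nabla u \, d\Omega + \beta \int_{\Gamma_D} v\, u \, d\Gamma = \int_\Omega v\, f \, d\Omega + \int_{\Gamma_N} v\, g_N \, d\Gamma + \beta \int_{\Gamma_D} v\, u_D \, d\Gamma \tag{55}$$


for all $v \in H^1(\Omega)$’. The discretization of this weak form 
leads to the system of equations 


$$(\mathbf{K} + \beta\, \mathbf{M}^p)\, \mathbf{u} = \mathbf{f} + \beta\, \mathbf{f}^p \tag{56}$$


where $\mathbf{K}$ and $\mathbf{f}$ are defined in (48) (use $\psi = 0$) and 


$$M^p_{ij} = \int_{\Gamma_D} N_i\, N_j \, d\Gamma, \qquad f^p_i = \int_{\Gamma_D} N_i\, u_D \, d\Gamma$$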


Remark 22. The penalty method can also be obtained 
from the minimization of the discrete version of the energy 
functional in $\mathbb{R}^{n_{dof}}$, subjected to the constraints corresponding 
to the essential boundary condition $\mathbf{A}\mathbf{u} = \mathbf{b}$. 


Like the Lagrange multiplier method, the penalty method 
is easily applicable to a wide range of problems. The 
penalty method presents two clear advantages: (1) the 
dimension of the system is not increased and (2) the matrix 
in the resulting system (see equation (56)) is symmetric 
and positive definite, provided that $\mathbf{K}$ is symmetric and $\beta$ 
is large enough. 

However, the penalty method also has two important 
drawbacks: the Dirichlet boundary condition is weakly 
imposed (the parameter $\beta$ controls how well the essential 
boundary condition is met) and the matrix in (56) is often 
poorly conditioned (the condition number increases with $\beta$). 
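
The weak imposition of the boundary condition can be observed on a small made-up system. The Python sketch below is an illustration only: it uses the discrete form of the penalty method mentioned in Remark 22, in which the penalty matrix is built from a single collocation constraint a · u = u_D, with hypothetical values for K, f and the shape function values a. It is not the boundary-integral matrix of (56).

```python
def solve(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n + 1):
                a[r][c] -= factor * a[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (a[k][n] - sum(a[k][c] * x[c] for c in range(k + 1, n))) / a[k][k]
    return x


# Hypothetical data (illustrative numbers only).
K = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]
f = [0.25, 0.5, 0.25]
a_row = [0.7, 0.3, 0.0]   # shape function values at the constrained boundary point
u_D = 0.0                 # prescribed boundary value


def penalty_solution(beta):
    """Solve (K + beta a a^T) u = f + beta a u_D."""
    n = len(K)
    Kp = [[K[i][j] + beta * a_row[i] * a_row[j] for j in range(n)] for i in range(n)]
    fp = [f[i] + beta * a_row[i] * u_D for i in range(n)]
    return solve(Kp, fp)


def constraint_residual(u):
    """Violation a . u - u_D of the essential boundary condition."""
    return sum(ai * ui for ai, ui in zip(a_row, u)) - u_D


residuals = {beta: constraint_residual(penalty_solution(beta))
             for beta in (1e1, 1e2, 1e3, 1e4)}
for beta, res in residuals.items():
    print(beta, res)
```

In this example the violation of the boundary condition decreases in proportion to 1/beta, so it never vanishes for a finite penalty, while the penalty term increasingly dominates the entries of the system matrix, which is the source of the poor conditioning mentioned above.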

A general theorem on the convergence of the penalty 
method and the choice of the penalty parameter $\beta$ can 
be found in Babuska (1973b) and Babuska, Banerjee and 
Osborn (2002b). For an interpolation with consistency of 
order $p$ and discretization measure $h$ (i.e. the characteristic 
element size in finite elements or the characteristic 
distance between particles in a meshfree method), the best 
error estimate obtained by Babuska (1973b) gives a rate of 
convergence of order $h^{(2p+1)/3}$ in the energy norm, provided 
that the penalty $\beta$ is taken to be of order $h^{-(2p+1)/3}$. 
In the linear case, it corresponds to the optimal rate of 
convergence in the energy norm. For order $p \geq 2$, the lack 
of optimality in the rate of convergence is a direct consequence 
of the lack of consistency of the weak formulation 
(see Arnold et al., 2001/02 and Remark 23). 
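
As a short worked check of this statement, the exponent (2p+1)/3 can be compared with the optimal energy-norm rate, assumed here to be p for an interpolation with consistency of order p (the standard result for a consistent formulation):

```latex
\frac{2p+1}{3}\bigg|_{p=1} = 1 \quad (\text{optimal rate } 1), \qquad
\frac{2p+1}{3}\bigg|_{p=2} = \frac{5}{3} < 2, \qquad
\frac{2p+1}{3}\bigg|_{p=3} = \frac{7}{3} < 3
```

The gap p - (2p+1)/3 = (p-1)/3 is zero only in the linear case and grows with the order of consistency.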

These properties can be observed in the following 2D 
Laplace problem: 


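$$\begin{aligned} \Delta u &= 0, && (x, y) \in\, ]0, 1[ \times ]0, 1[ \\ u(x, 0) &= \sin(\pi x) \\ u(x, 1) &= u(0, y) = u(1, y) = 0 \end{aligned}$$


with analytical solution (see Wagner and Liu, 2000), 

$$u(x, y) = \left[ \cosh(\pi y) - \coth(\pi) \sinh(\pi y) \right] \sin(\pi x)$$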


A distribution of 7 x 7 particles is considered, that is, the 
distance between particles is h = 1/6. 
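
The analytical solution can be verified numerically. The Python sketch below assumes the solution reads u(x, y) = [cosh(πy) - coth(π) sinh(πy)] sin(πx), which is the form consistent with the four boundary conditions, and checks the boundary values and a five-point finite difference Laplacian at one interior point. The step size and the sample point are arbitrary choices.

```python
import math


def u_exact(x, y):
    """Assumed analytical solution of the 2D Laplace test problem."""
    coth_pi = 1.0 / math.tanh(math.pi)
    return (math.cosh(math.pi * y) - coth_pi * math.sinh(math.pi * y)) * math.sin(math.pi * x)


def discrete_laplacian(func, x, y, step):
    """Five-point finite difference approximation of the Laplacian."""
    return (func(x + step, y) + func(x - step, y)
            + func(x, y + step) + func(x, y - step)
            - 4.0 * func(x, y)) / step**2


samples = [0.1 * k for k in range(11)]
bottom_error = max(abs(u_exact(s, 0.0) - math.sin(math.pi * s)) for s in samples)
top_error = max(abs(u_exact(s, 1.0)) for s in samples)
side_error = max(max(abs(u_exact(0.0, s)), abs(u_exact(1.0, s))) for s in samples)
laplacian = discrete_laplacian(u_exact, 0.3, 0.4, 1e-3)
print(bottom_error, top_error, side_error, laplacian)
```

All four quantities are close to zero, limited by round-off for the boundary values and by the truncation error of the finite difference stencil for the Laplacian.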

Figure 11 shows the solution for increasing values of the 
penalty parameter $\beta$. The penalty parameter must be large 
enough, $\beta \geq 10^3$, in order to impose the boundary condition 
in an accurate manner. Figure 12 shows convergence curves 
for different choices of the penalty parameter. The penalty 
method converges with a rate close to 2 in the $\mathcal{L}^2$ norm 
if the penalty parameter $\beta$ is proportional to $h^{-2}$. If the 
penalty parameter is constant, or proportional to $h^{-1}$, the 
boundary error dominates and the optimal convergence rate 
is lost as $h$ goes to zero. 

Figure 12 also shows the matrix condition number for
increasing values of the penalty parameter, for distributions
of $11 \times 11$ and $21 \times 21$ particles. The condition number
grows linearly with the penalty parameter. Note that, for
instance, for a discretization of $21 \times 21$ particles, a reasonable
value for the penalty parameter is $\beta = 10^6$, which
corresponds to a condition number near $10^{12}$. Obviously,
the situation gets worse for denser discretizations, which
need larger penalty parameters. The ill-conditioning of the
matrix reduces the applicability of the penalty method.
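The linear growth of the condition number with the penalty parameter can be reproduced with a much simpler model than the 2D meshfree computation of Figure 12. The sketch below is an assumption-laden 1D analogue: linear finite elements for $-u'' = f$ on $[0, 1]$ with a penalty added at both end nodes. The mesh size and the values of $\beta$ are illustrative.

```python
import numpy as np

def penalty_matrix(n_elem, beta):
    """1D linear FE stiffness for -u'' = f on [0, 1] with a penalty on both end nodes."""
    h = 1.0 / n_elem
    K = np.zeros((n_elem + 1, n_elem + 1))
    for e in range(n_elem):
        K[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    K[0, 0] += beta      # penalty term beta * v(0) * u(0)
    K[-1, -1] += beta    # penalty term beta * v(1) * u(1)
    return K

betas = [1e4, 1e5, 1e6]
conds = [np.linalg.cond(penalty_matrix(20, b)) for b in betas]
# Each tenfold increase of beta multiplies the condition number by about ten
ratios = [conds[i + 1] / conds[i] for i in range(len(conds) - 1)]
print(conds, ratios)
```

In this toy problem the largest eigenvalue scales with $\beta$ while the smallest one stays close to that of the Dirichlet problem, which is why the ratio of consecutive condition numbers is close to ten.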


3.3.4 Nitsche’s method 
Nitsche’s weak form for problem (43) is 


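\[
\begin{aligned}
\int_\Omega \nabla v \cdot \nabla u \, d\Omega
 - \int_{\Gamma_D} v \, \nabla u \cdot n \, d\Gamma
 - \int_{\Gamma_D} \nabla v \cdot n \, u \, d\Gamma
 + \beta \int_{\Gamma_D} v u \, d\Gamma
 = {} & \int_\Omega v f \, d\Omega + \int_{\Gamma_N} v g_N \, d\Gamma \\
 & - \int_{\Gamma_D} \nabla v \cdot n \, u_D \, d\Gamma
 + \beta \int_{\Gamma_D} v u_D \, d\Gamma
\end{aligned}
\tag{57}
\]

where $\beta$ is a positive constant scalar parameter (see Arnold
et al., 2001/02; Nitsche, 1970).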

Comparing with the weak form defined by (44), the new
terms in the l.h.s. of (57) are $\int_{\Gamma_D} u \nabla v \cdot n \, d\Gamma$, which recovers
the symmetry of the bilinear form, and $\beta \int_{\Gamma_D} v u \, d\Gamma$,
which ensures the coercivity of the bilinear form (i.e.
the matrix corresponding to its discretization is positive
definite), provided that $\beta$ is large enough. The new terms
in the r.h.s. are added to ensure consistency of the weak
form.

Figure 11. Penalty method solution (top) and error (bottom) for $\beta = 10$ (left), $\beta = 100$ (center) and $\beta = 10^3$ (right).

Figure 12. Evolution of the $\mathcal{L}_2(\Omega)$ error norm for the penalty method and matrix condition number.

The discretization of Nitsche’s weak form leads to
a system of equations with the same size as K and whose
system matrix is symmetric and positive definite, provided
that K is symmetric and $\beta$ is large enough. Although, as
in the penalty method, the condition number of this matrix
increases with the parameter $\beta$, in practice very large values
are not needed to ensure convergence and a proper
implementation of the boundary condition.


Remark 23. Nitsche’s method can be interpreted as a
consistent improvement of the penalty method. The penalty
weak form (55) is not consistent, in the sense that the
solution of (43) does not verify the penalty weak form
for test functions that do not vanish at $\Gamma_D$ (see
Arnold et al., 2001/02). Nitsche’s weak form keeps the term
$\int_{\Gamma_D} v \nabla u \cdot n \, d\Gamma$ from the consistent weak form (44) and
includes new terms maintaining the consistency.
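The difference between the inconsistent penalty form and the consistent Nitsche form can be seen in a small 1D experiment. The sketch below is only an illustration under stated assumptions: linear finite elements for $-u'' = 0$ on $[0, 1]$ with $u(0) = 1$ and $u(1) = 2$, so the exact solution $u = 1 + x$ lies in the discrete space, and an illustrative parameter $\beta = 4/h$. The helper `assemble` and its arguments are hypothetical names.

```python
import numpy as np

def assemble(n_elem, beta, method):
    """1D linear FE system for -u'' = 0 on [0, 1] with u(0) = 1 and u(1) = 2."""
    h = 1.0 / n_elem
    n = n_elem + 1
    K = np.zeros((n, n))
    F = np.zeros(n)
    for e in range(n_elem):
        K[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    # (boundary node, neighbouring node, prescribed value u_D)
    for b, nb, u_d in [(0, 1, 1.0), (n - 1, n - 2, 2.0)]:
        e_b = np.zeros(n)
        e_b[b] = 1.0                          # v evaluated at the boundary point
        g = np.zeros(n)
        g[b], g[nb] = 1.0 / h, -1.0 / h       # outward normal derivative of v
        K += beta * np.outer(e_b, e_b)        # beta * v * u
        F += beta * u_d * e_b                 # beta * v * u_D
        if method == "nitsche":
            K -= np.outer(e_b, g) + np.outer(g, e_b)   # -v (u'.n) - (v'.n) u
            F -= u_d * g                               # -(v'.n) u_D
    return K, F

n_elem = 10
beta = 4.0 * n_elem                  # beta = alpha / h with alpha = 4 (illustrative)
x = np.linspace(0.0, 1.0, n_elem + 1)
u_ref = 1.0 + x
errors = {}
for method in ("penalty", "nitsche"):
    K, F = assemble(n_elem, beta, method)
    errors[method] = np.max(np.abs(np.linalg.solve(K, F) - u_ref))
print(errors)
```

With the same moderate $\beta$, the Nitsche system reproduces the exact solution up to round-off, whereas the penalty system leaves a boundary error of order $1/\beta$.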


The main difficulty of Nitsche’s method is the derivation
of the weak form. The generalization of the implementation
to other problems is not as straightforward as for the
method of Lagrange multipliers or for the penalty method.
The weak form and the choice of the parameter $\beta$ depend
not only on the partial differential equation, but also on
the essential boundary condition to be prescribed. Nitsche’s
method applied to other problems is discussed by Nitsche
(1970), by Becker (2002) for the Navier-Stokes problem,
by Freund and Stenberg (1995) for the Stokes problem, and by
Hansbo and Larson (2002) for elasticity problems.

Regarding the choice of the parameter, Nitsche proved
that if $\beta$ is taken as $\beta = \alpha/h$, where $\alpha$ is a large enough
constant and $h$ denotes the characteristic discretization measure,
then the discrete solution converges to the exact
solution with optimal order in the $H^1$ and $\mathcal{L}_2$ norms. Moreover,
for model problem (43) with Dirichlet boundary conditions,
$\Gamma_D = \partial\Omega$, a value for the constant $\alpha$ can be determined taking
into account that convergence is ensured if $\beta > 2C^2$,
where $C$ is a positive constant such that
$\|\nabla v \cdot n\|_{\mathcal{L}_2(\partial\Omega)} \leq C \|\nabla v\|_{\mathcal{L}_2(\Omega)}$
for all $v$ in the chosen interpolation space. This
condition ensures the coercivity of the bilinear form in the
interpolation space. Griebel and Schweitzer (2000) propose
estimating the constant $C$ from the maximum eigenvalue
of the generalized eigenvalue problem,

\[
A v = \lambda B v \tag{58}
\]

where

\[
A_{ij} = \int_{\partial\Omega} (\nabla N_i \cdot n)(\nabla N_j \cdot n) \, d\Gamma, \qquad
B_{ij} = \int_\Omega \nabla N_i \cdot \nabla N_j \, d\Omega
\]

The problem described in Figure 9 can be solved by
Nitsche’s method for different values of $\beta$ (see Figure 13).
Note that the modification of the weak form is not trivial
in this case (see Fernández-Méndez and Huerta, 2004).
The major advantages of Nitsche’s method are that the scalar
parameter $\beta$ need not be as large as in the penalty method,
and that it avoids the need to meet the Babuska-Brezzi condition
for the interpolation space of the Lagrange multiplier.


3.4 Incompressibility and volumetric locking in 
meshfree methods 


Locking in finite elements has been a major concern since
their early development; it is of particular concern for nonlinear
materials (Belytschko, Liu and Moran, 2000). It
appears because poor numerical interpolation leads to an
overconstrained system. Locking of standard finite elements
has been extensively studied. It is well known that bilinear
finite elements lock in some problems and that biquadratic
elements have a better behavior (Hughes, 2000). Moreover,
locking has also been studied for increasing polynomial
degrees in the context of an hp adaptive strategy (see Suri,
1996).

Figure 13. Nitsche’s solution, $7 \times 7$ distribution of particles, for
$\beta = 100$ (a) and $\beta = 10^4$ (b).

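For instance, consider a linear elastic isotropic material
under plane strain conditions and small deformations,
so that the strain is $\nabla^s u$, where $u$ is the displacement and $\nabla^s$ the
symmetric gradient, that is, $\nabla^s = (1/2)(\nabla^T + \nabla)$. Dirichlet
boundary conditions are imposed on $\Gamma_D$, a traction $h$ is
prescribed along the Neumann boundary $\Gamma_N$, and there is
a body force $f$. Thus, the problem that needs to be solved
may be stated as: solve for $u \in [H^1_{\Gamma_D}]^2$ such that

\[
\frac{E}{1+\nu} \int_\Omega \nabla^s v : \nabla^s u \, d\Omega
+ \frac{E\nu}{(1+\nu)(1-2\nu)} \int_\Omega (\nabla \cdot v)(\nabla \cdot u) \, d\Omega
= \int_\Omega f \cdot v \, d\Omega + \int_{\Gamma_N} h \cdot v \, d\Gamma
\qquad \forall v \in [H^1_{0,\Gamma_D}]^2
\tag{59}
\]

In this equation, the standard vector subspaces of
$H^1$ are employed for the solution $u$,
$[H^1_{\Gamma_D}]^2 = \{u \in [H^1]^2 \mid u = u_D \text{ on } \Gamma_D\}$ (Dirichlet conditions, $u_D$,
are automatically satisfied), and for the test functions $v$,
$[H^1_{0,\Gamma_D}]^2 = \{v \in [H^1]^2 \mid v = 0 \text{ on } \Gamma_D\}$ (zero values are
imposed along $\Gamma_D$).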

This equation, as discussed by Suri (1996), shows the
inherent difficulties of the incompressible limit. The standard
a priori error estimate emanating from (59) and based
on the energy norm may become unbounded for values
of $\nu$ close to 0.5. In fact, in order to have finite values
of the energy norm, the divergence-free condition must
be enforced in the continuum case, that is, $\nabla \cdot u = 0$ for
$u \in [H^1_{\Gamma_D}]^2$, and also in the finite dimensional approximation
space. Locking will occur when the approximation
space is not rich enough for the approximation to
verify the divergence-free condition.
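The blow-up of the energy norm near the incompressible limit comes from the coefficient of the volumetric term in the plane strain weak form. As a small worked illustration (assuming the usual plane strain coefficients $E/(1+\nu)$ and $E\nu/((1+\nu)(1-2\nu))$, and an illustrative unit Young's modulus), the ratio between the two coefficients is $\nu/(1-2\nu)$, which grows without bound as $\nu$ approaches 0.5:

```python
def elastic_coefficients(E, nu):
    """Coefficients of the deviatoric-like and volumetric integrals in plane strain."""
    deviatoric = E / (1.0 + nu)
    volumetric = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return deviatoric, volumetric

E = 1.0  # illustrative Young's modulus
ratios = {}
for nu in (0.3, 0.49, 0.499, 0.4999):
    dev, vol = elastic_coefficients(E, nu)
    ratios[nu] = vol / dev      # equals nu / (1 - 2 nu)
    print(nu, ratios[nu])
```

Any displacement field with nonzero divergence is therefore penalized more and more heavily, which is why an approximation space that cannot represent divergence-free fields locks.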

In fluids, incompressibility is directly the concern. Accurate
and efficient modelling of incompressible flows is an
important issue in finite elements. The continuity equation
for an incompressible fluid takes a peculiar form. It consists
of a constraint on the velocity field, which must be divergence-free.
Then, the pressure has to be considered as a variable
not related to any constitutive equation. Its presence in
the momentum equation has the purpose of introducing an
additional degree of freedom needed to satisfy the incompressibility
constraint. The role of the pressure variable is
thus to adjust itself instantaneously in order to satisfy the
condition of divergence-free velocity. That is, the pressure
acts as a Lagrange multiplier of the incompressibility
constraint, and thus there is a coupling between the velocity
and the pressure unknowns.

Various formulations have been proposed for incompress- 
ible flow (Girault and Raviart, 1986; Gresho and Sani, 
2000; Gunzburger, 1989; Pironneau, 1989; Quartapelle, 
1993; Quarteroni and Valli, 1994; Temam, 2001; Donea and 
Huerta, 2003). Mixed finite elements present numerical dif- 
ficulties caused by the saddle-point nature of the resulting 
variational problem. Solvability of the problem depends on 
a proper choice of finite element spaces for velocity and 
pressure. They must satisfy a compatibility condition, the 
so-called Ladyzhenskaya-Babuska-Brezzi (LBB or inf-sup) 
condition. If this is not the case, alternative formulations 
(usually depending on a numerical parameter) are devised 
to circumvent the LBB condition and enable the use of 
velocity-pressure pairs that are unstable in the standard
Galerkin formulation. 

Incompressibility in meshfree methods is still an open 
topic. Even recently, it was claimed (Belytschko, Lu and 
Gu, 1994; Zhu and Atluri, 1998) that meshless methods do 
not exhibit volumetric locking. Now it is clear that this is 
not true, as shown in a study of the element-free Galerkin 
(EFG) method by Dolbow and Belytschko (1999). More- 
over, several authors claim that by increasing the dilation 
parameter, locking phenomena in meshfree methods can be 
suppressed or at least attenuated (Askes, Borst and Heeres, 
1999; Dolbow and Belytschko, 1999; Chen et al., 2000). 
Huerta and Fernandez-Méndez (2001) clarify this issue and 
determine the influence of the dilation parameter on the 
locking behavior of EFG near the incompressible limit by 
a modal analysis. The major conclusions are 


1. The number of nonphysical locking modes is independent
of the ratio $\rho/h$.

2. An increase of the dilation parameter decreases the
eigenvalue (amount of energy) in the locking mode
and attenuates, but does not suppress, volumetric locking
(in the incompressible limit the energy will remain
unbounded).

3. An increase in the order of consistency decreases the
number of nonphysical locking modes.

4. The decrease in the number of nonphysical locking
modes is slower than in finite elements. Thus EFG
will not improve the properties of the FEM (from
a volumetric locking viewpoint) for p or h-p refinement.
However, for practical purposes and as in finite
elements, in EFG an h-p strategy will also suppress
locking.

The remedies proposed in the literature are, in general, 
extensions of the methods developed for finite elements. For 
instance, Dolbow and Belytschko (1999) propose an EFG 
formulation using selective reduced integration. Chen et al. 
(2000) suggest an improved RKPM based on a pressure 
projection method. These alternatives have the same advan- 
tages and inconveniences as in standard FEMs. Perhaps it 
is worth noting that in meshfree methods it is nontrivial to 
verify analytically the LBB condition for a given approxi- 
mation of velocity and pressure. 

One alternative that uses the inherent properties of mesh- 
free methods and does not have a counterpart in finite ele- 
ments is the pseudo-divergence-free approach (Vidal, Villon 
and Huerta, 2002; Huerta, Vidal and Villon, 2004b). This 
method is based on diffuse derivatives (see Section 2.2.6), 
which converge to the derivatives of the exact solution 
when the radius of the support, $\rho$, goes to zero (for a fixed
ratio $\rho/h$). One of the key advantages of this approach is
that the expressions of pseudo-divergence-free interpolation 
functions are computed a priori, that is, prior to determin- 
ing the specific particle distribution. Thus, there is no extra 
computational cost because only the interpolating polyno- 
mials are modified compared with standard EFG. 


4 RADIAL BASIS FUNCTIONS 


Radial basis functions (RBF) have been studied in math- 
ematics for the past 30 years and are closely related to 
meshfree approximations. There are two major differences 
between the current practice approaches in radial basis func- 
tions and those described in previous sections: 


1. Most radial basis functions have noncompact support. 
2. Completeness is provided by adding a global polynomial
to the basis.


For these two reasons, the solution procedures usually have 
to be tailored for this class of global interpolants. Two tech- 
niques that avoid the drawbacks of global approximations 
are 


1. Multipolar methods 
2. Domain decomposition techniques 


The most commonly used radial basis functions (with their
names) are, with $r = \|x - x_I\|$,

\[
\Phi_I(x) =
\begin{cases}
r & \text{linear} \\
r^2 \log r & \text{thin plate spline} \\
e^{-r^2/c^2} & \text{Gaussian} \\
(r^2 + R^2)^q & \text{multiquadric}
\end{cases}
\]

where $c$, $R$, and $q$ are shape parameters. The choice of these
shape parameters has been studied by Kansa and Carlsson
(1992), Carlsson and Foley (1991), and Rippa (1999).

Completeness of the radial basis function approximations 
is usually provided by adding a global polynomial to 
the approximation. For example, an approximation with 
quadratic completeness is 


\[
u(x) = c_0 + \sum_{i=1}^{n_{sd}} c_i x_i
+ \sum_{i=1}^{n_{sd}} \sum_{j=1}^{n_{sd}} d_{ij} x_i x_j
+ \sum_I a_I \Phi_I(x)
\]

where $c_0$, $c_i$, $d_{ij}$, and $a_I$ are the unknown parameters.
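A radial basis approximation augmented with a global polynomial can be sketched in a few lines. The example below is a 1D illustration with quadratic completeness and a multiquadric kernel. It assumes the common closure of the system by the side conditions $P^T a = 0$, which the text does not spell out, and the shape parameters `R` and `q`, the node layout, and the function names are all illustrative.

```python
import numpy as np

def multiquadric(r, R=0.1, q=0.5):
    """Multiquadric radial function (r^2 + R^2)^q with illustrative shape parameters."""
    return (r**2 + R**2) ** q

def poly_basis(x):
    """Quadratic polynomial basis in 1D: [1, x, x^2]."""
    return np.stack([np.ones_like(x), x, x**2], axis=1)

def fit_rbf(x_nodes, u_nodes):
    """Solve the augmented system [[Phi, P], [P^T, 0]] [a; c] = [u; 0]."""
    n = x_nodes.size
    Phi = multiquadric(np.abs(x_nodes[:, None] - x_nodes[None, :]))
    P = poly_basis(x_nodes)
    m = P.shape[1]
    lhs = np.block([[Phi, P], [P.T, np.zeros((m, m))]])
    rhs = np.concatenate([u_nodes, np.zeros(m)])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n:]          # radial coefficients a_I, polynomial coefficients c

def evaluate(x, x_nodes, a, c):
    Phi = multiquadric(np.abs(x[:, None] - x_nodes[None, :]))
    return Phi @ a + poly_basis(x) @ c

x_nodes = np.linspace(0.0, 1.0, 9)
u_nodes = 1.0 - 2.0 * x_nodes + 3.0 * x_nodes**2     # data sampled from a quadratic
a, c = fit_rbf(x_nodes, u_nodes)
x_eval = np.linspace(0.05, 0.95, 10)
err = np.max(np.abs(evaluate(x_eval, x_nodes, a, c) - (1.0 - 2.0 * x_eval + 3.0 * x_eval**2)))
print(c, err)
```

Because the data come from a quadratic, the polynomial part absorbs them entirely and the radial coefficients vanish, which is the completeness property the augmentation is meant to provide.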

One of the major applications of radial basis functions 
has been in data fitting. The reference by Carr, Fright and 
Beatson (1997), where millions of data points are fit, is 
illustrative of the power of this method. One of the first 
applications to the solution of PDEs is given by Kansa
(1990), who used multiquadrics for smooth problems in
fluid dynamics. In Sharan, Kansa and Gupta (1997), the
method was applied to elliptic PDEs. In both cases, collocation
was employed for the discretization. Exceptional
accuracy was reported. Although a good understanding of 
this behavior is not yet available, evidently very smooth 
global approximants have intrinsic advantages over rough 
approximants for elliptic PDEs and other smooth prob- 
lems (any locally supported approximant will have some 
roughness at the edge of its support). The low cost of 
RBF evaluation is another advantage. Wendland (1999) 
has studied Galerkin discretization of PDEs with radial 
basis functions. Compactly supported RBFs are also under 
development. Local error estimates for radial basis approx- 
imations of scattered data are given by Wu and Schaback 
(1993). 


5 DISCONTINUITIES 


One of the most attractive attributes of meshfree methods 
is their effectiveness in the treatment of discontinuities. 
This feature is particularly useful in solid mechanics in 
modelling cracks and shear bands. 

The earliest methods for constructing discontinuous
meshfree approximations were based on the visibility
criterion (Organ et al., 1996). In this method, in
constructing the approximation at $x$, the nodes on the
opposite side of the discontinuity are excluded from the
index set $S_x^\rho$, that is, nodes on the opposite side of the
discontinuity do not affect the approximation at $x$. To be
more precise, if the discontinuity is described implicitly (i.e.
by a level set) by $f(x) = 0$, with $f(x) > 0$ on one side and
$f(x) < 0$ on the other, then

\[
f(x_I) f(x) > 0 \Rightarrow I \in S_x^\rho, \qquad
f(x_I) f(x) < 0 \Rightarrow I \notin S_x^\rho
\]

The name ‘visibility’ criterion originates from the notion
of considering a discontinuity as an opaque surface while
choosing the nodes that influence the approximation at a
point $x$, $S_x^\rho$. If a node $x_I$ is invisible from $x$, then node $I$ is
not included in the index set $S_x^\rho$ even when it falls within
the domain of influence.
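The visibility criterion for a discontinuity that crosses the whole domain reduces to a sign test on the level set. The following sketch is an illustration only: the node grid, the level set $f(x, y) = y - 0.5$, the support radius, and the function name `index_set` are hypothetical.

```python
import numpy as np

def index_set(x, nodes, rho, level_set):
    """Indices of nodes within distance rho of x and on the same side of f = 0."""
    x = np.asarray(x, dtype=float)
    selected = []
    for I, x_I in enumerate(nodes):
        in_support = np.linalg.norm(x_I - x) <= rho
        same_side = level_set(x_I) * level_set(x) > 0.0
        if in_support and same_side:
            selected.append(I)
    return selected

def f(p):
    """Hypothetical level set: the line y = 0.5."""
    return p[1] - 0.5

grid = np.linspace(0.0, 1.0, 4)
nodes = np.array([[xi, yi] for yi in grid for xi in grid])   # node index = 4 * iy + ix

x = (0.5, 0.4)
rho = 0.5
visible = index_set(x, nodes, rho, f)
support_only = [I for I, x_I in enumerate(nodes) if np.linalg.norm(x_I - np.asarray(x)) <= rho]
print(visible, support_only)
```

Nodes 9 and 10 lie within the support radius of the evaluation point but on the other side of the line, so they are dropped from the index set.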

When a discontinuity ends within a domain, such as at
a cracktip, the visibility criterion does not provide an adequate
tool for constructing the discontinuity in the approximation.
Around the cracktip the approximation must be
constructed so that it is continuous in front of the cracktip
but discontinuous behind it. One way to accomplish
this is to include in the basis branch functions of the
form

\[
B_I = \left[ r \sin\frac{\theta}{2}, \; r^2 \sin\frac{\theta}{2} \right] \tag{60}
\]

where $\theta$ is the angle between the line to the cracktip and
the tangent to the crack and $r$ is the distance to the cracktip
(see Figure 14(b)).

For cracks in elastic materials, the basis can be enriched
by branch functions that span the asymptotic neartip field
of the Westergaard solution (see Fleming et al., 1997).
The basis is then the polynomial basis plus the functions

\[
B = \left[ \sqrt{r} \sin\frac{\theta}{2}, \; \sqrt{r} \sin\frac{\theta}{2} \sin\theta, \;
\sqrt{r} \cos\frac{\theta}{2}, \; \sqrt{r} \cos\frac{\theta}{2} \sin\theta \right] \tag{61}
\]

Figure 14. (a) Level set describing discontinuity and (b) nomenclature
for cracktip branch functions.

Since this basis includes the first-order terms in the neartip
field solution, very accurate solutions can be obtained with
coarse discretizations. Xiao and Karihaloo (2003) have
shown that even better accuracy can be attained by adding
higher-order terms in the asymptotic field. The idea of
incorporating special functions can also be found, among
others, in Oden and Duarte (1997).
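The role of the near-tip branch functions can be checked numerically. The sketch below evaluates the four standard functions $\sqrt{r}\sin(\theta/2)$, $\sqrt{r}\sin(\theta/2)\sin\theta$, $\sqrt{r}\cos(\theta/2)$ and $\sqrt{r}\cos(\theta/2)\sin\theta$, assuming a straight crack along the negative x axis with its tip at the origin. The function name and the test points are illustrative.

```python
import numpy as np

def neartip_branch_functions(x, tip, crack_angle=0.0):
    """Evaluate the four near-tip branch functions at point x.

    theta is measured from the crack tangent (pointing ahead of the tip),
    so the two crack faces correspond to theta close to +pi and -pi.
    """
    d = np.asarray(x, dtype=float) - np.asarray(tip, dtype=float)
    r = np.hypot(d[0], d[1])
    theta = np.arctan2(d[1], d[0]) - crack_angle
    theta = (theta + np.pi) % (2.0 * np.pi) - np.pi      # wrap to [-pi, pi)
    s, c = np.sin(theta / 2.0), np.cos(theta / 2.0)
    return np.sqrt(r) * np.array([s, s * np.sin(theta), c, c * np.sin(theta)])

tip = (0.0, 0.0)
eps = 1e-9
# Jump across the crack faces, one unit behind the tip
jump_behind = (neartip_branch_functions((-1.0, eps), tip)
               - neartip_branch_functions((-1.0, -eps), tip))
# Jump across the crack line, one unit ahead of the tip
jump_ahead = (neartip_branch_functions((1.0, eps), tip)
              - neartip_branch_functions((1.0, -eps), tip))
print(jump_behind, jump_ahead)
```

Only the first function is discontinuous across the crack faces, and all four are continuous ahead of the tip, which is the behavior required of the approximation around a cracktip.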

Regardless of whether a branch function is used near
the cracktip, if the visibility criterion is used for constructing
the approximation near the end of the discontinuity,
additional discontinuities may occur around the tip. Two
such discontinuities are shown in Figure 15. They can be
avoided if the visibility criterion is not used near the tip.
Krysl and Belytschko (1997) have shown that these discontinuities
do not impair the convergence of the solution
since their length, and hence the energy of the discontinuity,
decreases with $h$. However, these extraneous discontinuities
complicate the integration of the weak form, so several
techniques have been developed to smooth the approximation
around the tip of a discontinuity, including the diffraction
method and the transparency method.

In the transparency method, the crack near the crack tip is
made transparent. The degree of transparency is related to
the distance from the crack tip to the point of intersection.
The shape functions are smoothed around the crack tip as
shown in Figure 16. The diffraction method is similar to
the transparency method. The shape function is smoothed
similar to the way light diffracts around a corner (see Organ
et al., 1996).

The recommended technique for including discontinuities 
is based on the partition of unity property of the approxi- 
mants. It was first developed in the context of the extended 
finite element method (see Moes, Dolbow and Belytschko, 
1999; Belytschko and Black, 1999; and Dolbow, Moes 
and Belytschko, 2000). It was applied in EFG by Ventura, 
Xu and Belytschko (2002). Let the set of particles whose
support is intersected by the discontinuity be denoted by
$S^D$, and the set of particles whose support includes the
tip of the discontinuity be $S^T$. The approximation is then
given by

\[
u^h(x) = \sum_{I \in S_x^\rho} N_I(x) u_I
+ \sum_{I \in S_x^\rho \cap S^D} N_I(x) H(f(x)) a_I^D
+ \sum_{I \in S_x^\rho \cap S^T} N_I(x) \sum_J B_J(x) a_{IJ}^T
\]

where $H(\cdot)$ is the step function (Heaviside function). It
is easy to show that the weak form (see Belytschko and
Black, 1999) then yields the requisite strong form for elastic
problems; this is also true for partial differential equations
describing other physical problems.

Figure 15. Inter discontinuity in visibility criterion method.

Figure 16. Domain of influence of node J by the transparency
method near the crack tip.
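The step-function enrichment built on the partition of unity can be illustrated in 1D. The sketch below uses piecewise-linear hat functions as a stand-in partition of unity (an assumption, since the text concerns meshfree shape functions), places a discontinuity at a hypothetical position `x_c`, and enriches only the nodes whose support contains it. All names and values are illustrative.

```python
import numpy as np

def hat_functions(x, nodes):
    """Piecewise-linear hat functions on a 1D grid (a partition of unity)."""
    eye = np.eye(nodes.size)
    return np.array([np.interp(x, nodes, eye[I]) for I in range(nodes.size)])

def enriched_approximation(x, nodes, u, a, x_c, enriched):
    """u(x) = sum_I N_I(x) u_I + sum_{I in enriched} N_I(x) H(f(x)) a_I, with f(x) = x - x_c."""
    N = hat_functions(x, nodes)
    H = 1.0 if x - x_c > 0.0 else 0.0
    return N @ u + H * (N[enriched] @ a[enriched])

nodes = np.linspace(0.0, 1.0, 6)      # spacing 0.2
x_c = 0.5                             # discontinuity inside the interval [0.4, 0.6]
enriched = [2, 3]                     # nodes whose support contains x_c
u = nodes.copy()                      # smooth part: u(x) = x
a = np.zeros(nodes.size)
a[enriched] = 0.3                     # enrichment coefficients giving a jump of 0.3

eps = 1e-9
jump = (enriched_approximation(x_c + eps, nodes, u, a, x_c, enriched)
        - enriched_approximation(x_c - eps, nodes, u, a, x_c, enriched))
print(jump)
```

Because the enriched shape functions sum to one at the discontinuity, equal enrichment coefficients produce a jump of exactly that magnitude, while the standard coefficients still describe the smooth part of the field.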

The technique can also be used for intersecting and
branching discontinuities (see Daux et al., 2000; and
Belytschko et al., 2001). For example, consider the
geometry shown in Figure 17. Let $J^1$ be the set of
nodes whose domains of influence include the discontinuity
$f_1(x) = 0$, and $J^2$ the corresponding set for $f_2(x) = 0$. Let
$J^3 = J^1 \cap J^2$. Then the approximation is

Figure 17. Support of node $I$ with (a) intersecting discontinuities
and (b) branching discontinuities.


5.1 Discontinuities in gradients


The continuously differentiable character of meshfree 
approximation functions is sometimes a disadvantage. At 
material interfaces in continua, or more generally, at 
discontinuities of the coefficients of a PDE, solutions of 
elliptic and parabolic systems have discontinuous gradients. 
These discontinuities in the gradient need to be explicitly 
incorporated in meshfree methods.

One of the first treatments of discontinuous gradients was
by Cordes and Moran (1996). They treated the construction
of the approximation separately; that is, they subdivided
the domain into two subdomains, $\Omega_1$
and $\Omega_2$, with interface $\Gamma^I$. They enforced continuity
of the function along $\Gamma^I$ by Lagrange multipliers. Since
the approximation in $\Omega_1$ was constructed without considering
any nodes in $\Omega_2$ and vice versa, the gradient of the
approximation will be discontinuous across $\Gamma^I$ after the
continuity of the approximation is enforced by Lagrange 
multipliers. As in the case of essential boundary condi- 
tions, this continuity of the function can also be enforced 
by penalty methods, augmented Lagrangian methods or 
Nitsche’s method. 

An alternative technique was proposed by Krongauz and
Belytschko (1998a), who added a function with a discontinuous
gradient. In the original paper, the approximation
is enriched with the absolute value of a signed distance
function.

This enrichment function can also be employed with a
local or global partition of unity

\[
u(x) = \sum_I \left( N_I(x) u_I + N_I(x) |f(x)| a_I \right)
\]

This was first studied in a finite element context by Sukumar
et al. (2001) and Belytschko et al. (2001). In finite
elements, the local partition of unity has some difficulties
because the enrichment function does not decay, so it is difficult
to fade it out gracefully. A global partition of unity is
also undesirable. The behavior of this enrichment in meshfree
methods has not been studied.
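The effect of enriching with the absolute value of a signed distance function is easiest to see in 1D with a global partition of unity. The sketch below again uses hat functions as a stand-in for the shape functions (an assumption) and a hypothetical interface at `x_c`. It shows that the enriched field stays continuous while its slope jumps across the interface.

```python
import numpy as np

def hat(x, nodes, I):
    """Value at x of the piecewise-linear hat function of node I."""
    return np.interp(x, nodes, np.eye(nodes.size)[I])

def enriched(x, nodes, u, a, x_c):
    """u(x) = sum_I N_I(x) (u_I + |f(x)| a_I), with signed distance f(x) = x - x_c."""
    N = np.array([hat(x, nodes, I) for I in range(nodes.size)])
    return N @ (u + abs(x - x_c) * a)

nodes = np.linspace(0.0, 1.0, 6)
x_c = 0.5
u = np.zeros(nodes.size)
a = np.full(nodes.size, 2.0)          # global enrichment, so u(x) = 2 |x - 0.5|

d = 1e-6
slope_right = (enriched(x_c + 2 * d, nodes, u, a, x_c) - enriched(x_c + d, nodes, u, a, x_c)) / d
slope_left = (enriched(x_c - d, nodes, u, a, x_c) - enriched(x_c - 2 * d, nodes, u, a, x_c)) / d
value_at_interface = enriched(x_c, nodes, u, a, x_c)
print(slope_left, slope_right, value_at_interface)
```

With a global enrichment the added function does not decay away from the interface, which illustrates the difficulty mentioned above for local partitions of unity.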


The local partition of unity method can also be used 
for intersecting gradient discontinuities, branching gradi- 
ent discontinuities, and unclosed gradient discontinuities. 
The techniques are identical to those for discontinuities in 
functions, except that the step function is replaced by the 
absolute values of the signed distance function. For exam- 
ple, the approximation for a function with the intersecting 
discontinuities shown in Figure 17(a) is given by 


uh) = S Nu Y Nwa 


Tes? Tesins?

The enrichment procedures are substantially simpler to
implement than the domain subdivision/constraint technique
of Cordes and Moran (1996), particularly for complex
patterns such as intersecting and branching gradient discontinuities.


6 BLENDING MESHFREE METHODS 
AND FINITE ELEMENTS 


Several authors have proposed different alternatives to 
blend finite elements and meshfree methods. These approx- 
imations are combined for two purposes: either i) to couple 
the two approximations, or ii) to enrich the finite element 
interpolation using particles. 

In the first scenario, the objective is to benefit from the
advantages of each interpolation in different regions of the
computational domain, which is usually divided into three
regions (see Figure 18): one where only finite elements
are present, another with only meshfree approximation
functions, and a transition region.

Figure 18. Discretization for the coupling of finite elements and
a meshfree method: finite element nodes (•), particles (×) and
transition region (in gray).

In the second case, the enrichment improves the approximation
without remeshing. A finite element interpolation is
considered in the whole domain and particles can be added
in order to improve the interpolation in a selected region.

Coupled finite elements/meshfree methods are proposed 
by Belytschko, Organ and Krongauz (1995). They show 
how to couple finite elements near the Dirichlet bound- 
aries and EFG in the interior of the computational domain. 
This simplifies considerably the prescription of essential 
boundary conditions. The mixed interpolation in the transi- 
tion region requires the substitution of finite element nodes 
by particles and the definition of ramp functions. Thus the 
region for transition is of the size of one finite element (as 
in Figure 18) and the interpolation is linear. With the same
objectives, Hegen (1996) couples the finite element domain
and the meshfree region with Lagrange multipliers.

Liu, Uras and Chen (1997b) independently suggest
enriching the finite element approximation with particle methods.
In fact, the following adaptive process seems attractive:
(1) compute an approximation with a coarse finite ele- 
ment mesh, (2) do an a posteriori error estimation and (3) 
improve the solution with particles without any remeshing 
process. 

Following these ideas, Huerta and Fernández-Méndez
(2000a) propose a unified and general formulation: the
continuous blending method. This formulation allows both
coupling and enrichment of finite elements with meshfree
methods. The continuous blending method generalizes the
previous ideas for any order of approximation, suppresses
the ramp functions, and does not require the substitution
of nodes by particles. That is, as many particles as needed
can be added where they are needed, independently of the
adjacent finite element mesh. This is done in a hierarchical
manner. This approach has been generalized in Chen et al.
(2003) to get a nodal interpolation property.

Other alternatives are also possible; for instance, the
bridging scale method proposed by Wagner and Liu (2000)
is a general technique to mix a meshfree approximation
with any other interpolation space, in particular with finite
elements. Huerta, Fernández-Méndez and Liu (2004a) compare
the continuous blending method and the bridging scale
method for the implementation of essential boundary conditions.
In the bridging scale method, the meshfree contribution
does not vanish between the nodes along the essential
boundary. As noted by Wagner and Liu (2000), a modified
weak form must be used to impose the essential boundary
condition and avoid a decrease in the rate of convergence.

The next section is devoted to the continuous blending
method proposed by Huerta and Fernández-Méndez
(2000a). This method serves to recall the basic concepts
of both enrichment and coupling. Although all the
developments are done for the EFG method, the continuous
blending method is easily generalizable to other meshfree
methods based on an MLS approximation.


6.1 Continuous blending method 


Huerta and coworkers (Huerta and Fernández-Méndez,
2000a; Fernández-Méndez and Huerta, 2002; Fernández-Méndez,
Diez and Huerta, 2003) propose a continuous
blending of EFG and FEMs,

\[
\begin{aligned}
u(x) \simeq \tilde{u}(x)
&= \sum_{j \in \mathcal{J}} N_j^h(x) u_j + \sum_{i \in \mathcal{I}} N_i^\rho(x) u_i \\
&= \pi^h u + \sum_{i \in \mathcal{I}} N_i^\rho(x) u_i
\end{aligned}
\tag{62}
\]

where the finite element shape functions $\{N_j^h\}_{j \in \mathcal{J}}$ are as
usual, and the meshfree shape functions $\{N_i^\rho\}_{i \in \mathcal{I}}$ take care
of the required consistency of the approximation, that is,
consistency of order $m$. $\pi^h$ denotes the projection operator
onto the finite element space. Figure 19 presents two
examples for the discretization. Particles are marked with
× and active nodes $\{x_j\}_{j \in \mathcal{J}}$ for the functional interpolation
are marked with •. Other nonactive nodes are considered
to define the support of the finite element shape functions
(thus only associated to the geometrical interpolation). The
first discretization (left) can be used for the coupling situation.
Note that with the continuous blending method, the
particles can be located where they are needed independently
of the adjacent finite element mesh. In the second


Ogg 


Figure 20. Shape functions of the coupled finite element ( 


KS 


domain and particles are added to enrich the interpolation, 
and increase the order of consistency, in the gray region in 
Figure 19, 

The meshfree shape functions required in (62) are defined 
as in standard EFG (see equation (28)) 


NP@) = of, x) PEE) (63) 


but now the unknown vector & is determined by imposing 
the reproducibility condition associated to the combined 
EFG and finite element approximation, that is, 


Pæ) = TP) + $ N.E) PQ) (64) 
ieZ 


Substitution of (63) in (64) leads to a small system of 
equations for ¥ € R'+! (see Huerta and Fernaéndez-Méndez, 
2000a for details) 


M(x) ax) = P(x) — TP) (65) 


The only difference with standard EFG is the modifica- 
tion of the r.h.s. of the previous system, in order to take 
into account the contribution of the finite element base 
in the approximation. Moreover, note that the expression 
for the modified EFG shape functions is independent of 
the situation, that is, the same implementation is valid for 
enrichment and coupling, increasing the versatility of this 
approach. 

Figure 20 shows a 1D example of coupling finite ele- 
ments and EFG. The meshfree shape functions adapt their 
shape to recover the linear interpolation. In the coupling 


vo yE M 


$3 PRAY FS 
POSE SE Ar Ok 


) and meshfree (-------) interpolation. 
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Figure 21. Discretization with active finite element nodes at the boundary (#) and particles (x), and meshfree shape function associated 


to the particle located at the gray circle (A) and (B) respectively. 


situation, the continuity of the approximation is ensured 
under some conditions, even in multiple dimensions, by the 
following proposition (see Fernández-Méndez and Huerta, 
2002, 2004 for the proof). 


Proposition 3. The approximation $u(x)$, defined in (62),
is continuous in $\Omega$ if (1) the same order of consistency $m$
is imposed all over $\Omega$ (i.e. $m$ coincides with the degree of
the FE base), and (2) the domain of influence of particles
coincides exactly with the region where finite elements do
not have a complete basis.


Remark 24 (imposing Dirichlet boundary conditions).
Under the assumptions (1) and (2) in Proposition 3, the contribution
of the particles is zero in the regions where the
finite element base is complete. In particular, this means
that $N_i^\rho = 0$ in the finite element edges (or faces in 3D)
whose nodes are all in $\mathcal{J}$ (active nodes). This is an important
property for the implementation of essential boundary
conditions. If a finite element mesh with active nodes at
the essential boundary is used, the meshfree shape functions
take care of reproducing polynomials up to degree
$m$ in $\Omega$ and, at the same time, vanish at the essential
boundary. For instance, Figure 21 shows the particle shape
functions, $N_i^\rho$, associated to particles at the boundary
and in the first interior layer. Note that they vanish along
the boundary because the finite element base is complete
on $\partial\Omega$. Therefore, the prescribed values can be directly
imposed as usual in the framework of finite elements, by
setting the values of the corresponding nodal coefficients.
Moreover, it is also easy to impose that the test functions
(for the weak forms) vanish along the Dirichlet boundary
(see Fernández-Méndez and Huerta, 2004 for further
details).
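The modified shape functions of the continuous blending method, and the vanishing property noted in Remark 24, can be sketched in 1D. The code below assumes linear consistency ($m = 1$), a hat-function finite element base with only some nodes active, and the usual EFG moment matrix $M(x) = \sum_i \omega(x_i, x) P(x_i) P^T(x_i)$. The weight function, the node and particle layout, and all names are illustrative choices rather than the authors' implementation.

```python
import numpy as np

fe_nodes = np.linspace(0.0, 1.0, 6)        # geometric FE mesh, spacing 0.2
active = [0, 1, 2]                         # active nodes: x = 0, 0.2, 0.4
particles = np.linspace(0.4, 1.0, 7)       # particles, spacing 0.1
rho = 0.25                                 # radius of the particle supports

def P(x):
    """Linear polynomial basis (consistency of order m = 1)."""
    return np.array([1.0, x])

def weight(x_i, x):
    """Compactly supported weight; an illustrative choice."""
    s = abs(x - x_i) / rho
    return (1.0 - s**2) ** 2 if s < 1.0 else 0.0

def fe_shape(x):
    """Hat functions of the active nodes only."""
    eye = np.eye(fe_nodes.size)
    return np.array([np.interp(x, fe_nodes, eye[j]) for j in active])

def blended_shape_functions(x):
    Nh = fe_shape(x)
    proj = sum(Nh[k] * P(fe_nodes[j]) for k, j in enumerate(active))   # pi^h P(x)
    w = np.array([weight(x_i, x) for x_i in particles])
    M = sum(w_i * np.outer(P(x_i), P(x_i)) for w_i, x_i in zip(w, particles))
    # lstsq also copes with points not covered by any particle, where M = 0
    alpha = np.linalg.lstsq(M, P(x) - proj, rcond=None)[0]
    Nrho = np.array([w_i * (P(x_i) @ alpha) for w_i, x_i in zip(w, particles)])
    return Nh, Nrho

def reproduced(x):
    """Left-hand side of the reproducibility condition evaluated at x."""
    Nh, Nrho = blended_shape_functions(x)
    fe_part = sum(Nh[k] * P(fe_nodes[j]) for k, j in enumerate(active))
    mf_part = sum(Nrho[i] * P(particles[i]) for i in range(particles.size))
    return fe_part + mf_part

print(reproduced(0.55), P(0.55))
print(np.max(np.abs(blended_shape_functions(0.3)[1])))
```

The combined basis reproduces linear polynomials everywhere, and the particle shape functions vanish at points such as $x = 0.3$ where the active finite element base is already complete, even though particle supports reach that region.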


The problem described in Figure 9 can be solved using
the continuous blending method. Figure 22 shows the solution.
As observed in Remark 24, the prescribed displacements
are directly imposed. Two different finite element
discretizations are considered. In both cases, the linear finite
element approximation at the boundary allows the exact
enforcement of the prescribed displacement. Note that if
the prescribed displacement is piecewise linear or piecewise
constant, as it is in this example, then it is imposed
exactly when a bilinear finite element approximation is
used.

Figure 22. Continuous blending for two different distributions
of finite elements near the essential boundary and the same
distribution of particles, $h = 1/6$.

Figure 23. Problem statement: rectangular specimen with one
centered imperfection. (Figure labels: 50 mm specimen dimension and material properties.)

Figure 24. Coarse finite element mesh (Q1 elements) with its corresponding equivalent inelastic strain (b) and mixed interpolation with
its equivalent inelastic strain distribution (a). A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm

Finally, an example reproduces the finite element enrich- 
ment with EFG in a nonlinear computational problem. A 
rectangular specimen with an imperfection is loaded (see 
Diez, Arroyo and Huerta, 2000; Huerta and Diez, 2000). 
Figure 23 presents the problem statement with the material 
properties. 

This problem has been solved with the element-free 
Galerkin method. A coarse mesh solution of quadrilateral 
bilinear finite elements (308 dof) is shown in Figure 24 
(left). When particles are added (308 + 906 = 1214 dof) 
and the order of consistency is increased (m = 2), an 
accurate distribution of inelastic strains is recovered (see 
Figure 24). 

In this example, Figure 24, the original mesh is main- 
tained and particles are added where they are needed, 
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1 INTRODUCTION 


Discrete element methods comprise a set of computational 
modeling techniques suitable for the simulation of dynamic 
behavior of a collection of multiple rigid or deformable 
bodies, particles or domains of arbitrary shape, subject 
to continuously varying contact constraints. Bodies collide 
with one another, new contacts are established, while old 
contacts may be released, giving rise to changes in the 
contact status and contact interaction forces, which in turn 
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influence the subsequent movement of bodies. Therefore, 
issues related to the nonsmoothness in space (separate bod- 
ies) and in time (jumps in velocities upon collisions) need 
to be considered, as well as the application of the interaction 
law (e.g. nonpenetrability, friction). Typically, a configura- 
tion of the collection of bodies changes continuously under 
the action of some external agency and as a result of the 
interaction law between bodies, leading to a steady state 
configuration at the state of rest, if a static equilibrium is 
reached. Bodies can be considered as rigid or deformable — 
if they are rigid, the interaction law between bodies in con- 
tact is the only constitutive law considered, whereas for 
deformable bodies, an appropriate homogenized continuum 
constitutive law (e.g. elasticity, plasticity, fracturing) needs 
to be accounted for as well. 

Computational modeling of multibody contacts (contact 
detection and resolution) is clearly the dominant issue in 
discrete element methods, although the same issue appears 
in the nonlinear finite element analyses of contact problems 
(see Chapter 6, Volume 2). When the number of bodies 
or domains in contact is relatively small, it is possible a 
priori to define groups of nodes, segments, or surfaces, 
which belong to a possible contact set. These geometric 
attributes are then continuously checked against one another 
and the kinematic resolution can be treated in a very 
rigorous manner. Bodies that are possibly in contact may 
be internally discretized by finite elements (Figure 1), and 
their material behavior can essentially be of any complexity. 

Discrete element methods are specifically geared for sim- 
ulations involving a large number of bodies and emphasis 
lies on the change of contact locations and conditions that 
cannot be defined a priori and that need to be continu- 
ously updated as the solution progresses. Discrete element 
methods also represent powerful frameworks in which 
very simple interaction laws between individual particles 
Figure 1. Collection of bodies, discretization of bodies into finite elements, changing configurations, and possible fragmentation. 


can simulate the complex material behavior observed at a 
homogenized, macroscopic level. 

The term discrete element method is most commonly 
associated with the definition (Cundall, 1989) that refers 
to any computational modeling framework, which 


1. allows finite displacements and rotations of discrete 
bodies, including complete detachment and 

2. recognizes new contacts automatically as the calcula- 
tion progresses. 


There exist a large number of methods or methodologies 
or procedures, which in one way or another belong to a 
broad class of discrete element methods. They are often 
referred to or could possibly be classified and distinguished 
according to the manner they deal with (a) contact detection 
algorithm, (b) treatment of contacts (rigid, deformable), 
(c) deformability and material model of bodies in contact 
(rigid, deformable, elastic, elasto-plastic etc.), (d) small- 
strain or large-strain formulations, (e) number (small or 
large) and distribution (loose or dense packing) of inter- 
acting bodies considered, (f) consideration of the model 
boundaries, (g) possible fracturing or fragmentation, and 
(h) time stepping integration schemes (explicit, implicit). 
Many methods differ in only a small number of the above 
attributes, yet they appear under different names. More- 
over, the term discrete element methods is not used only 
in the case of preexisting discontinuities, as the discrete 
nature of the emerging discontinuities is often also taken 
into account. 

Discontinuous modeling frameworks, which are increas- 
ingly utilized in modeling discontinuous, fractured and 
disjointed media also include techniques appearing under 
various names such as the rigid block spring method, 
discontinuous deformation analysis, combined discrete/ 
finite elements, nonsmooth contact dynamics, and so 
on. Their applications (Figure 2) range from modeling 
problems of an inherently discontinuous behavior (granular 


and particulate materials, silo flow, sediment transport, 
jointed rocks, stone or brick masonry) to problems in 
which the modeling of transition from a continuum to 
a discontinuum is more important. Increased complexity 
of different discontinuous models is achieved by incor- 
porating the deformability of solid material and/or by 
more complex contact interaction laws and by the intro- 
duction of some failure or fracturing criteria controlling 
the solid material behavior and the emergence of new 
discontinuities. 

On a homogenized continuum level, complex nonlinear 
continuum finite element analyses are conducted using 
inelastic material models and including, if appropriate, joint 
and interface elements to model any planes of weaknesses 
(see Chapter 10, Volume 2). Typically, if a small number 
of discontinuities needs to be considered, these interface 
elements are adopted within an overall nonlinear continuum 
modeling framework; on the other hand, if the number of 
discontinuities is very large, some form of homogenization 
is usually employed (see Chapter 12, Volume 2). 

Most media can be treated as discontinuous at some 
level of observation (nano, micro, meso, macro), where 
the continuum assumptions cease to apply. This happens 
when the scale of the problem becomes similar to the 
characteristic length scale of the associated material struc- 
ture and the interaction laws between bodies or particles 
are invoked instead of the continuum constitutive law. 
This chapter is concerned with the discontinuous model- 
ing of interaction phenomena observed at a macro level, 
although similar arguments can be applied at various levels 
of observation. 

Computational modeling of macroscopically particulate 
and inherently discontinuous media may not be dealt with in 
an adequate manner by a homogenized continuous descrip- 
tion, and the discrete nature of discontinuities needs to 
be taken into account. Such analyses usually concern a 
system of multiple bodies or particles (rigid or deformable), 
Figure 2. Typical discrete element applications: 3-D hopper flow (a) and development of silo wall pressures (b) during filling or 
discharge (after Lazarević D and Dvornik J. Selective time steps in predictor corrector methods applied to discrete dynamic models of 
granular materials. In ICADD-4, 4th Conference on Analysis of Discontinuous Deformation, Bićanić N (ed.). University of Glasgow, 
2001), milling simulation (c) (after Cleary, 2002), 3-D masonry arches (d) (after Lemos JV. Assessment of the ultimate load of masonry 
arch using discrete elements. In Comp Meth in Struct Masonry 3, Middleton J and Pande G (eds). Books and Journals International, 
1995). A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm 


which are potentially coming into contact as the solution 
progresses and the bodies to be considered can in gen- 
eral be of arbitrary shapes (e.g. granular materials). The 
solution evolves in time and is typically treated as a 
dynamic problem, which may or may not have a steady state 
solution. For discontinuous media, for example, jointed 
rocks or masonry, discontinuities are either preexisting (e.g. 
joints, bedding planes, interfaces, planes of weakness, con- 
struction joints) or emerging, in particular, in the case of 
cohesive frictional materials, where the growth and coa- 
lescence of micro-cracks eventually appear in a form of a 


macro-crack. Many structures, structural systems, or structural 
components comprise discrete discontinuities, which 
need to be taken into account, where discontinuities may 
be heterogeneous or highly regular or structured. An obvious 
example of structured discontinua is brick masonry 
or jointed rock structures, in which the displacement discontinuities 
commonly occur at block interfaces, without 
necessarily rendering structures unsafe. Other structures 
exhibiting macro discontinuities (stone masonry, cracked 
structures, dilatation, or expansion joints) fall into a similar 
category. 
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This chapter will treat several important aspects of vari- 
ous discrete element methods such as: 


• Body geometry characterization and contact detection 
• Imposition of contact constraints and boundary conditions 
• Definition and description of deformability 
• Fracturing and fragmentation, transition from continua to discontinua 
• Time stepping solution schemes 


It will be shown that there are many similarities between 
the apparently different methods, and in the following, the 
most commonly encountered discrete element methods will 
be considered in the way they encompass the above aspects. 
Associated discontinuous modeling frameworks will also be 
discussed. 


2 BASIC DISCRETE ELEMENT 
FRAMEWORK AND 
REGULARIZATION OF NONSMOOTH 
CONTACT CONDITIONS 


In broader terms, the discrete element methods deal with 
either rigid or deformable particles. If the particles are 
rigid and are of a simple shape, an event-by-event simula- 
tion strategy can be applied. In such cases, collision times 
for particles can be calculated exactly and the momentum 
exchange methodologies are used to determine postcolli- 
sion velocities, as the contact time is considered to be 
infinitely short. Any energy loss during contact is accounted 
for via the restitution coefficients or friction and the sim- 
ulation deals with the nonsmooth step changes and rever- 
sals in velocities. Such methodologies have been used for 
molecular dynamics simulations with a very large number of 
particles, and a range of contact detection and visualiza- 
tion techniques have been developed. However, although 
the event-driven algorithms work well for loose (gas-like) 


assemblies of particles, for dense configurations these lead 
to an effective solution locking, that is, critically slow 
simulations, a phenomenon referred to as an inelastic col- 
lapse (McNamara and Young, 1994). 
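
The event-by-event strategy described above can be illustrated by a minimal sketch for two rigid discs in 2-D. It assumes constant velocities between events, frictionless contact, and a single restitution coefficient e; the function names and the tuple-based data layout are illustrative assumptions and are not taken from any particular code.

```python
import math

def collision_time(x1, v1, r1, x2, v2, r2):
    """Time until two discs moving at constant velocity first touch, or None."""
    dx = (x2[0] - x1[0], x2[1] - x1[1])
    dv = (v2[0] - v1[0], v2[1] - v1[1])
    b = dx[0] * dv[0] + dx[1] * dv[1]          # negative when the discs approach
    a = dv[0] ** 2 + dv[1] ** 2
    c = dx[0] ** 2 + dx[1] ** 2 - (r1 + r2) ** 2
    disc = b * b - a * c                       # discriminant of a*t^2 + 2*b*t + c = 0
    if b >= 0.0 or a == 0.0 or disc < 0.0:
        return None                            # separating, or passing without contact
    return (-b - math.sqrt(disc)) / a

def post_collision_velocities(x1, v1, m1, x2, v2, m2, e):
    """Momentum exchange along the line of centres with restitution coefficient e."""
    dx = (x2[0] - x1[0], x2[1] - x1[1])
    dist = math.hypot(dx[0], dx[1])
    n = (dx[0] / dist, dx[1] / dist)           # unit normal from body 1 to body 2
    vn = (v2[0] - v1[0]) * n[0] + (v2[1] - v1[1]) * n[1]
    j = -(1.0 + e) * vn / (1.0 / m1 + 1.0 / m2)   # impulse magnitude
    v1_new = (v1[0] - j * n[0] / m1, v1[1] - j * n[1] / m1)
    v2_new = (v2[0] + j * n[0] / m2, v2[1] + j * n[1] / m2)
    return v1_new, v2_new
```

In an event-driven code, the smallest collision time over all candidate pairs defines the next event; the whole assembly is advanced to that instant and only the two colliding bodies receive new velocities. With e < 1 the time between successive events can shrink rapidly in dense assemblies, which is the slowdown referred to above.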

Collision of deformable bodies implies that the contact 
time is not infinitely short and that contact forces vary 
for the duration of the contact. Any simulation strategy 
therefore calls for some form of time stepping scheme and 
some way of regularizing the nonsmooth nature of the 
nonpenetration and friction condition. 

The constraint of nonpenetrability during the contact 
between the two bodies, termed here a contactor 
$\Omega_c$ and a target body $\Omega_t$, implies that no material point 
belonging to the contactor body should cross the boundary 
of the target body, that is, the gap g between them must be 
nonnegative. 

Only a compressive (here assumed positive) interaction 
force $F_n$ is possible, that is, no attraction force between 
the two bodies exists and this interaction force vanishes 
for nonactive contact g > 0. These strict conditions of a 
unilateral contact can be mathematically described by the 
so-called Signorini condition. The above infinitely steep 
(i.e. ‘nonsmooth’) graph (Figure 3) can be regularized by 
assuming that the interaction force $F_n$ is a function of the 
gap violation, which can be physically interpreted through 
elastic properties of an assumed contact layer. The infinitely 
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Figure 3. Nonsmooth treatment of normal contact. 
Figure 4. Regularized treatment of normal contact, with a linear and nonlinear penalty term. 
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Figure 5. Nonsmooth and regularized treatment of frictional contact. 


steep graph is thereby replaced by the penalty formulation, 
with a linear or nonlinear penalty coefficient (Figure 4). 
Nonsmooth relations also exist if the interaction law considers 
a tangential friction force $F_t$ related and opposed to 
the sliding velocity $\dot{s}$. For the Coulomb friction law, there 
is a threshold tangential force proportional to the normal 
interaction force, $F_t = \mu F_n$, before any sliding can occur, 
corresponding again to an infinitely steep graph (Figure 5). 
The usual regularization in the case of small tangential displacements 
(proportional to sliding velocity) is to formulate the 
friction law such that the friction force is proportional to 
the relative tangential displacement. 
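
The two regularizations can be summarized in a short sketch. It assumes a penalty stiffness k_n acting on the gap violation (with an optional exponent for a nonlinear penalty) and a tangential stiffness k_t acting on the relative tangential displacement, capped by the Coulomb threshold. All names and parameter values are illustrative assumptions.

```python
def normal_force(gap, k_n, exponent=1.0):
    """Penalty-regularized normal force: zero for an open gap, growing with overlap."""
    overlap = -gap                       # gap violation, positive when penetrating
    if overlap <= 0.0:
        return 0.0                       # nonactive contact, no attraction
    return k_n * overlap ** exponent     # exponent = 1 gives the linear penalty

def tangential_force(slip, k_t, mu, f_n):
    """Regularized Coulomb friction: elastic branch capped at mu * f_n."""
    trial = -k_t * slip                  # opposes the relative tangential displacement
    limit = mu * f_n                     # Coulomb threshold
    return max(-limit, min(limit, trial))
```

For a very large k_n or k_t, the graphs approach the nonsmooth relations again, at the price of stiffer equations and smaller admissible time steps.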

As a result of the above regularizations, the behavior 
of the collection of bodies is now governed by a set of 


Figure 6. (a) Discrete element bodies (particles) in contact, giving rise to axial and tangential contact forces. Force magnitudes related 
to the relative normal and tangential displacement and to the relative normal and tangential velocity at the contact point. (b) Arbitrary particle 
shapes as assemblies of clustered particles of simple shapes. (c) Contact of clustered particles, including liquid bridge forces to simulate 
wet particles (Groger T, Tuzun U and Heyes D. Shearing of wet particle systems discrete element simulations. In 1st International PFC 
Symposium, Itasca, 2002). A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm 


differential equations, which can be solved by some time 
stepping technique, where very small time steps are used in 
order to ensure accuracy in enforcing very stiff interaction 
conditions. 

The initial formulation of the discrete element method, 
originally called distinct element method or DEM (Cundall, 
1971), was based precisely on such regularization concepts 
and on the assumption of rigid elements and deformable 
contacts. Later extensions to include local deformation 
have permitted more rigorous treatment of both the contact 
conditions and energy preservation requirements. Over a 
period of time, a number of more sophisticated models 
for both the solid material as well as contacts have been 
formulated within the discrete element context. 
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The overall algorithmic framework for the DEM (with 
regularized contact constraints) is conceptually straightfor- 
ward and it has remained more or less the same since 
the method was first introduced. Viscous contact forces 
(proportional to the relative velocities in the normal and 
tangential direction) are also included, adding even more 
to the regularized nature of the contact conditions. More 
complex interaction force fields can also be considered 
(Figure 6). The method basically considers each body in 
turn and at any given time determines all forces (external or 
contact) acting on it. Any out-of-balance force (or moment) 


induces an acceleration (translational or rotational), which 
then determines the movement of that body during the next 
time step. 

The simplest computational sequence for the DEM 
(most often formulated in the ‘leap-frog’ format, see 
Table 1) typically proceeds by solving the equations of 
motion of a given discrete element using an explicit 
time marching scheme, while updating contact force his- 
tories as a consequence of contacts between different dis- 
crete elements and/or resulting from contacts with model 
boundaries. 
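
A minimal sketch of such an explicit sequence is given below for two spheres moving along a line. It is an illustrative assumption rather than a reproduction of Table 1: the contact is a linear penalty spring without damping or friction, velocities are stored at half steps, and all names are hypothetical.

```python
def contact_force(x, r, k_n):
    """Force on body 0 from body 1 (1-D, linear penalty spring)."""
    overlap = (r[0] + r[1]) - (x[1] - x[0])
    return -k_n * overlap if overlap > 0.0 else 0.0

def leapfrog_dem(x, v, m, r, k_n, dt, n_steps):
    """Explicit leap-frog march; v holds the velocities at the half steps."""
    x, v = list(x), list(v)
    for _ in range(n_steps):
        f0 = contact_force(x, r, k_n)        # 1. contact forces from current positions
        forces = (f0, -f0)                   #    equal and opposite on the two bodies
        for i in range(2):
            v[i] += forces[i] / m[i] * dt    # 2. velocities at t + dt/2
            x[i] += v[i] * dt                # 3. positions at t + dt
    return x, v
```

For two equal spheres approaching head-on, the sketch returns velocities that are, to within the time stepping error, exchanged after the contact has ended. The time step must resolve the contact duration, which shortens as the penalty stiffness k_n grows.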


Table 1. Computational sequence for discrete element code (based on Cundall PA and Strack ODL. A discrete numerical 
model for granular assemblies. Geotechnique 1979, 29:47-65). 
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3 CHARACTERIZATION OF 
INTERACTING BODIES AND 
CONTACT DETECTION 


Each computational time step in the realization of the discrete 
element method requires the detection of bodies in contact 
and the evaluation of the contact forces (both magnitude 
and direction) emanating from the contact. If the interacting 
bodies are of very simple geometry (e.g. circular (2-D) 
or spherical (3-D)), these issues are straightforward, as 
the algorithmic check for a possible overlap is simple 
and the definition of the contact plane is unambiguous. 
Bodies of more complex shapes can also be conveniently 
approximated by forming convenient clusters (Figure 6b) 
of rigidly connected circular or spherical particles, while 
the contact detection and resolution remain the same as 
for single particles. However, very often the interacting 
bodies of arbitrary geometry need to be considered, and 
the algorithmic complexity of the contact detection and the 
associated definition of the contact plane between the two 
bodies increase significantly. 

An efficient contact detection algorithm and a rational 
contact model to evaluate contact forces, for a large number 
of bodies where the relative position and shape of these 
arbitrary bodies may be continuously changing, are needed 
irrespective of the level of complexity adopted for the 
material description or discretization. 

Within the concept of the discrete element formulation, 
each element is considered as a separate, distinct body, 


Cell location for each body: 
$i = \mathrm{int}(x_m/\Delta) + 1$, $j = \mathrm{int}(y_m/\Delta) + 1$ 
$(x_m, y_m)$: coordinates of the body centroid 
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which may or may not be in contact with various neigh- 
boring elements; hence, the main computational effort in 
such formulation is spent on the contact detection, that is, 
algorithms to establish which other bodies are in contact 
with the currently inspected body. The efficiency of these 
algorithms is crucial, as the conceptually simple procedure 
to test the possibility of contact of an element with all other 
elements at every time step becomes highly uneconomical 
once the number of elements becomes large. Contact search 
algorithms are typically based on a so-called body-based 
search or a space-based search. In the former, only the 
space in the vicinity of the specified discrete element is 
searched (and the search repeated only after a number of 
time steps), whereas the latter implies a subdivision of 
the total searching space into a number of overlapping 
windows. 

The contact detection problem between bodies of arbitrary 
geometries can be formally stated as finding a contact or 
overlap of a given contactor body with a number of bodies 
from a target set of N bodies in $R^n$ space (Fig. 7). As a consequence 
of a desire to deal with arbitrary geometric shapes, 
most algorithms typically employ a two-phase strategy. Initially, 
all bodies may be approximated by simpler geometric 
representations, which enclose the actual body (bounding 
volume, bounding box or bounding sphere), and the list of 
possible contact pairs is established via an efficient global 
neighbor or region search algorithm. 

This is then followed by a detailed local contact resolu- 
tion phase, where the potential contact pairs are examined 
by considering the actual body geometries. This phase is 


Figure 7. Basis of the hashing or binning algorithm for simple particle shapes and clustered particles. A color version of this image 
is available at http://www.mrw.interscience.wiley.com/ecm 
strongly linked with the manner in which the geometry of actual 
bodies is characterized. 


3.1 Global neighbor or region search 


A typical example of the initial region search phase is the 
so-called boxing algorithm (Taylor and Preece, 1989). A 
complete computational domain is subdivided at regular 
intervals into cells, and a list of bodies overlapping a given 
cell (e.g. i, j in 2-D) is established via a contact detection of 
rectangular regions. Once this list is complete, the contact 
resolution phase for a given body comprises a detailed 
check against contact with all bodies listed to share the 
same cell and the check is usually extended to a list of 
bodies corresponding to a neighboring layer of cells (i.e. 
8 cells in 2-D, 26 cells in 3-D). 
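
The cell-based region search can be sketched as follows for bodies represented only by their centroids in 2-D. This is a simplified illustration under stated assumptions: each body is registered in a single cell (which is adequate only when the cell size is not smaller than the largest body), cells are indexed from zero, and a dictionary stands in for the cell lists. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(centroids, cell_size):
    """Return index pairs of bodies lying in the same or in adjacent cells (2-D)."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(centroids):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    pairs = set()
    for (i, j), members in cells.items():
        for a, b in combinations(members, 2):          # bodies sharing the cell
            pairs.add((min(a, b), max(a, b)))
        for di in (-1, 0, 1):                          # the 8 neighboring cells
            for dj in (-1, 0, 1):
                if (di, dj) == (0, 0):
                    continue
                for a in members:
                    for b in cells.get((i + di, j + dj), ()):
                        pairs.add((min(a, b), max(a, b)))
    return pairs
```

Only the pairs returned by this global search are passed on to the detailed contact resolution; bodies in distant cells are never compared.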

A proper balance between the cell size and the maximum 
size of a body is clearly of the essence here. If the cell size 
is large compared to the body size, the initial search is fast, 
but many bodies may be listed as potential contact pairs and 
the contact resolution phase is likely to be extensive. On 
the other hand, if the cell size is small, the initial search 
is computationally demanding, but it results in very few 
potential contact pairs and, consequently, a less demanding 
contact resolution phase. A balance is reached with cell 
sizes that are approximately of the size of the largest body 
in the system. 

Different options exist to formulate an efficient searching 
algorithm and the concepts are again frequently borrowed 
from related fields, typically comprising compact and efficient 
data representation techniques to describe the geometric 
position of the discrete element, for example, nodes, 
sides, or faces. Various decompositions of the computational 
space and efficient cell data representations 
for a large number of contactor objects (binary tree, 
quad tree, direct evidence, combination of direct evidence, 
rooted trees, alternating data trees; Figure 8) are usually 
adopted (Taylor and Preece, 1989; Bonet and Peraire, 1991; 
Munjiza and Andrews, 1998; Petrinić, 1996; Williams and 
O’Connor, 1999; Perkins and Williams, 2001; Feng and 
Owen, 2002). Algorithmic issues and details of the associated 
data structures are quite involved (Williams and 
O’Connor, 1999), and the efficiency of many contact detection 
algorithms depends in a nonlinear fashion on the total 
number of bodies. Contact detection algorithms of linear 
complexity (i.e. linear dependence on the number of bodies) 
are desirable and indeed essential for simulations involving 
a very large number of bodies. A particularly detailed explanation, 
as well as the pseudocode for the so-called NBS (no 
binary search) algorithm for bodies of similar sizes (total 
contact detection time proportional to the total number of 
bodies, irrespective of the particle packing density) can be 
found in Munjiza and Andrews (1998). 

Very efficient data structures and representations can be 
used when simple geometries are considered, for example, 
when searching for a possible overlap of rectangles 
in 2-D or bounding boxes in 3-D. Typically, a given rectangular 
domain in $R^n$ is characterized by a minimum set 
of parameters and then mapped into a representative point 
in an associated $R^{2n}$ space (Figure 9). For example, a 1-D 
segment (a - b) may be mapped into a representative point 
in 2-D space, with coordinates (a, b), or a 2-D rectangle of 
a size $(x_{\min} - x_{\max})$ and $(y_{\min} - y_{\max})$ may be mapped to a 
representative point in a 4-D space $(x_{\min}, y_{\min}, x_{\max}, y_{\max})$. 
Alternative equivalent representation schemes are sometimes 
preferred, for example, characterizing a rectangular 
domain in $R^2$ by the starting point coordinates $(x_{\min}, y_{\min})$ 
and the two rectangle sizes $(h_x, h_y)$, followed by a mapping 
into a different $R^4$ space $(x_{\min}, y_{\min}, h_x, h_y)$. As the 
representation of the physical domain is reduced to a point, 
region search algorithms can be more efficient in the representative 
$R^{2n}$ spaces than in the physical $R^n$ space. The 
easiest interpretation of the associated region search can be 
given for 1-D segments (Figure 9), but the concept can be 
generalized for 2-D and 3-D settings. 
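
For the 1-D case, the region search in the representative space can be sketched as below. The sketch assumes closed segments and uses a plain linear scan over the representative points; in practice, the points would be stored in one of the tree structures mentioned earlier so that the region query avoids visiting every point. The function names are illustrative.

```python
def to_point(segment):
    """Map a 1-D segment (a - b) to its representative point (a, b) in 2-D."""
    a, b = segment
    return (min(a, b), max(a, b))

def overlapping(points, query):
    """Indices of the segments that overlap the query segment (c - d)."""
    c, d = to_point(query)
    # A segment (a - b) overlaps (c - d) exactly when a <= d and b >= c, that is,
    # when its representative point lies in the region a <= d, b >= c of the plane.
    return [i for i, (a, b) in enumerate(points) if a <= d and b >= c]
```

The overlap test therefore becomes a search for points inside an axis-aligned region, which is the property that makes the mapped representation attractive.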


3.2 Contact resolution phase 


Once the list of potential contact pairs is established, many 
different algorithms for the subsequent detailed contact res- 
olution are possible. These algorithms depend greatly on the 
manner in which the bodies are characterized or described. 
The contact resolution is not only needed to confirm (or 
otherwise) whether the potential contact pair is indeed in 
contact; the contact resolution phase also establishes the 
orientation of the contact plane, so that a local (n, t, s) 
coordinate system can be determined and the conditions 
for impenetrability or sliding can be properly applied. 

Typical geometry descriptors can be categorized 
into three main groups (Hogue, 1998): (a) polygon or 
polyhedron representation, (b) implicit continuous function 
representation (elliptical or general superquadrics) and 
(c) discrete function representation (DFR). 

If the geometry of polygonal domains in 2-D is defined 
in terms of corners and edges, a whole series of algorithms 
exist to determine an intersection of two coplanar polygons. 
Clearly, any restriction to convex polygons simplifies the 
algorithm considerably, as concave corners introduce addi- 
tional complexities, with multiple contact points possible. 
However, in terms of defining the orientation of the contact 
plane there are no ambiguities in considering a corner-to-edge 
or an edge-to-edge contact, as the contact plane normal 
Figure 8. An example of the alternating data tree concept for storing object data (based on Petrinić N. Aspects of Discrete Element 
Modelling Involving Facet-to-Facet Contact Detection and Interaction, PhD thesis, University of Wales, Swansea, 1996). 
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Figure 9. Mapping of a segment from 1-D space to a point in 
an associated 2-D space and mapping of a box in 2-D into 4-D 
space. 


is obviously defined by the edge normal. Difficulties arise 
when the corner-to-corner contact needs to be resolved 
(Figure 10), as the orientation of the contact plane (and 
hence its normal) cannot be uniquely defined. This ambiguity 
can be avoided by the rounding of corners (Cundall, 
1988) to ensure continuous changes of the contact outer normals, 
which was later enhanced through the introduction of 
a common plane (Cundall, 1988), ‘hovering’ between the 
two bodies that are coming into a corner-to-corner contact, 
whereby the actual orientation of this common plane is 

(a) (b) (c) 


Figure 10. Definition of the contact plane: a unique definition 
for the corner-to-edge (a), edge-to-edge (b) case and an ambiguous 
situation for the corner-to-corner (c) contact problem (based 
on Hogue C. Shape representation and contact detection for discrete 
element simulations of arbitrary geometries. Eng. Comput. 
1998; 3:374-390, copyright notice of Springer-Verlag). 


found by maximizing the gap between the plane and a set of 
closest corners. Following that, a contact between a corner 
node and a plane is all that is needed for a robust contact 
resolution, with a vital benefit that the contact normal can 
always be determined. 

Another possible procedure (restricted to 2-D situations) 
utilizes an optimum triangulation of the space between 
the polygons (Müller, 1996), whereby a collapse of a 
triangle indicates an occurrence of contact. 

On the other hand, the continuous implicit function representations 
of bodies, for example, elliptical particles 
in 2-D (Ting, 1992; Vu-Quoc, Zhang and Walton, 2000), 
ellipsoids in 3-D (Lin and Ng, 1995), or superquadrics 
(Figure 11) in 2-D and 3-D (Pentland and Williams, 1989; 
Williams and Pentland, 1992; Wait, 2001) 


$$f(x, y) = \left(\frac{x}{a}\right)^{m} + \left(\frac{y}{b}\right)^{m} - 1$$


provide an opportunity to employ a simple analytical check 
(inside-outside) to identify whether a given point lies inside, 
or on the boundary $f(x, y) \le 0$, or outside $f(x, y) > 0$ of 
the body. However, it is significantly more difficult to solve 
a complete intersection of contacting superquadrics, and 
the solution is normally found by discretizing one of the 
surfaces into facets and nodes, so the contact for a specific 
node can be verified through the inside-outside analytical 
check with respect to the functional representation of the 
other body. 
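
The node-based inside-outside check can be sketched as follows for a 2-D superquadric centered at the origin with its axes aligned to the coordinate axes. Several points are assumptions made for illustration only: absolute values are used so that odd or noninteger exponents m remain well defined, the contactor is taken to be a disc discretized into boundary nodes, and all function names are hypothetical.

```python
import math

def superquadric(x, y, a, b, m):
    """f(x, y) = |x/a|^m + |y/b|^m - 1; negative inside, zero on the boundary."""
    return abs(x / a) ** m + abs(y / b) ** m - 1.0

def boundary_nodes(cx, cy, radius, n):
    """Discretize a circular contactor into n boundary nodes."""
    return [(cx + radius * math.cos(2.0 * math.pi * k / n),
             cy + radius * math.sin(2.0 * math.pi * k / n)) for k in range(n)]

def penetrating_nodes(nodes, a, b, m):
    """Nodes of the discretized body lying inside or on the target superquadric."""
    return [(x, y) for x, y in nodes if superquadric(x, y, a, b, m) <= 0.0]
```

A nonempty list of penetrating nodes confirms the contact. A target body in a general position would first require the nodes to be transformed into its local coordinate system.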

A DFR utilizes the description of a body boundary 
via a parametric function of one parameter at distinct 
intervals. The concept of the DFR (O’Connor, Gill and 
Williams, 1993) arose essentially from the actual DEM 
implementation of the implicit function representation of 
bodies, where the calculation of the function values for the 
inside-outside check can be accelerated by preevaluating the 
function on a background grid, with scalar values assigned 
to each background grid node, which can then act as a fast 
algorithmic look-up table. As the discrete function values at 
the grid nodes need not necessarily stem from some implicit 
function, a grid (or cage) of cells can also be used to model 

Figure 11. Superquadrics in 3-D (reproduced from Hogue C. 
Shape representation and contact detection for discrete element 
simulations of arbitrary geometries. Eng. Comput. 1998; 
3:374-390, copyright notice of Springer-Verlag). 


an arbitrarily shaped body — including bodies with holes 
(Figure 12). 

Simplicity of the DFR concept is illustrated here 
(Figure 13) through the polar DFR descriptor in 2-D 
(Hogue and Newland, 1994), where, following the global 


Figure 12. DFR representation of a 3-D object with holes (courtesy 
of Williams, MIT, IESL, and O’Connor (1999)). A color 
version of this image is available at http://www.mrw.interscience.wiley.com/ecm 
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Figure 13. Contact detection in the polar discrete functional 
representation of bodies’ geometry (after Hogue and Newland, 
1994). 


region search for possible neighbors, the local contact is 
established by transforming the local coordinates of the 
approaching corner $P_i$ of a body i into the polar coordinates 
of the other body, $P_i^j$, and checking if no intersection 
between the segments $(O_j P_i^j)$ and $(M_j N_j)$ can be found. 


4 IMPOSITION OF CONTACT 
CONSTRAINTS AND BOUNDARY 
CONDITIONS 


4.1 Contact constraints between bodies 


Once the contact between discrete elements is detected, 
the actual contact forces have to be evaluated, which 
in turn influence the subsequent motion of the discrete 
elements controlled by the dynamic equilibrium equations. 
Contact forces come about as a result of an imposition 
of contact constraints between the solution variables 
at contacting points. In variational formulations, 
a constraint functional $\Pi_c$ can therefore be added to the 
functional of the unconstrained system in a variety of 
ways (see Chapter 6, Volume 2). The most frequently 
used penalty format includes a constraint functional 
$\Pi_c = \int_\Omega \tfrac{1}{2}\, C^T(u)\, p\, C(u)\, d\Omega$, where $C(u) = 0$ is the constraint 
equation and p is the penalty term. No additional solution 
variables are required but the constraint equation is satisfied 
in an approximate sense, depending on the value of 
the penalty term. The use of large penalty terms clearly 
corresponds to a better imposition of the contact constraint, 
but such a choice corresponds to poorer conditioning 
of equations and has implications for the numerical 
integration schemes in terms of their stability and accuracy. 
On the other hand, the use of low penalty terms 
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Figure 14. Determination of the contact surface and its local n — t coordinate system. 


leads to poorly satisfied contact constraints. Other forms 
of constraint functionals are possible, for example, the 
Lagrangian multiplier method n, = fo 7 C(u) dQ, where 
the constraints are satisfied exactly, but an additional vari- 
able à; is introduced at every contact location. Modifica- 
tions of the Lagrangian multiplier method come in a form of 
the Perturbed Lagrangian method x, = fo N C(u) dQ — 
fo ATANAdQ or most notably the Augmented Lagrangian 
Method 1, = fo PCU) dQ — fo C7 (u)pC(u) dQ, which 
combines the advantages of both the penalty and Lagrangian 
multiplier method, through an iterative update of Lagrange 
multipliers 4,, without new variables introduced into the 
solution. The constrained Lagrangian approach is often 
adopted in quasi-static situations, but this appears to be far 
too expensive in the transient dynamic setting, as it requires 
an iterative sequence within every time step at every contact 
location. 
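
The trade-off between the penalty and the augmented Lagrangian
formats can be illustrated on a deliberately small problem. The
sketch below is an assumption-laden toy, not the formulation of any
particular DEM code: a single spring of stiffness `k`, loaded by
`load`, is pressed against a rigid stop at `gap`, with the constraint
written as C(u) = u - gap = 0 and the multiplier update taken in its
common textbook form. All function and parameter names are
hypothetical.

```python
def penalty_solution(k, load, gap, p):
    """Displacement of a spring (stiffness k) pushed by `load` against a stop
    at u = gap, with C(u) = u - gap = 0 imposed through a penalty p.
    Follows from the stationarity of 0.5*k*u**2 - load*u + 0.5*p*C**2."""
    return (load + p * gap) / (k + p)


def augmented_lagrangian(k, load, gap, p, iterations=20):
    """Same problem, with the multiplier updated as lam <- lam + p * C(u)."""
    lam = 0.0
    u = 0.0
    for _ in range(iterations):
        # Stationarity of 0.5*k*u**2 - load*u + lam*C + 0.5*p*C**2, C = u - gap
        u = (load - lam + p * gap) / (k + p)
        lam += p * (u - gap)  # multiplier update, no new unknown in the solve
    return u, lam


if __name__ == "__main__":
    for p in (1.0e1, 1.0e3, 1.0e5):
        penetration = penalty_solution(1.0, 2.0, 0.5, p) - 0.5
        print(f"penalty p = {p:8.0e}: residual penetration = {penetration:.3e}")
    u, lam = augmented_lagrangian(1.0, 2.0, 0.5, 10.0)
    print(f"augmented Lagrangian, p = 10: u = {u:.12f}, multiplier = {lam:.12f}")
```

With the pure penalty format the residual penetration only shrinks as
`p` grows, whereas the multiplier iteration recovers the constraint
with a moderate `p`, at the cost of the inner loop mentioned in the
text.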

Most discrete element formulations utilize the penalty function
concept, which ultimately requires information about the orientation
of the contact surface (Figure 14) and its normal $\mathbf{n}$, as
well as a geometric overlap or penetration of contactor objects, to
establish the orientation and the intensity of the contact forces
between contactor objects at any given time, which in turn define the
subsequent motion of the discrete elements. The nonpenetration
condition is formulated through the gap function
$g = [\mathbf{x}_i - \mathbf{x}_j] \cdot \mathbf{n} < 0$, which leads
to the relative displacements in the normal and tangential
directions, $u_n = (\mathbf{u}_i - \mathbf{u}_j) \cdot \mathbf{n}$ and
$u_t = (\mathbf{u}_i - \mathbf{u}_j) \cdot \mathbf{t}$, and to the
resolution of the total contact traction into
$\mathbf{t}_c = t_n \mathbf{n} + t_t \mathbf{t} = \mathbf{t}_n + \mathbf{t}_t$,
which is then integrated over the contact surface to obtain the
normal component $F_n$ and the tangential component $F_t$ of the
contact force.
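
As a minimal sketch of the penalty concept for the simplest contactor
geometry, the following assumes two circular discs, a linear penalty
spring acting on the overlap, and a sign convention in which a
negative gap means penetration. The function name and the convention
are assumptions made for this illustration only.

```python
import math


def disc_contact_force(xi, xj, ri, rj, penalty):
    """Normal penalty force acting on disc i due to its overlap with disc j.

    Returns (gap, (fx, fy)). The gap is negative when the discs overlap;
    the force is zero for separated discs."""
    dx, dy = xi[0] - xj[0], xi[1] - xj[1]
    dist = math.hypot(dx, dy)
    gap = dist - (ri + rj)
    if gap >= 0.0 or dist == 0.0:
        # No contact, or coincident centres for which the normal is undefined
        return gap, (0.0, 0.0)
    nx, ny = dx / dist, dy / dist      # unit normal pointing from j towards i
    magnitude = penalty * (-gap)       # linear penalty spring on the overlap
    return gap, (magnitude * nx, magnitude * ny)


if __name__ == "__main__":
    gap, force = disc_contact_force((0.0, 0.0), (1.5, 0.0), 1.0, 1.0, 100.0)
    print(gap, force)
```

The force on disc j is equal and opposite; a tangential component
would be added analogously from the relative tangential displacement.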

Whichever method is adopted, the imposition of the contact constraint
is related to the normal and tangential directions associated with
the orientation of the contact plane, which is normally well defined,
but clearly ambiguous in the case of corner-to-corner contact. As no
rigorous analytical solution exists, rounding of corners for
arbitrarily shaped bodies leads to an approximate Hertzian solution.
In the case of a nonfrictional contact (i.e. normal contact force
only), a robust resolution of the corner-to-corner problem in 2-D
comes from the energy-based algorithm (Feng and Owen, 2002b), in
which the concept of the contact energy potential is introduced. The
contact energy $W$ is assumed to be a function of the overlap area
between the two bodies, $W(A)$, and the contact force is oriented in
the direction that corresponds to the highest rate of reduction of
the overlap area $A$. As the overlap area is relative to bodies
$\Omega_i$ and $\Omega_j$, it can be expressed as a function of the
position of the corner point $x_p$ and the rotational angle $\theta$
with respect to the starting reference frame.

Such an analytical process (Figure 15) leads to an unambiguous
orientation of the contact plane in the 2-D corner-to-corner contact
case, running through the intersection points $g$ and $h$, and the
contact force over the contact surface $b_w$ needs to be applied
through the reference contact point shifted by a distance
$d = M_\theta / \|F_n\|$ from the corner, where $F_n$ and $M_\theta$
are defined through the contact energy potential as
$F_n = [\partial W(A)/\partial A]\,[\partial A(x_p, \theta)/\partial x_p]$
and
$M_\theta = [\partial W(A)/\partial A]\,[\partial A(x_p, \theta)/\partial \theta]$.
Different choices for the potential function are capable of
reproducing various traditional models for contact forces.

Figure 15. Corner-to-corner contact, based on energy potential (after Feng and Owen, 2002b).

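The idea of deriving the contact force from the rate of change of the
overlap area can be sketched numerically. The example below is not
the analytical algorithm of Feng and Owen; it assumes convex
polygons, a linear potential W(A) = stiffness * A, and central finite
differences in place of the analytical derivatives. The sign is
chosen so that the returned force acts on the moving body in the
direction that reduces the overlap. All names are hypothetical.

```python
def clip_polygon(subject, clip):
    """Sutherland-Hodgman clipping of `subject` by a convex,
    counter-clockwise `clip` polygon (lists of (x, y) tuples)."""
    def inside(p, a, b):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0.0

    def intersect(p, q, a, b):
        # Intersection of segment p-q with the infinite line through a-b
        dpx, dpy = q[0] - p[0], q[1] - p[1]
        dax, day = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * day - (a[1] - p[1]) * dax) / (dpx * day - dpy * dax)
        return (p[0] + t * dpx, p[1] + t * dpy)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        polygon, output = output, []
        for j in range(len(polygon)):
            p, q = polygon[j - 1], polygon[j]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
        if not output:
            break
    return output


def polygon_area(poly):
    """Shoelace formula; returns 0.0 for an empty or degenerate polygon."""
    twice_area = 0.0
    for j in range(len(poly)):
        (x0, y0), (x1, y1) = poly[j - 1], poly[j]
        twice_area += x0 * y1 - x1 * y0
    return 0.5 * abs(twice_area)


def overlap_area(poly_a, poly_b, shift=(0.0, 0.0)):
    moved = [(x + shift[0], y + shift[1]) for x, y in poly_b]
    return polygon_area(clip_polygon(moved, poly_a))


def energy_contact_force(poly_a, poly_b, stiffness=1.0, h=1.0e-6):
    """Force on body b for the assumed potential W(A) = stiffness * A."""
    dA_dx = (overlap_area(poly_a, poly_b, (h, 0.0))
             - overlap_area(poly_a, poly_b, (-h, 0.0))) / (2.0 * h)
    dA_dy = (overlap_area(poly_a, poly_b, (0.0, h))
             - overlap_area(poly_a, poly_b, (0.0, -h))) / (2.0 * h)
    return (-stiffness * dA_dx, -stiffness * dA_dy)


if __name__ == "__main__":
    body_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    body_b = [(0.9, 0.2), (1.9, 0.2), (1.9, 1.2), (0.9, 1.2)]
    print(overlap_area(body_a, body_b), energy_contact_force(body_a, body_b))
```

Other choices of W(A) only rescale the force magnitude through
dW/dA; the direction is always that of the steepest reduction of the
overlap area.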

The implementation of the nonlinear penalty term based 
on Hertz’s analytical solution for circular deformable bodies 
has proved to be far superior (Thornton, 1992) to the 
adoption of a constant penalty term, especially in terms of 
avoiding spurious and artificial generation of energy during 
collisions, where the contact time within an increment is 
less than the computational time increment (see Chapter 6, 
Volume 2). 
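
For orientation, the classical Hertz relation for two elastic spheres
can be evaluated as follows. This is a sketch of the textbook
formula, F = (4/3) E* sqrt(R*) overlap^(3/2), and is not claimed to
be the exact implementation referred to above; the function names
and argument order are assumptions.

```python
import math


def hertz_normal_force(overlap, r1, r2, e1, e2, nu1, nu2):
    """Hertzian normal force between two elastic spheres for a given overlap."""
    if overlap <= 0.0:
        return 0.0
    r_eff = r1 * r2 / (r1 + r2)                                  # effective radius
    e_eff = 1.0 / ((1.0 - nu1 ** 2) / e1 + (1.0 - nu2 ** 2) / e2)  # effective modulus
    return (4.0 / 3.0) * e_eff * math.sqrt(r_eff) * overlap ** 1.5


def hertz_tangent_stiffness(overlap, r1, r2, e1, e2, nu1, nu2):
    """dF/d(overlap): the nonlinear 'penalty term', vanishing at first touch."""
    if overlap <= 0.0:
        return 0.0
    r_eff = r1 * r2 / (r1 + r2)
    e_eff = 1.0 / ((1.0 - nu1 ** 2) / e1 + (1.0 - nu2 ** 2) / e2)
    return 2.0 * e_eff * math.sqrt(r_eff * overlap)


if __name__ == "__main__":
    for overlap in (1.0e-4, 1.0e-3, 1.0e-2):
        print(overlap,
              hertz_normal_force(overlap, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0),
              hertz_tangent_stiffness(overlap, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0))
```

Because the stiffness grows from zero with the overlap, the contact
is engaged gradually, in contrast to a constant penalty term that
acts at full stiffness from the first increment of penetration.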


4.2 Contact constraints on model boundaries 


An important aspect in the discrete element modeling is 
related to the representation and the treatment of model 
boundaries, for example, rigid or deformable, restrained 
or movable. Boundaries can be formulated either as real 
physical boundaries, or through virtual constraints. In the 
so-called periodic boundary, often adopted in the DEM 
analysis of granular media, a virtual constraint implies that 
the particles (or bodies) ‘exiting’ on one side of the compu- 
tational domain with a certain velocity are reintroduced on 
the other side, with the same, now ‘incoming,’ velocity. In 
cases of particle assemblies, the flexible (Kuhn, 1993) and 
hydrostatic boundaries (Ng, 2002) have been employed to 
improve the realism of a simulation as compared to the 
periodic boundary concept. 
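
The periodic boundary described above amounts to a coordinate wrap
applied after each position update. The following one-dimensional
sketch assumes a computational domain [box_min, box_max) and
hypothetical function names; the velocity is deliberately left
untouched.

```python
def wrap_periodic(position, box_min, box_max):
    """Map a coordinate back into [box_min, box_max)."""
    length = box_max - box_min
    return box_min + (position - box_min) % length


def advance_particle(position, velocity, dt, box_min, box_max):
    """Move a particle and reintroduce it on the opposite side if it exits;
    the velocity of the 'incoming' particle equals that of the 'exiting' one."""
    new_position = wrap_periodic(position + velocity * dt, box_min, box_max)
    return new_position, velocity


if __name__ == "__main__":
    print(advance_particle(9.5, 2.0, 0.5, 0.0, 10.0))
    print(advance_particle(0.25, -1.0, 0.5, 0.0, 10.0))
```

In a full DEM code the neighbor search must also account for contacts
across the periodic faces, which is not shown here.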

The flexible boundary framework (Figure 16) can be seen as a physical
process of ‘stringing’ together particles on the perimeter of the
particle assembly, forming a boundary network. Additional algorithmic
issues arise, related to the automatic identification of these
perimeter particles and to updates of the boundary network as
particles move. Flexible boundaries are mostly used to simulate the
controlled stress boundary condition $\sigma_{ij} = \sigma_{ij}^c$,
where the traction vector $t_i^m = A_m \sigma_{ij}^c n_j^m$ is
distributed over the centroids of three particles connected to form a
triangular facet $A_m$.

Figure 16. Flexible boundaries (reproduced from Kuhn M. A flexible boundary for three dimensional DEM particle assemblies. In 2nd
International Conference on Discrete Element Methods, Williams J and Mustoe GGW (eds). MIT IESL Publication, 1993).

The concept of the hydrostatic boundary is very simple (Figures 17
and 18), comprising a virtual wall of pressurized fluid imagined to
surround granular material particles. If the particle shapes are
characterized by some analytical expression (e.g. ellipsoid), the
intersection area with the virtual wall can easily be determined and
the contact force is determined as $F_c = A_c\, p(h_c)$, where
$p(h_c)$ is the prescribed hydrostatic pressure at the centroid of
the intersection area $A_c$. Hydrostatic boundaries have also been
used in combination with periodic boundaries.

Figure 17. Examples of granular assemblies, comprising ellipsoidal
particles, compacted under two idealized boundary conditions
(a) hydrostatic and (b) periodic boundaries (reproduced from Ng TT.
Hydrostatic boundaries in discrete element methods. In Discrete
Element Methods: Numerical Modeling of Discontinua, 3rd International
Conference on Discrete Element Methods, Cook BK and Jensen PJ (eds).
ASCE Geotechnical Special Publication No. 117, 2002.)

Figure 18. Concept of the hydrostatic boundary (after Ng TT.
Hydrostatic boundaries in discrete element methods. In Discrete
Element Methods: Numerical Modeling of Discontinua, 3rd International
Conference on Discrete Element Methods, Cook BK and Jensen PJ (eds),
ASCE Geotechnical Special Publication No. 117, 2002.)

It should be noted that the use of periodic boundaries excludes the
capturing of localization phenomena, and the introduction of real
physical boundaries is required to account for these effects.
Physical boundaries are also needed in problems in which boundary
effects are important. The simplest form of boundary representation
in 2-D is associated with a geometric definition of line segments
(often referred to as walls in the DEM context), whereas the
kinematics of the contact between the particle and the wall is again
resolved in the penalty format. Frequently,
individual particles are declared as nonmovable, thereby 
creating an efficient way of representing and characterizing 
a rigid boundary, without any changes in the contact detec- 
tion algorithm. Recent successful ideas include the so-called 
finite wall (FW) method (Kremmer and Favier, 2001b) in 
which the boundary surface is triangulated into a number of 
rigid planar elements via a number of descriptor parameters, 
which are then in turn used to define an inscribed circle, as 
an efficient geometric object used in the contact detection 
analysis between the particles and the boundary. 

Moreover, as the DEM analysis is usually set in a 
transient dynamics setting, problems with the treatment of 
artificial boundaries extending into infinite domains remain 
similar to the ones associated with the transient dynamic 
finite element analysis, that is, the issue of nonreflecting or 
transmitting boundaries needs to be taken into account (see 
Chapter 12, Volume 2). 


5 MODELING OF BLOCK 
DEFORMABILITY 


Consideration of increased complexities in the geometric
characterization of rigid discrete element particles or bodies
(circles → clusters of circles → ellipses → superquadrics → general
polygons in 2-D; spheres → clusters of spheres → ellipsoids → general
superquadrics → general polyhedra in 3-D) has made the method popular
in many engineering applications. Subsequent DEM developments
gradually introduced further complexities; in particular, the
description of particle deformability deserves attention.

Early attempts centered around superimposing a description of
particle deformability on top of the rigid body movements, so that a
displacement at any point within a simply deformable element can be
expressed by $u_i = u_i^0 + \omega_{ij} x_j + \varepsilon_{ij} x_j$,
where $u_i^0$ is the displacement of the element centroid,
$\omega_{ij}$ and $\varepsilon_{ij}$ are the rotation and strain
tensors, respectively, and $x_j$ represents the local coordinates of
the point relative to the element centroid. Displacements of particle
centroids emanate from the standard equations for the translation of
the center of mass in the $i$th direction, $\sum F_i = m \ddot{u}_i$,
and the rotation about the center of mass,
$\sum M_i = I \ddot{\theta}_i$. The simply deformable discrete
elements (Cundall, 1988) introduced equations for generalized strain
modes $\varepsilon^k$, independent of the rigid body modes,
$m^k \ddot{\varepsilon}^k = \sigma_A^k - \sigma_I^k$, where $m^k$ is
the generalized mass, $\sigma_A^k$ are the generalized applied
stresses, and $\sigma_I^k$ are the generalized internal stresses,
corresponding to the strain modes. Different choices for the
generalized strain field lead to different system matrices.
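
The superposition of rigid body motion and a uniform strain can be
written out directly. The sketch below evaluates the displacement of
a point of a 2-D simply deformable element from the centroid
displacement, a (small) rotation tensor and a strain tensor; the
function name and the data layout are assumptions made for
illustration.

```python
def point_displacement(u0, omega, strain, x):
    """u_i = u0_i + (omega_ij + eps_ij) * x_j for a 2-D simply deformable
    element; x holds local coordinates relative to the element centroid."""
    return tuple(
        u0[i] + sum((omega[i][j] + strain[i][j]) * x[j] for j in range(2))
        for i in range(2)
    )


if __name__ == "__main__":
    u0 = (1.0, 2.0)                        # centroid displacement
    omega = [[0.0, -0.1], [0.1, 0.0]]      # skew-symmetric small rotation
    strain = [[0.01, 0.0], [0.0, 0.0]]     # uniform strain in x
    print(point_displacement(u0, omega, strain, (2.0, 1.0)))
```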

An alternative was suggested (Williams, Hocking and 
Mustoe, 1985) by introducing several body deformation 
mode shapes superimposed on top of the discrete body 
centroid motion. In that context, discrete element defor- 
mation field (displacement relative to the centroid) can 
also be expanded in terms of the eigenmodes of the 
generalized eigenvalue problem, associated with the dis- 
crete element stiffness and mass matrix, giving rise to the 
Modal Expansion Discrete Element Method (Williams and 
Mustoe, 1987). Here, the corresponding additional set of 
‘deformability’ equations becomes decoupled (because of 
the orthogonality of eigenvectors) and modal amplitudes 
are solved for from simple scalar equations. The equations
of motion are written with respect to the noninertial frame of
reference, in order to ensure full decoupling.

Eventually, two predominant realizations of modeling block
deformability appeared: the deformability of a discrete element of an
arbitrary shape is described either by an internal division into
finite elements (discrete finite elements and/or combined
finite/discrete elements) or by a polynomial expansion of a given
order (discontinuous deformation analysis).


5.1 Combined finite/discrete element method 


An early combination of discrete and finite elements was successfully
employed in the discrete finite element approach (Ghaboussi, 1988).
The combined finite/discrete element approach (Munjiza, Owen and
Bićanić, 1995) represents an extension of the rigid discrete element
formulation, where the deformability is included via finite element
shape functions, that is, the problem is analyzed by a combination of
the two methods. Such a combined method is particularly suited to
problems in which progressive fracturing and fragmentation take
place. In practice, the overall algorithmic framework for a combined
finite element/discrete element method (Table 2) remains largely
realized in an explicit transient dynamic setting, that is, the
scheme proceeds by solving the equations of motion using an explicit
time marching scheme, while updating force histories as a consequence
of contacts between different discrete regions, which are in turn
subdivided into finite elements.

The deformability of individual discrete elements was initially dealt
with by subdividing the body into triangular constant strain elements
(Goodman, Taylor and Brekke, 1968), which can be identified as an
early precursor of today’s combined finite/discrete element modeling.

Table 2. Simplified pseudocode for the combined discrete/finite
element method, small displacement analysis, including material
nonlinearity (after Petrinić N. Aspects of Discrete Element
Modelling Involving Facet-to-Facet Contact Detection and
Interaction, PhD thesis, University of Wales, Swansea, 1996).

(1) Increment from the time station $t = t_n$
    current displacement state $u_n$
    external load vector, contact forces $F_n^{ext}, F_n^{c} \to F_n^{ext}$
    internal force, e.g. $F_n^{int} = \int_\Omega B^T \sigma_n\, d\Omega$

(2) Solve for the displacement increment from
    $M \ddot{u}_n + F_n^{int} = F_n^{ext}$
    $\dot{u}_{n+1/2} = M^{-1} (F_n^{ext} - F_n^{int})\, \Delta t + \dot{u}_{n-1/2}$
    $u_{n+1} = u_n + \dot{u}_{n+1/2}\, \Delta t$
    for an explicit time stepping scheme
    $m_{ii} \ddot{u}_n^i + F_n^{int,i} = F_n^{ext,i}$
    $\dot{u}_{n+1/2}^i = \frac{1}{m_{ii}} (F_n^{ext,i} - F_n^{int,i})\, \Delta t + \dot{u}_{n-1/2}^i$
    $u_{n+1}^i = u_n^i + \dot{u}_{n+1/2}^i\, \Delta t$

(3) Compute the strain increment $\Delta\varepsilon_{n+1} = f(\Delta u_{n+1})$

(4) Check the total stress predictor $\sigma_{n+1}^{t} = \sigma_n + D\, \Delta\varepsilon_{n+1}$
    against a failure criterion, e.g. hardening plasticity
    $\phi(\sigma_{n+1}^{t}, \kappa) = 0$

(5) Compute inelastic strain increment $\Delta\varepsilon_{n+1}^{inel}$, e.g.
    associated plastic flow rule

(6) Update stress state $\sigma_{n+1} = \sigma_n + D (\Delta\varepsilon_{n+1} - \Delta\varepsilon_{n+1}^{inel})$

(7) Establish contact states between discrete element
    domains at $t_{n+1}$ and the associated contact forces $F_{n+1}^{c}$

(8) $n \leftarrow n + 1$, go to step (1)
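
The explicit time marching of step (2), combined with the contact
evaluation of step (7), can be sketched for the smallest possible
system. The example below is an illustration under strong
assumptions, not the scheme of any cited code: a single rigid
particle (so the internal force is zero) is dropped under gravity
onto a rigid floor at u = 0, and the contact is resolved by a linear
penalty spring. All names and numerical values are hypothetical.

```python
def simulate_drop(height=1.0, mass=1.0, gravity=10.0,
                  penalty=1.0e4, dt=1.0e-4, steps=10000):
    """Central difference (leapfrog) integration of a particle bouncing
    on a penalty floor; returns the history of the position u."""
    u = height
    accel0 = -gravity
    v_half = -0.5 * dt * accel0      # velocity at t = -dt/2 for a start at rest
    history = [u]
    for _ in range(steps):
        f_ext = -mass * gravity                          # external load
        f_contact = penalty * (-u) if u < 0.0 else 0.0   # penalty contact force
        f_int = 0.0                                      # rigid particle
        # velocity update at the half step, then displacement update
        v_half += dt * (f_ext + f_contact - f_int) / mass
        u += dt * v_half
        history.append(u)
    return history


if __name__ == "__main__":
    positions = simulate_drop()
    print("maximum penetration:", -min(positions))
    print("rebound height     :", max(positions[6000:]))
```

With these values the time step is small compared with the contact
duration, so the rebound height stays close to the drop height; a
much larger time step or penalty would make the spurious energy
changes discussed in Section 4.1 visible.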

Large displacements and rotations of discrete domains, internally
discretized by finite elements, have been rigorously considered
(Barbosa and Ghaboussi, 1992), and typically the generalized Updated
Lagrangian (UL) method is adopted. The contact forces, derived
through the penalty formulation (concentrated and distributed
contacts) and governed by a simple constitutive relationship, are
transformed into the equivalent nodal forces of the finite element
mesh. The equations of motion for each of the deformable discrete
elements (assuming also the presence of mass proportional damping,
$C = \alpha\, {}^tM$) are then expressed as

${}^tM \ddot{U} + \alpha\, {}^tM \dot{U} = {}^tf_{ext} + {}^tf_{cont} - {}^tf_{int} = {}^tf_{ext} + {}^tf_{cont} - \sum_k \int_{{}^t\Omega_k} {}^tB_k^T\, {}^t\sigma\, d\Omega_k$   (2)

The evaluation of the internal force vector
${}^{t+\Delta t}f_{int} = \sum_k \int_{{}^{t+\Delta t}\Omega_k} {}^{t+\Delta t}B_k^T\, {}^{t+\Delta t}\sigma\, d\Omega_k$
at the new time station $t + \Delta t$ recognizes the continuous
changing of the configuration, as the Cauchy stress at $t + \Delta t$
cannot be evaluated by simply adding a stress increment due to
straining of the material to the Cauchy stress at $t$, and the
effects of the rigid body rotation on the components of the Cauchy
stress tensor need to be accounted for.

In the computational realization, Barbosa and Ghaboussi (1992) used
the central difference scheme for integrating the incremental updated
Lagrangian formulation, which neglects the nonlinear part of the
stress-strain relationship. The incremental Green strain
$\Delta\varepsilon$ and the incremental 2nd Piola Kirchhoff stress
$\Delta\Sigma$ are calculated from incremental displacements; the
stress increment is then added to the 2nd Piola Kirchhoff stress,
known from the old configuration, to be used subsequently to
determine the internal force vector at the current configuration

${}^{t+\Delta t}f_{int} = \sum_k \int_{{}^t\Omega_k} {}^tB_k^T\, {}^{t+\Delta t}\Sigma\, d\Omega_k$   (3)

In updating the reference configuration, and in order to proceed to
the next increment, the new deformation gradient $F$ is required and
the update of the Cauchy stress follows from

${}^{t+\Delta t}\sigma = \frac{1}{\det F}\, F\, {}^{t+\Delta t}\Sigma\, F^T$   (4)
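
The transformation from the 2nd Piola Kirchhoff stress to the Cauchy
stress is the standard push-forward, sigma = (1/det F) F S F^T. A
small 2-D sketch in plain Python follows; the helper names are
hypothetical and the matrices are stored as nested lists.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def cauchy_from_second_pk(F, S):
    """Cauchy stress from the deformation gradient F and the
    2nd Piola Kirchhoff stress S: sigma = F S F^T / det(F)."""
    J = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    if J <= 0.0:
        raise ValueError("deformation gradient must have a positive determinant")
    FSFt = matmul(matmul(F, S), transpose(F))
    return [[value / J for value in row] for row in FSFt]


if __name__ == "__main__":
    rotation_90 = [[0.0, -1.0], [1.0, 0.0]]
    uniaxial_x = [[2.0, 0.0], [0.0, 0.0]]
    print(cauchy_from_second_pk(rotation_90, uniaxial_x))
```

For a pure rigid rotation the stress components are simply rotated
with the body, which is the effect the text requires to be accounted
for when the configuration is updated.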


For inelastic analyses, due care needs to be given to
the objectivity of the adopted constitutive law. Advanced 
combined discrete/finite element frameworks (Owen et al., 


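Table 3. Simplified pseudocode for the combined discrete/finite
element method, large displacement analysis (adapted from
Petrinić, 1996).

(1) Increment from the time station $t = t_n$
    current displacement state $u_n$
    external load vector, contact forces $F_n^{ext}, F_n^{c} \to F_n^{ext}$
    internal force, e.g. $F_n^{int} = \int_\Omega B^T \sigma_n\, d\Omega$

(2) Solve for the displacement increment from
    $M \ddot{u}_n + F_n^{int} = F_n^{ext}$
    $\dot{u}_{n+1/2} = M^{-1} (F_n^{ext} - F_n^{int})\, \Delta t + \dot{u}_{n-1/2}$
    $u_{n+1} = u_n + \dot{u}_{n+1/2}\, \Delta t$

(3) Configuration update $x_{n+1} = x_n + \Delta u_{n+1}$

(4) Deformation gradient $F_{n+1} = \partial x_{n+1} / \partial x_n$

(5) Strain increment $\Delta\varepsilon_{n+1} = f(F_{n+1}) = \tfrac{1}{2} (F_{n+1}^T F_{n+1} - I)$

(6) Inelastic strain increment $\Delta\varepsilon_{n+1}^{inel}$, from e.g. the
    adopted failure criterion and flow rule (cf. Table 2)

(7) Update 2nd Piola Kirchhoff stress state
    $\Sigma_{n+1} = \Sigma_n + D (\Delta\varepsilon_{n+1} - \Delta\varepsilon_{n+1}^{inel})$

(8) Rotate stress state to obtain Cauchy stress
    $\sigma_{n+1} = R_{n+1} \Sigma_{n+1} R_{n+1}^T$

(9) Establish contact states between discrete element
    domains at $t_{n+1}$ and the associated contact forces $F_{n+1}^{c}$

(10) $n \leftarrow n + 1$, go to step (1)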


1999) included a rigorous treatment of changes in config- 
uration and evaluation of the deformation gradient and the 
objective stress measures; see Table 3. 


5.2 Discontinuous deformation analysis, DDA 


An alternative deformability representation is employed in
the discontinuous deformation analysis (DDA), in which 
a general polynomial approximation of the strain field is 
superimposed to the centroid movement for each discrete 
body (Shi, 1988). The DDA analysis appeared initially as 
an efficient framework of modeling jointed deformable 
rock and its development followed the formulation of the 
Keyblock Theory (Shi and Goodman, 1981), which repre- 
sents a procedure to assess stability limit states of rigid 
jointed rock blocks in 3-D. Blocks of arbitrary shapes with 
convex or concave boundaries, including holes are consid- 
ered. The original framework under the name the ‘DDA 
method’ comprises a number of distinct features: (a) the
assumption of the order of the displacement approximation 
field over the whole block domain, (b) the derivation of 
the incremental equilibrium equations on the basis of the 
minimization of potential energy, (c) the block interface 


constitutive law (Mohr-Coulomb) with tension cutoff, and
(d) use of a special implicit time stepping algorithm. The 
original implementation has since been expanded and modified
by many other researchers, but the essential structure
of the DDA framework has not substantially changed. The 
method is realized in an incremental form and it deals with 
the large displacements and deformations as an accumula- 
tion of small displacements and deformations. The issue of 
inaccuracies when large rotations are occurring has been 
recognized and several partial remedies have been pro- 
posed (McLaughlin and Sitar, 1996; Ke, 1996). 

Leaving aside specific algorithmic features that consti- 
tute the DDA methodology, the method can best be seen 
as an alternative way of introducing solid deformability 
into the discrete element framework, where block sliding 
and separation are considered along predetermined discontinuity
planes. The early formulation was restricted to simply deformable
blocks (constant strain state over the entire block of arbitrary
shape in 2-D, Figure 19), where the first-order polynomial
displacement field for the block, $[u\;\; v]_i^T$, can be shown to be
equivalent to the three displacement components of the block
centroid, augmented by a displacement field with a distinct physical
meaning, corresponding to the three constant strain states. All six
variables are collected in the block deformation vector $D_i$.

Improved model deformability is achieved by either increasing the
number of block deformation variables (higher-order DDA, where
higher-order strain fields are assumed for blocks of arbitrary
shapes) or by the so-called subblock concept (Lin, 1995), in which a
block is subdivided into a set of simply deformable subblocks.
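
For the simply deformable (first-order) block described above, the
displacement of any point follows from the six block variables
through a 2 x 6 transformation matrix. The explicit matrix is not
reproduced in the text; the sketch below uses the form commonly
quoted in the DDA literature, with the variable ordering (u0, v0,
rotation, eps_x, eps_y, gamma_xy), and should be read as an
assumption. Function names are hypothetical.

```python
def dda_first_order_T(x, y, x0, y0):
    """2 x 6 displacement transformation matrix of a first-order DDA block
    with centroid (x0, y0), evaluated at the point (x, y)."""
    dx, dy = x - x0, y - y0
    return [
        [1.0, 0.0, -dy, dx, 0.0, 0.5 * dy],
        [0.0, 1.0, dx, 0.0, dy, 0.5 * dx],
    ]


def block_displacement(x, y, x0, y0, D):
    """[u, v] = T(x, y) D for D = [u0, v0, r0, eps_x, eps_y, gamma_xy]."""
    T = dda_first_order_T(x, y, x0, y0)
    return tuple(sum(t * d for t, d in zip(row, D)) for row in T)


if __name__ == "__main__":
    # Rigid translation plus a uniform stretch in x
    print(block_displacement(2.0, 1.0, 0.0, 0.0, [0.1, 0.2, 0.0, 0.01, 0.0, 0.0]))
    # Small rigid rotation about the centroid
    print(block_displacement(1.0, 0.0, 0.0, 0.0, [0.0, 0.0, 0.001, 0.0, 0.0, 0.0]))
```

Because the rotation enters linearly, this form is only accurate for
small rotation increments, which is the source of the large-rotation
inaccuracies mentioned earlier.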

For the first-order approximation, the block deformation vector and
the associated displacement field read

$D_i^T = [u_0\;\; v_0\;\; \varphi_0\;\; \varepsilon_x\;\; \varepsilon_y\;\; \gamma_{xy}]$,   $[u\;\; v]_i^T = T_i D_i$   (5)

Figure 19. Deformation variables for the first- and second-order
polynomial approximation in discontinuous deformation analysis.
A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm

In that spirit, the second-order approximation for the block
displacement field requires 12 deformation variables, which can also
be given a recognizable physical meaning: the deformation parameters
comprise the centroid displacements and rotation, the strain tensor
components at the centroid, as well as the spatial gradients of the
strain tensor components, that is,

$D_i^T = [u_0\;\; v_0\;\; \varphi_0\;\; \varepsilon_x^0\;\; \varepsilon_y^0\;\; \gamma_{xy}^0\;\; \varepsilon_{x,x}\;\; \varepsilon_{y,x}\;\; \gamma_{xy,x}\;\; \varepsilon_{x,y}\;\; \varepsilon_{y,y}\;\; \gamma_{xy,y}]$   (6)

For the higher-order approximations (Ma, Zaman and Zhou, 1996) for
the block displacement field, it is difficult to give a clear
physical interpretation to the deformation variables, and the
generalized deformation parameters are adopted

$u = d_1 + d_3 x + d_5 y + d_7 x^2 + d_9 xy + d_{11} y^2 + \cdots + d_{r-1} x^n + \cdots + d_{m-1} y^n$
$v = d_2 + d_4 x + d_6 y + d_8 x^2 + d_{10} xy + d_{12} y^2 + \cdots + d_r x^n + \cdots + d_m y^n$   (7)

Once the block displacement field is approximated with a finite
number of generalized deformation variables, the associated block
strain and block stress fields can be expressed in a similar manner
as in the context of finite elements as

$[\varepsilon^i] = [B^i][D^i]$
$[\sigma^i] = [E^i][\varepsilon^i] = [E^i][B^i][D^i]$   (8)

For a system of $N$ blocks, all block deformation variables ($m$
variables per block, depending on the order of the approximation) are
assembled into a set of system deformation variables ($N \times m$)
and the simultaneous equilibrium equations are derived from the
minimization of the total potential energy. The total potential
energy $\pi$ comprises contributions from the block strain energy
$\pi_e$, the energy from the external concentrated and distributed
loads $\pi_P$, $\pi_q$, the interblock contact energy $\pi_c$, the
block initial stress energy $\pi_\sigma$, as well as the energy
associated with the imposition of displacement boundary conditions
$\pi_b$

$\pi = \pi_e + \pi_P + \pi_q + \pi_c + \pi_\sigma + \pi_b$   (9)


Components of the stiffness matrix and the load vector 
are obtained by the usual process of the minimization of 
the potential energy 


$[K_{ii}] = \int_{\Omega} [B^i]^T [E^i] [B^i]\, d\Omega$   (10)

The global system stiffness matrix contains $(N \times N)$
submatrices $K_{ii}$ and $K_{ij}$, where the nonzero submatrices
$K_{ij}$ are present only if and when the blocks $i$ and $j$ are in
active contact (Figure 20), and $D$ comprises the deformation
variables of all blocks considered in the system.

Figure 20. Assembly process in DDA analysis. A color version of this
image is available at http://www.mrw.interscience.wiley.com/ecm

The interblock contact conditions of nonpenetration and Mohr-Coulomb
friction can be interpreted as block displacement constraints, which
are algorithmically reduced to an interaction problem between a
vertex of one block and the edge of another block. If the deformation
increments of the two blocks are denoted by $D^i$ and $D^j$,
respectively, the nonpenetration of the vertex in the direction
normal to the block edge can be expressed as a function of these
deformation increments by

$d = C + [D^i]^T [H^i] + [D^j]^T [G^j] = 0$   (11)

which represents the contact constraint for the block deformation
variables $D^i$ and $D^j$, and where $C$ is a function of the
location of the vertex and the two end points of the block edge at
the beginning of an increment. Various algorithmic approaches can be
identified depending on which format is used for an implicit
imposition of the nonpenetration condition (Figure 21).

Figure 21. Nonpenetration and frictional contact constraint in DDA, point-to-edge contact.

If the penalty format is adopted, additional terms appear both in the
global DDA stiffness matrix and in the RHS load vector (Table 4), and
the terms differ depending on the nature of the constraint (normal
nonpenetration or frictional constraint).

Table 4. Additional terms in the stiffness matrix and the load
vectors as a result of contact between bodies i and j.

Additional DDA stiffness matrix terms                    Changes in the load vector

Normal nonpenetration constraint
$K_{ii} = K_{ii} + p [H^i][H^i]^T$                       $F_i = F_i - pC[H^i]$
$K_{ij} = p [H^i][G^j]^T$                                $F_j = F_j - pC[G^j]$
$K_{ji} = p [G^j][H^i]^T$
$K_{jj} = K_{jj} + p [G^j][G^j]^T$

Frictional constraint
$K_{ii} = K_{ii} + p [H_{fr}^i][H_{fr}^i]^T$             $F_i = F_i - pC_{fr}[H_{fr}^i]$
$K_{ij} = p [H_{fr}^i][G_{fr}^j]^T$                      $F_j = F_j - pC_{fr}[G_{fr}^j]$
$K_{ji} = p [G_{fr}^j][H_{fr}^i]^T$
$K_{jj} = K_{jj} + p [G_{fr}^j][G_{fr}^j]^T$
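
The penalty contributions follow from minimizing the contact energy
0.5 * p * d^2 with d = C + D_i^T H + D_j^T G, which produces the
outer products of the constraint vectors. A minimal sketch follows;
the function names are hypothetical, the vectors H and G are taken
as given, and two-component vectors are used only to keep the
example short (a first-order DDA block would have six).

```python
def outer(a, b, scale=1.0):
    """scale * a b^T for two vectors stored as lists."""
    return [[scale * ai * bj for bj in b] for ai in a]


def penalty_contact_terms(H, G, C, p):
    """Additions to the stiffness submatrices and load vectors of blocks
    i and j for one normal contact, from the energy 0.5 * p * d**2."""
    return {
        "K_ii": outer(H, H, p),
        "K_ij": outer(H, G, p),
        "K_ji": outer(G, H, p),
        "K_jj": outer(G, G, p),
        "F_i": [-p * C * h for h in H],
        "F_j": [-p * C * g for g in G],
    }


if __name__ == "__main__":
    terms = penalty_contact_terms([1.0, 2.0], [3.0, 4.0], 0.5, 10.0)
    for name, value in terms.items():
        print(name, value)
```

The off-diagonal blocks are transposes of each other, so the
assembled system stays symmetric; the same routine applies to the
frictional constraint with the corresponding vectors and constant.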

In both cases, the penalty formulation leads to a nonlinear 
iterative scheme that proceeds until the global equilibrium 
is satisfied (norm of the out-of-balance forces within some 
tolerance) while at the same time a near zero penetration 
condition is satisfied at all active contact positions. In the 
case of a normal nonpenetration condition, the convergence 
implies that the identified set of contacts does not change 
between iterations, whereas in the case of a frictional con- 
straint, it implies that the changes in the location of the 
projected contact point remain within a given tolerance. 
For complex block shapes, the convergence process may 
sometimes be very slow indeed, as both activation and deac- 
tivation of contacts during the iteration process are possible. 
The convergence of the solution algorithm depends highly 
on the choice of the penalty term, and the process may often 
lead to ill-conditioned matrices if a very large penalty term 
is employed to ensure that the penetrations remain close to 
zero. 

Alternatively, if the Lagrange multiplier method is used, the
constraint equation between blocks i and j is utilized directly, and
the solution requires the use of a special matrix pseudoinversion
procedure. In the context of the Augmented Lagrange Multiplier
Method, an iterative combination of a Lagrange multiplier and a
contact penalty spring is utilized, and the iteration proceeds until
the penetration distance and the norm of the out-of-balance forces
are both smaller than some specified tolerance.


6 TRANSITION 
CONTINUUM/DISCONTINUUM, 
FRAGMENTATION IN DISCRETE 
ELEMENT METHODS 


Inclusion of fracturing and fragmentation in the discrete element
method started in the mid- and late 1980s (Mustoe, Williams and
Hocking, 1987; Hocking, Mustoe and Williams, 1987; Williams, Mustoe
and Hocking, 1987; Hocking, 1989), including interelement and
through-element
brittle fracturing. Complexities associated with modeling 
of conditions for gradual fracturing (strain localization and 
strain softening prior to the eventual separation by cracking 
or shear slip) are the same for both the nonlinear finite 
element frameworks as well as for the combined finite 
element/discrete element simulations (De Borst, 2001) (see 
Chapter 10, Volume 2). The continuum-based material 
models for fracturing media are usually generalizations of 
elastoplasticity laws using different failure (or fracture) 
surface descriptions, and no discontinuities are admitted
in the displacement field as the geometry of a problem
remains unchanged. Some models attempt to also simulate 
postfracturing behavior, again via continuum formulations. 
Models adopted for prefragmentation, that is at a continuum 
stage, are usually based on concepts of damage mechanics, 
strain softening plasticity formulations (often utilizing 
fracture mechanics concepts of energy release required 
to open a crack or induce a shear slip), or have been 
formulated using some higher-order continuum theory. 

Fracturing in DEM was typically confined to element 
interfaces, where models were either based on the 
fracture energy release rate concept or on the breakage 
of cohesive links between discrete elements. The 
combined finite/discrete element method (Munjiza, Owen 
and Bićanić, 1995) considers fracturing media, starting from 
a continuum representation by finite elements of the solid 
domain of interest, allowing for progressive fracturing to 
take place according to some fracturing criterion, thereby 
forming discontinuities and leading eventually to discrete 
elements that may be composed of several deformable 
finite elements. Subsequent motion of these elements and 
further fracturing of both the remaining continuum domain 
and previously created discrete elements are then modeled 
and monitored. In case of fracturing and fragmenting 
media, the main issues which require consideration are 
(a) finite element formulation capable of capturing the 
strain localization leading onto subsequent fracturing 
of the original continuum, (b) fracturing criteria and 
models, (c) remeshing algorithms for fully fractured zones, 
(d) contact detection procedures, and (e) representation of 
frictional contact conditions. 

Improved formulation for nonlinear physical models and 
associated computational issues for DEM are closely related 
to advances in continuum-based computational plasticity, 
usually adopted in the FEM context. On the algorithmic front,
improvements include more efficient contact detection and
interaction algorithms, as well as the introduction of advanced
computational concepts (parallel and distributed computing,
object-oriented programming, in-core databases). Approaches for
coupling discrete element methods with fluid flow methods have also
appeared and
a number of combined finite/discrete element simulations 
of various engineering problems have been reported, where 
the ‘traditional’ nonlinear explicit transient dynamic finite 
element and combined finite/discrete element formulations 
differ mainly in the number of contacts considered, the 
automatic creation of new discrete elements and the detec- 
tion of new contacts. 

An additional algorithmic problem arises upon separation, as the
‘book keeping’ of neighbors and updating of the discrete element list
are needed whenever a new (partial or complete) failure occurs. In
addition, there is also a need to transfer state variables (plastic
strains, equivalent plastic strain, dissipated energy, damage
variables) from the original deformable discrete element to the newly
created deformable discrete elements.

Figure 22. Element- and node-based local remeshing algorithm in 2-D and 3-D context; panels: remeshing with through-element
fracture, remeshing with inter-element fracture, 3-D fracture plane (after Munjiza A, Owen DRJ and Bićanić N.
A combined finite/discrete element method in transient dynamics of fracturing solids. Eng. Comput. 1995; 12:145-174; Owen et al.,
1999), based on a weighted local residual strength concept. A col

Advances in continuum-based computational plasticity 
in terms of the consistent linearization required to secure 
a quadratic convergence of the nonlinear solution proce- 
dure have also influenced algorithms used in the DEM 
context. More accurate and robust stress return algorithms 
for softening plasticity models are employed, involving 
complex smooth surface descriptions, as well as surfaces 
with singular regions, where the computational features are 
frequently borrowed from various nonlinear optimization 
algorithms. 

Every time a partial fracture takes place, the discrete
element changes its geometry, and a complete fracture
leads to the creation of two or more discrete elements.
After such fragmentation, there is a need for automatic remeshing
of the newly obtained domains (Figure 22). An unstructured
mesh regeneration technique is preferred in such cases, 
where the mesh orientation and mesh density can sometimes 
be decided upon on the basis of the distribution of the 
residual material strength within the damaged rock or some 
other state variable. 


A color version of the Figure 22 image is available at http://www.mrw.interscience.


Figure 23. Fracturing in simply deformable DDA (reproduced 
from Lin C. Extensions to the DDA for Jointed Rock Masses 
and other Blocky Systems. PhD thesis, University of Colorado, 
Boulder, 1995).


Figure 24. Fracturing of notched concrete beam modeled by 
jointed particulate assembly, with normal contact bond of limited 
strength (Itasca PFC 2D). A color version of this image is 
available at http://www.mrw.interscience.wiley.com/ecm 


The DDA block fracturing algorithm through block cen- 
troid (Figure 23), in the context of rock fracture, comprises 
the Mohr-Coulomb fracturing criterion with a tension cutoff
based on the stress state at the centroid, where the newly
formed discontinuities are introduced and are further treated 
in the same way as the original discontinuity planes. 
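
The centroid check described above can be sketched as follows. This is a minimal illustration rather than the DDA implementation itself: the function name, the tension-positive sign convention, and the restriction to in-plane principal stresses are assumptions made here.

```python
import math

def centroid_fracture_mode(sx, sy, txy, cohesion, friction_deg, tensile_strength):
    """Return 'tension', 'shear', or None for a 2-D stress state at a block centroid.

    Assumed sign convention: tension is positive.
    """
    # In-plane principal stresses from Mohr's circle (s1 >= s3).
    centre = 0.5 * (sx + sy)
    radius = math.hypot(0.5 * (sx - sy), txy)
    s1 = centre + radius

    # Tension cutoff: the major principal stress reaches the tensile strength.
    if s1 >= tensile_strength:
        return "tension"

    # Mohr-Coulomb: (s1 - s3)/2 >= c*cos(phi) - (s1 + s3)/2 * sin(phi).
    phi = math.radians(friction_deg)
    if radius >= cohesion * math.cos(phi) - centre * math.sin(phi):
        return "shear"
    return None
```

A block for which the function returns a mode would then be split through its centroid, and the new discontinuity treated like an original joint plane.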

Fragmentation frameworks are also used with DEM 
implementations that consider particles bonded into clusters 
(to form more complex particle shapes), which can also be 
bonded into particle assemblies to represent a solid material. 
The bond stiffness terms are predominantly derived on 
the basis of equivalent continuum strain energy (Morikawa
and Sawamoto, 1993; Griffiths and Mustoe, 2001). In
such lattice-like models of solids, fracturing (Figure 24)
is realized through breakage of interparticle lattice bonds.
Typically two types of interparticle bonds are considered:
a simple normal bond (or truss) and a parallel bond (or
beam), which can be seen as a microscopic representation
of the Cosserat continuum (see Chapter 10, Volume 2).
Bond failure is typically considered on the basis of limited
strength, but some softening lattice models for quasi-brittle
material have also considered a gradual reduction of
strength, that is, softening, before a complete breakage of
the bond takes place. Despite often very simple bond fail- 
ure rules, bonded particulate assemblies have been shown 
to reproduce macroscopic manifestations of softening, dila- 
tion, and progressive fracturing. 
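
The difference between the two bond types can be illustrated with a limited-strength failure check. The sketch below is an assumption-laden example, not the rule of any particular code: the beam-type check uses the extreme-fibre stress of a rectangular bond section of half-width `radius`, and all names and strength parameters are hypothetical.

```python
def normal_bond_state(fn, f_max):
    """Simple normal (truss) bond: only the axial force fn (tension positive) matters."""
    return "broken" if fn >= f_max else "intact"

def parallel_bond_state(fn, fs, moment, radius, thickness, sigma_c, tau_c):
    """Parallel (beam) bond in 2-D: axial force, shear force and moment are all carried."""
    area = 2.0 * radius * thickness                 # bond cross-section
    inertia = 2.0 / 3.0 * radius**3 * thickness     # second moment of area
    sigma_max = fn / area + abs(moment) * radius / inertia  # extreme-fibre stress
    tau_max = abs(fs) / area
    if sigma_max >= sigma_c:
        return "tensile"
    if tau_max >= tau_c:
        return "shear"
    return "intact"
```

With the same axial force, a parallel bond can fail because of the bending moment it carries, whereas a truss bond is insensitive to it.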

The solution algorithm for such bonded particle assemblies
usually remains an explicit time stepping scheme,
that is, the overall stiffness matrix is never assembled.
Steady state solutions are obtained through dynamic
relaxation (local or global damping, viscous or kinetic). A
jointed particulate medium is very similar to the more recent
developments in lattice models (Schlangen and van Mier,
1992; Jirasek and Bažant, 1995; D’Adetta, Ramm and Kun,
2001) for heterogeneous fracturing media. Adaptive
continuum/discontinuum strategies are also envisaged, where the
discrete elements are adaptively introduced into a continuum
model, if and when conditions for fracturing are met
(see Chapter 10, Volume 2).




7 TIME INTEGRATION — TEMPORAL 
DISCRETIZATION, ENERGY 
BALANCE, AND DISCRETE ELEMENT 
IMPLEMENTATION 


Problems with spurious energy generation arise when 
applying the traditional explicit time stepping scheme to 
dynamic contact problems using the penalty approach. 
Every time the material point penetrates the boundary, it 
may happen that for some time that is shorter than the com- 
putational time increment, the scheme considers no contact 
force to resist penetration, although contact may exist for 
a part of the time increment. On the other hand, when the 
contact is released, the scheme may consider the contact 
force to be pushing the material point from the boundary, 
although there may be no penetration for a certain part of the
computational time increment. Consequently, some artifi- 
cial spurious energy is created at every contact location 
and hence some form of controlling the energy balance is 
needed. 

Local contact energy dissipation is sometimes introduced 
to avoid any artificial increase of energy (Munjiza, Owen 
and Crook, 1998). Such modified temporal operators do 
not affect the critical time step, and the choice of the 
actual computational time step can be related to a degree 
of numerical energy dissipation added to the system in a 
controlled way. For the purpose of monitoring the possible 
creation of spurious energy, as well as monitoring energy 
dissipation during fracturing, a continuous energy balance 
check is desired. The incorporation of softening imposes 
severe limits on the admissible time step. 
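
A continuous energy balance check of this kind can be illustrated on the simplest possible case. The sketch below assumes a single point mass hitting a rigid wall at x = 0, a linear penalty spring that acts only during penetration, and velocity Verlet (central difference) stepping; the function name and all parameter values are hypothetical.

```python
def bounce_energy_error(dt, k_penalty=1.0e4, mass=1.0, x0=0.1, v0=-1.0, t_end=0.3):
    """Relative energy error after one impact of a point mass on a wall at x = 0."""

    def contact_force(x):
        # Penalty force resists penetration only (x < 0).
        return -k_penalty * x if x < 0.0 else 0.0

    def energy(x, v):
        penetration = min(x, 0.0)
        return 0.5 * mass * v * v + 0.5 * k_penalty * penetration * penetration

    x, v = x0, v0
    e_start = energy(x, v)
    a = contact_force(x) / mass
    for _ in range(int(round(t_end / dt))):
        x += dt * v + 0.5 * dt * dt * a      # explicit position update
        a_new = contact_force(x) / mass      # contact detected only at step ends
        v += 0.5 * dt * (a + a_new)
        a = a_new
    return (energy(x, v) - e_start) / e_start
```

Because contact is switched on and off only at the ends of time increments, the returned value is generally nonzero; monitoring it for several step sizes shows how the energy imbalance per contact event depends on the time increment.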

The traditional DEM framework implies a condition- 
ally stable explicit time stepping scheme. In the DDA 
context, for static problems, the resulting system of equations
can be solved by any equation solver, but the system matrix may be
singular when blocks are separated. In order to ensure system
stiffness matrix regularity in such situations, very soft
spring stiffness terms are added to the block centroid defor- 
mation variables. Early realizations of the DDA framework 
utilized a particular type of the generalized collocation time 
integration scheme (Hughes, 1983) (see Chapter 5, Vol- 
ume 2) 
In an incremental form, the recursive algorithm leads to

In the DDA, a specific choice of the time integration
parameters θ = 1, β = 0.5, δ = 1 is usually adopted, which
represents an implicit, unconditionally stable scheme. For
this choice, the effective matrix next to the acceleration
vector $\ddot{\mathbf{x}}_n$ vanishes, hence

leading to the ‘effective stiffness matrix’ $\hat{\mathbf{K}}$ and the ‘effective
load vector’ $\hat{\mathbf{f}}$, which now include the inertia terms
and the velocity at the start of the increment

The solution for the next incremental deformation vector
$\Delta\mathbf{D}_{n+1}$ is obtained from


$$\Delta\mathbf{D}_{n+1} = \hat{\mathbf{K}}^{-1}\,\hat{\mathbf{f}}_{n+1} \qquad (16)$$

The next time rate of deformation vector (velocity) then
equals

$$\mathbf{V}_{n+1} = \frac{2}{\Delta t}\,\Delta\mathbf{D}_{n+1} - \mathbf{V}_n \qquad (17)$$
which is then used for the start of the next time 
increment. 

Such a time stepping scheme can then be used to provide 
a dynamic response of an assembly of deformable blocks. 
As the effective stiffness matrix is regular due to the 
presence of inertia terms, block separation can be accounted 
for without a need to add artificial springs to block centroid 
variables. The time stepping procedure can also be used to 
obtain a steady state solution through a dynamic relaxation 
process. The steady state solution can be obtained by the 
use of the so-called kinetic damping, that is, by simply 
setting all block velocities to zero at the beginning of every 
increment. 
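
The stepping procedure and the kinetic damping option can be sketched for a single degree of freedom. The effective stiffness and load below are derived here for the scalar equation m a + k x = f with β = 0.5, δ = 1 and no physical damping; they are an assumption of this sketch and stand in for the general matrix expressions.

```python
def dda_step(x, v, dt, mass, stiffness, load, kinetic_damping=False):
    """One implicit step (beta = 0.5, delta = 1) for mass*a + stiffness*x = load."""
    if kinetic_damping:
        v = 0.0  # discard the velocity at the start of the increment
    k_eff = 2.0 * mass / dt**2 + stiffness               # effective stiffness
    f_eff = load - stiffness * x + 2.0 * mass * v / dt   # effective load
    dx = f_eff / k_eff                                   # scalar form of equation (16)
    v_new = 2.0 * dx / dt - v                            # equation (17)
    return x + dx, v_new


def relax_to_static(stiffness, load, mass=1.0, dt=0.5, steps=100):
    """Dynamic relaxation with kinetic damping; should approach load/stiffness."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        x, v = dda_step(x, v, dt, mass, stiffness, load, kinetic_damping=True)
    return x
```

Since the inertia term 2m/Δt² keeps the effective stiffness positive even when the physical stiffness is zero, a separated block can be stepped without artificial springs.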

It is obvious that every time integration algorithm adopted 
in the discrete element context (either for the combined 
finite/discrete element method or discontinuous deforma- 
tion analysis) introduces its own characteristic numeri- 
cal damping and dispersion through an apparent period 
elongation and amplitude decay (Chang and Acheampong, 
1993). An explicit central difference scheme (used almost 
exclusively with the FEM/DEM method) can be viewed 
as a special case of the generalized Newmark algorithm 
with β = 0, δ = 0.5 and the collocation scheme (typically
used in DDA) can also be seen as a special case of the
generalized Newmark algorithm, but with β = 0.5, δ = 1
(Figure 25). From the analysis of spectral radius for the
two recursive time integrators, it is clear that the above
collocation scheme is associated with a very substantial
numerical damping, which is otherwise absent in the central
difference scheme. In addition, predictor-corrector or
even predictor-multiple corrector schemes are adopted with
granular materials (Anandarajah, 1999), in order to capture
the high-frequency events, such as the collapse of arching
or avalanching.

Figure 25. Spectral radius and algorithmic damping for the DDA time integration scheme (generalized Newmark algorithm β = 0.5,
δ = 1.0, reproduced from Doolin DM and Sitar N. Time integration in discontinuous deformation analysis. J. Eng. Mech, ASCE 2004;
130:249-258.)
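
The contrast in numerical damping between the two integrators can be checked on the undamped single-degree-of-freedom oscillator. In the sketch below, the trace and determinant of each 2x2 amplification matrix were derived here for that model problem and are an assumption of the example; `w_dt` denotes the product of natural frequency and time step.

```python
import math

def spectral_radius(trace, det):
    """Largest eigenvalue modulus of a real 2x2 amplification matrix."""
    disc = trace * trace - 4.0 * det
    if disc < 0.0:                      # complex conjugate pair, |lambda|^2 = det
        return math.sqrt(det)
    root = math.sqrt(disc)
    return max(abs(trace + root), abs(trace - root)) / 2.0

def rho_central_difference(w_dt):
    # x_{n+1} = (2 - (w dt)^2) x_n - x_{n-1}
    return spectral_radius(2.0 - w_dt**2, 1.0)

def rho_dda(w_dt):
    # beta = 0.5, delta = 1 applied to x'' + w^2 x = 0, state (x, v)
    return spectral_radius((4.0 - w_dt**2) / (2.0 + w_dt**2),
                           2.0 / (2.0 + w_dt**2))
```

For this model problem the central difference radius stays at one throughout its stable range (w_dt below 2), while the collocation scheme gives a radius below one for every positive step, which is the algorithmic damping referred to above.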

The discrete element method simulations are computer 
resource intensive and therefore the need for the devel- 
opment of parallel processing procedures is clear. The 
explicit time integration procedure is a naturally concurrent 
process and can be implemented on parallel hardware in 
a highly efficient form (Ghaboussi, Basole and Ranjithan, 
1993; Owen and Feng, 2001). The nonlinearities that are 
encountered in this application can also be readily incor- 
porated. Much of the computational effort is devoted to 
contact detection and the search algorithms formulated prin- 
cipally for sequential implementation have to be restruc- 
tured for parallel computation. This can cause problems 
for distributed memory MIMD machines, as element-to- 
element contact affects considerably the data communica- 
tion between processors. Efficient solution procedures are 
employed to minimize communication requirements and 
maintain a load-balanced processor configuration.

Parallel implementations of DEM are typically made on 
both workstation clusters and multiprocessor workstations. 
In this way, the effort of porting software from sequential to 
parallel hardware is usually minimized. The program devel- 
opment often utilizes the sequential programming language 
with the organization of data structure suited to multi- 
processor workstation environment, where every processor 
independently performs the same basic DEM algorithm on a 
subdomain and only communicates with other subdomains 
via interface data. 

In view of the large mass of time-dependent data pro- 
duced by a typical FEM/DEM simulation, it is essential 
that some means be available of continually visualizing the 
solution process. Visualization is particularly important in 
displaying the transition from a continuous to a discontin- 
uous state. Visual representation is also used to monitor 
energy balance, as many discretization concepts both in 
space (stiffness, mass, fracturing, fragmentation, contact) 
and time can contribute to spurious energy imbalances (see 
Chapter 5, Volume 2). 


8 ASSOCIATED FRAMEWORKS AND 
DEVELOPMENTS 


This section briefly addresses some of the methods that 
are used to conduct an analysis of solids and structures
with existing or emerging discontinuities, without neces- 
sarily accounting for the full separation of structural parts. 
Although such methods were initially developed as special 
cases of nonlinear continuum models, it may be argued 
that they belong to the wider category of discrete element 
methods, as they fundamentally address the same physical 
problem. 
In a continuum setting, the preexisting macro discontinuities
are typically accounted for through the use of interface
(or joint) elements (Rots, 1988), which may be used to 
model crack formation as well as shearing at joints or predetermined
planes of weakness. Joint planes are assumed
to be of a discrete nature; their location and orientation are
either given, or they become fixed once they are formed. The
term discrete cracking was typically adopted, as opposed 
to the term smeared cracking, where the localized failure 
at an integration point level is considered in a locally aver- 
aged sense. The crack initiation across the interface element 
is typically driven by the tensile strength of the material 
(tension cutoff), and the gradual reduction of strength is
modeled by some form of softening law, that is, a relationship
between the tensile stress across the crack and
the crack opening, usually controlled by the fracture
energy release rate (see
Chapter 10 and Chapter 11 of Volume 2). Similarly, in
modeling shear response, the Coulomb criterion is usually
adopted, with gradual decohesion at the interface. A natural
extension of the above simple concepts leads to the
interface material models that account for a combination 
of cracking and Coulomb friction, which have been for- 
mulated as two-parameter failure surfaces in the context of 
computational plasticity. 
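
As an illustration of a softening law controlled by the fracture energy, the sketch below assumes the simplest choice, a linear decay of the normal traction from the tensile strength to zero. The linear shape and the names are assumptions; the critical opening follows from requiring the area under the curve to equal the fracture energy.

```python
def crack_traction(opening, tensile_strength, fracture_energy):
    """Normal traction across an interface crack for a linear softening law."""
    # Area under the triangle = 0.5 * f_t * w_c = G_f, hence w_c = 2 G_f / f_t.
    w_c = 2.0 * fracture_energy / tensile_strength
    if opening <= 0.0:
        return tensile_strength      # crack just initiated at the tension cutoff
    if opening >= w_c:
        return 0.0                   # fully open, traction-free crack
    return tensile_strength * (1.0 - opening / w_c)
```

Other shapes (bilinear, exponential) are handled the same way, with the curve parameters fixed by the fracture energy.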

The rigid bodies spring model, RBSM (Kawai, 1977), 
was earlier proposed as a generalized limit plastic anal- 
ysis framework. The discrete concept of a discontinuity 
is present as well, but the deformability of the material 
between the discontinuities is ignored. Structures are mod- 
eled as assemblies of rigid blocks connected to one another 
by normal and tangential springs on their interfaces. The
purpose of the analysis is to start from a state where the 
structure is represented by a connected assembly of rigid 
blocks and where the progressive failure is modeled by 
developing discontinuities, through emerging cracks and 
slipping at the rigid block interfaces.




Figure 26. Two rigid blocks with an elastic interface contact in 
RBSM. 
In the RBSM, the stiffness matrix is obtained by considering
rigid bodies to be connected by distributed normal
and tangential springs of the values $k_n$ and $k_s$ per unit
length. The rigid displacement field within an arbitrary
block (Figure 26) is expressed in terms of the centroid
displacements and rotation $(u, v, \theta)^T$.

The constitutive relation between the traction components
and the relative displacement at the location P in
plane stress can be written as


$$\boldsymbol{\sigma} = \mathbf{D}\,\boldsymbol{\delta}, \qquad \boldsymbol{\sigma}^T = [\sigma_n \;\; \tau_s] \qquad (18)$$

where $h$ is the sum of the shortest distances $h_1$ and $h_2$ from
the two block centroids to the contact line.
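
The kinematics behind relation (18) can be sketched as follows. The example assumes small rotations and a diagonal spring matrix D = diag(k_n, k_s) whose entries are simply passed in as numbers; the function names and argument layout are hypothetical.

```python
def rigid_point_displacement(centroid, dofs, point):
    """Small-rotation displacement of `point` on a rigid block with dofs (u, v, theta)."""
    u, v, theta = dofs
    dx, dy = point[0] - centroid[0], point[1] - centroid[1]
    return (u - theta * dy, v + theta * dx)

def interface_traction(c1, d1, c2, d2, point, normal, k_n, k_s):
    """Tractions (sigma_n, tau_s) at an interface point P between two rigid blocks."""
    u1 = rigid_point_displacement(c1, d1, point)
    u2 = rigid_point_displacement(c2, d2, point)
    rel = (u2[0] - u1[0], u2[1] - u1[1])   # relative displacement at P
    nx, ny = normal                        # unit normal, from block 1 to block 2
    tx, ty = -ny, nx                       # unit tangent
    delta_n = rel[0] * nx + rel[1] * ny
    delta_s = rel[0] * tx + rel[1] * ty
    return (k_n * delta_n, k_s * delta_s)  # sigma = D * delta
```

Integrating these tractions along the contact line, expressed through the six centroid degrees of freedom of the two blocks, yields the interface stiffness contribution.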

Generalization to the 3-D situation is conceptually straightforward,
and the method can clearly be interpreted as
similar to the finite element method with joint
(interface) elements of zero thickness, the only difference
coming from the assumption that the overall elastic behavior
is represented only by the distributed stiffness springs
along interfaces. The RBSM has been successfully used in
the nonlinear (failure) analysis of plain concrete and
reinforced concrete structures.

There are several other modeling frameworks that fall 
into the category of discrete element methods, for example,
the modified distinct element method, MDEM (Meguro
and Tagel-Din, 1999). The nonsmooth contact dynamics
method, NSCD (Moreau, 1999; Jean, 1999; Jean, Acary and
Monerie, 2001), is related to both the combined FEM/DEM
and the DDA, but it comprises significant differences,
as the unilateral Signorini condition and Mohr-Coulomb
dry friction are adopted without resorting to smooth
regularization. In the context of multiple body contact (i.e.
fragmentation or granular flow), with discontinuities in the 
field of velocities, NSCD follows the argument that it is 
not possible to define the acceleration as a usual derivative 
of a smooth function. Instead of using the regularized form 
of dynamic equations of equilibrium, the nonsmooth time 
discretized form of dynamic equations is obtained by inte- 
grating the dynamic equation, so that the velocities become 
the primary unknowns 


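$$\mathbf{M}(\dot{\mathbf{x}}_{i+1} - \dot{\mathbf{x}}_i) = \int_{t_i}^{t_{i+1}} \mathbf{F}(t, \mathbf{x}, \dot{\mathbf{x}})\,dt + \int_{t_i}^{t_{i+1}} \mathbf{r}\,dt$$

$$\mathbf{x}_{i+1} = \mathbf{x}_i + \int_{t_i}^{t_{i+1}} \dot{\mathbf{x}}\,dt \qquad (20)$$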


Another feature of the NSCD is that the force impulse
on the RHS is split into two parts, where the first part
$\int_{t_i}^{t_{i+1}} \mathbf{F}(t, \mathbf{x}, \dot{\mathbf{x}})\,dt$ (which excludes the contact forces) is
considered continuous, whereas the second part $\int_{t_i}^{t_{i+1}} \mathbf{r}\,dt$
(representing the contact forces contribution to the force impulse)
is replaced by the mean value impulse
$\mathbf{r}_{i+1} = (1/\Delta t)\int_{t_i}^{t_{i+1}} \mathbf{r}\,dt$ over the finite time interval. In physical terms,
this implies that the full contact interaction history is only 
accounted for in an averaged sense over the time interval, 
discarding the details, which are either deemed impossi- 
ble to characterize owing to insufficient data available, or 
are deemed inconsequential in terms of having a significant 
effect on the overall solution. 

Different time stepping algorithms are possible depend- 
ing on the approximation adopted and on the nature of 
the constitutive model for the block material. A low-order 
implicit time stepping scheme, typically the backward Euler
scheme, is used. Resolution for contact kinematics then considers
the relationship between the relative velocities of
contacting bodies and the mean value impulse at discrete 
contact locations 

$$\dot{\mathbf{x}}_{i+1} = \dot{\mathbf{x}}_{i+1}^{\mathrm{free}} + \mathbf{K}^{-1}\,\Delta t\,\mathbf{r}_{i+1} \qquad (21)$$
where the first part represents the ‘free’ relative velocity 
(without the influence of contact forces) and the second part 
comprises ‘corrective’ velocities emanating from contacts. 
The actual algorithm is realized in a similar manner to the 
closest point projection stress return schemes in computational
plasticity: a ‘free predictor’ for relative velocities is
followed by iterations to obtain ‘iterative corrector’ values
for mean value impulses, such that the inequality constraints
(Signorini and Mohr-Coulomb) are satisfied in a similar
manner as the plastic consistency condition is iteratively 
satisfied in computational plasticity algorithms. The admis- 
sible domains in contact problems are generally nonconvex 
and it is argued that it is necessary to treat the contact 
forces in a fully implicit manner, whereas other forces can 
be considered either explicitly or implicitly (Kane et al., 
1999), leading to either implicit/implicit or explicit/implicit 
algorithms. 
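
The predictor and corrector structure can be illustrated on the smallest possible example. The sketch below assumes a single point mass falling onto rigid ground at x = 0, a perfectly inelastic impact law, and no friction, so the corrector reduces to a closed-form projection instead of an iteration; it is a velocity-impulse stepping in the spirit of the scheme described, not the NSCD algorithm itself, and all names are hypothetical.

```python
def drop_on_ground(x0=1.0, v0=0.0, mass=1.0, gravity=-9.81, dt=1.0e-3, steps=2000):
    """Velocity-impulse time stepping for a point mass above rigid ground at x = 0."""
    x, v = x0, v0
    for _ in range(steps):
        v_free = v + dt * gravity          # free predictor: contact forces ignored
        impulse = 0.0
        if x + dt * v_free <= 0.0:         # contact predicted within this step
            # Corrector: smallest non-negative impulse that yields a non-negative
            # end-of-step velocity (Signorini condition at velocity level).
            impulse = max(0.0, -mass * v_free)
        v = v_free + impulse / mass        # velocity is the primary unknown
        x += dt * v
    return x, v
```

No penalty stiffness appears anywhere: the contact impulse over the step is determined directly from the unilateral constraint, and the mass comes to rest just above the ground.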

Furthermore, there is a degree of commonality of novel 
ideas in terms of describing the block deformability in 
discrete element methods and novel developments in the 
continuum-based techniques. The manifold method (Shi, 
1997; Chen, Ohnishi and Ito, 1997) advocates similar ideas 
as the ones advocated in the meshless (Belytschko et al., 
1996) or the partition of unity (Melenk and Babuška, 1996) 
methods (see Chapter 10, Volume 2). Similar to the mesh- 
less methods (see Chapter 10, this Volume), the manifold 
method identifies the cover displacement function $c_i$ and
the cover weighting function $w_i$, where the geometry of
the actual blocks $\Omega_i$ is utilized for numerical integration
purposes over the background grid. The treatment of any
emerging discontinuities is envisaged by introducing the
concept of effective cover regions, where there is a need
to introduce n independent covers, if a cover intersects n
disconnected domains. These concepts point to a range of
possibilities in the simulation of progressive discontinuities 
in quasi-brittle materials. 

Discontinuous modeling frameworks are also increas- 
ingly moving toward formulations and applications in mul- 
tifield and multiphysics problems, in particular, in the 
area of the coupled fluid flow in discontinuous, jointed 
media. It is believed that discontinuous modeling frame- 
works have a bright and exciting future, especially in the 
context of fragmentation and in the microscopic simula- 
tion of the behavior of heterogeneous materials, where the 
notion that simple constitutive laws at the micro, meso, 
or nano level generate manifestations of complex macro- 
scopic behavior, such as plasticity or fracture (Cundall, 
2002), is fundamentally different to the top-down approach 
of the nonlinear FEM. Increased computing power and 
efficient contact detection algorithms will not only allow 
modeling of progressive fracturing, including fragmented 
state, but will also allow for the further development 
and enhancement of discrete microstructural models of 
material behavior where the continuum concept may be 
abandoned and an internal length scale may be intrinsically
incorporated into the model. Moreover, large
scale simulations with adaptive multiscale material models,
where different regions or domains are accounted for
at different scales of observation and are governed by
noncontinuum laws of physics, seem possible in the not-too-distant
future.
1 INTRODUCTION


In essence, the boundary element method (BEM) may
be considered as an application of the finite element method
(FEM), designed originally for the numerical solution
of partial differential equations (PDE) in domains, to
the boundary integral equations (BIE) on closed boundary
manifolds. The terminology of BEM originated from the
practice of discretizing the boundary manifold of the
solution domain for the BIE into boundary elements,
resembling the term finite elements in FEM. As
in FEM, the terminology boundary element is used in the
literature in two different contexts: the boundary
manifolds are decomposed into boundary elements, which
are geometric objects, while the boundary elements for
approximating solutions of BIEs are actually the finite
element functions defined on the boundaries. Looking
through the literature, it is difficult to trace back one 
fundamental research paper and the individuals who were 
responsible for the historical development of the BEM. 
However, from the computational point of view, the work 
by Hess and Smith deserves mention as one of the 
cornerstones of BEM. In their 1966 paper (Hess and Smith, 
1966), boundary elements (or rather surface elements) have 
been used to approximate various types of bodies and to 
calculate the potential flow about arbitrary bodies. On the 
other hand, the paper by Nedelec and Planchard (1973) 
may be considered as a genuine boundary element paper 
with respect to the variational formulation of BIEs. Other 
early contributions to the boundary element development 
in the 1960s and 1970s from the mathematical point 
of view include Fichera (1961), Wendland (1965, 1968), 
MacCamy (1966), Mikhlin (1970), Hsiao and MacCamy 
(1973), Stephan and Wendland (1976), Jaswon and Symm 
(1977), LeRoux (1977), Nedelec (1977), and Hsiao and 
Wendland (1977), to name a few. 

The BEM has received much attention and gained wide 
acceptance in recent years. From 1989 to 1995, the German 
Research Foundation DFG installed a Priority Research 
Program ‘Boundary Element Methods’, and the final report 
appeared as a book (see Wendland, 1997). There has been
an increasing effort in the development of efficient finite 
element solutions of BIEs arising from elliptic bound- 
ary value problems (BVP). In fact, nowadays, the term 
BEM denotes any ‘efficient method’ for the approximate 
numerical solution of these boundary integral equations. 
One of the distinct features of the method is that the approximate
solution of the boundary value problem via BEM
will always satisfy the corresponding PDEs exactly in the 
domain and is characterized by a finite set of parameters 
on the boundary. 

As in the classical integral equation method for numerical
solutions to elliptic BVPs, central to the BEM is
the reduction of BVPs to the equivalent integral equations
on the boundary. This boundary reduction has the
advantage of reducing the number of space dimensions
by one and the capability to handle problems involving
infinite domains. The former leads to an appreciable
reduction in the number of algebraic equations generated 
for the solutions, as well as a much simplified data rep- 
resentation. On the other hand, it is well known that 
elliptic BVPs may have equivalent formulations in var- 
ious forms of BIEs. This provides a great variety of 
versions for BEMs. However, irrespective of the vari- 
ants of the BEMs and the particular numerical implemen- 
tation chosen, there is a common mathematical frame- 
work into which all these BEMs may be incorporated. 
This chapter addresses the fundamental issues of this
common mathematical framework and is devoted to the
mathematical foundation underlying the BEM
techniques.

Figure 1. A schematic procedure for boundary element methods.

Specifically, this chapter will give an expository intro- 
duction to the Galerkin-BEM for elliptic BVPs from the 
mathematical point of view. Emphasis will be placed on 
the variational formulations of the BIEs and the general 
error estimates for the approximate solutions in appropri- 
ate Sobolev spaces. A classification of BIEs will be given 
on the basis of the Sobolev index. The simple relations 
between the variational formulations of the BIEs and the 
corresponding PDEs under consideration will be indicated. 
Basic concepts such as stability, consistency, and conver- 
gence, as well as the condition numbers and ill-posedness, 
will be discussed by using elementary examples. 

Figure 1 is a sketch of the general procedure for approx- 
imating the solutions of a BVP via the boundary element 
methods. In the remaining sections, we will discuss the
topics by following this procedure up to and including the
asymptotic error estimates, and end the chapter with further
reading on topics that are not
included here because of the limited length of the
chapter.


2 BOUNDARY INTEGRAL EQUATIONS 


In this section, basic BVPs in elasticity will be presented as 
the mathematical model problems to illustrate the general 
procedure for the BEM given in the introduction. We begin 
with the reduction of boundary value problems to various 
boundary integral equations. The concepts concerning fundamental
solutions, Green’s representation formula, Cauchy
data, and four basic boundary integral operators will be 
introduced. Various numerical schemes for the derived 
BIEs will be discussed, and the corresponding algebraic 
equations will be formally obtained in terms of boundary 
elements. 

These model problems in elasticity are particularly edu- 
cational in the sense that solutions of the equivalent BIEs 
are not always unique, even though the original BVP 
is uniquely solvable. As will be seen, the rigid motions 
will play an important role in circumventing this difficulty.
Throughout the chapter, $\Omega \subset \mathbb{R}^n$, $n = 2$ or 3, denotes
a bounded domain with smooth boundary $\Gamma$, and
$\Omega^c = \mathbb{R}^n \setminus \overline{\Omega}$ the exterior domain.


2.1 Boundary value problems 


In linear elasticity for isotropic materials, the governing 
equations are 


$$-\Delta^* U = f \quad \text{in } \Omega \ (\text{or } \Omega^c) \qquad (1)$$

where


$$\Delta^* U := \mu\,\Delta U + (\lambda + \mu)\,\mathrm{grad}\,\mathrm{div}\,U$$


is the Lamé operator, U the unknown displacement field,
and f a given body force. Here, $\mu$ and $\lambda$ are given constants
such that $\mu > 0$ and $\lambda > -(2/n)\,\mu$. These are the so-called
Lamé constants, which characterize the elastic material.
We consider four fundamental BVPs, the interior and exte- 
rior displacement, and traction problems. The displacement 
problem is a BVP of the Dirichlet type in which the bound- 
ary displacement 


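$$U|_\Gamma = \varphi \quad \text{on } \Gamma \qquad (2)$$


is prescribed, while the traction problem is of the Neumann
type with the traction boundary condition


$$TU|_\Gamma = \psi \quad \text{on } \Gamma \qquad (3)$$


Here, T is the traction operator defined by


$$TU|_\Gamma := \left(\lambda\,(\mathrm{div}\,U)\,n + 2\mu\,\frac{\partial U}{\partial n} + \mu\,n \times \mathrm{curl}\,U\right)\Big|_\Gamma \qquad (4)$$


for n = 3, which reduces to the case for n = 2 by setting
$U_3 = 0$ and the third component of the normal $n_3 = 0$.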
Here, and in what follows, n is always the exterior unit 
normal to Q. For the exterior BVPs, we require, as usual, 
appropriate growth conditions, which will be specified later. 

To reduce the BVPs to BIEs, one needs the knowledge 
of the fundamental solution E(x, y) for the PDE (1), a 
distribution satisfying 


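$$-\Delta^* E(x, y) = \delta(|x - y|)\,I$$


with the Dirac δ-function and identity matrix $I$, which can
be derived by using the Fourier transformation. A simple
calculation shows that


$$E(x, y) = \frac{\lambda + 3\mu}{4\pi(n-1)\,\mu\,(\lambda + 2\mu)} \left\{ \gamma(x, y)\,I + \frac{\lambda + \mu}{\lambda + 3\mu}\,\frac{1}{|x - y|^n}\,(x - y)(x - y)^T \right\} \qquad (5)$$

a matrix-valued function, where


$$\gamma(x, y) = \begin{cases} -\log|x - y| & \text{for } n = 2 \\[4pt] \dfrac{1}{|x - y|} & \text{for } n = 3 \end{cases}$$


(In fact, these are the fundamental solutions for the Laplacian.)
In the so-called direct approach for the BEM, the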


BIEs are based on the Green representation formula, which 
in elasticity is termed the Betti-Somigliana representation 
formula for the solutions of the BVPs under consideration. 
For interior problems, we have the representation 


$$U(x) = \int_\Gamma E(x, y)\,TU(y)\,ds_y - \int_\Gamma \left(T_y E(x, y)\right)^T U(y)\,ds_y + \int_\Omega E(x, y)\,f(y)\,dy \qquad (6)$$


for $x \in \Omega$. The subscript y in $T_y E(x, y)$ denotes differentiation
in (6) with respect to the variable y.
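
As a small numerical illustration of the fundamental solution (5) that enters the representation formula, the following sketch evaluates the matrix E(x, y) for n = 3. The function name and the plain nested-list representation are choices made here for illustration only.

```python
import math

def kelvin_3d(x, y, lam, mu):
    """3x3 fundamental solution E(x, y) of the Lamé system, formula (5) with n = 3."""
    d = [xi - yi for xi, yi in zip(x, y)]
    r = math.sqrt(sum(c * c for c in d))
    # Prefactor (lam + 3 mu) / (4 pi (n - 1) mu (lam + 2 mu)) with n = 3.
    pref = (lam + 3.0 * mu) / (8.0 * math.pi * mu * (lam + 2.0 * mu))
    ratio = (lam + mu) / (lam + 3.0 * mu)
    return [[pref * ((1.0 / r if i == j else 0.0) + ratio * d[i] * d[j] / r**3)
             for j in range(3)] for i in range(3)]
```

With Poisson's ratio ν = λ / (2(λ + μ)), the entries agree with the familiar engineering form of the Kelvin solution, [(3 − 4ν) δ_ij + d_i d_j / r²] / (16πμ(1 − ν) r), which gives a convenient consistency check.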

The last term in the representation (6) is the volume 
potential due to the given body force f defining a particular
solution $U_p$ of (1). For linear problems, we may decompose
the solution in the form


$$U = U_p + u$$


where u now satisfies the homogeneous equation (1) with
f = 0 and has a representation from (6) in the form


$$u(x) = V\sigma(x) - W\varphi(x), \quad x \in \Omega \qquad (7)$$


Here, V and W are respectively the simple- and double-layer
potentials defined by


$$V\sigma(x) := \int_\Gamma E(x, y)\,\sigma(y)\,ds_y$$
$$W\varphi(x) := \int_\Gamma \left(T_y E(x, y)\right)^T \varphi(y)\,ds_y$$


for $x \in \Omega$; and the boundary charges $\varphi(x) = u|_\Gamma(x)$, $\sigma(x) =
Tu|_\Gamma(x)$ are the Cauchy data of the solution u to the
homogeneous equation


$$\Delta^* u = 0 \quad \text{in } \Omega \qquad (8)$$


Because of the above decomposition, in the following sec- 
tions we shall consider, without loss of generality, only the 
homogeneous equation (8). 

For exterior problems, the representation formula for u 
needs to be modified by taking into account the growth con- 
ditions at infinity. The appropriate growth conditions are 


$$u(x) = -E(x, 0)\,\Sigma + \omega(x) + O(|x|^{1-n}) \quad \text{as } |x| \to \infty \qquad (9)$$

where $\omega(x)$ is a rigid motion defined by

$$\omega(x) = \begin{cases} a + b\,(-x_2, x_1)^T & \text{for } n = 2 \\ a + b \times x & \text{for } n = 3 \end{cases} \qquad (10)$$


Here, a, b, and $\Sigma$ are constant vectors related to the
translation, rotation, and total boundary forces respectively.
The total surface forces $\Sigma$ are generally specified and are
related to the rigid motion $\omega$. The latter may or may
not be specified, depending on the dimension and the
problems under consideration. The representation formula
for solutions of


$$\Delta^* u = 0 \quad \text{in } \Omega^c \qquad (11)$$

together with the growth condition (9) now has the form

$$u(x) = -V\sigma(x) + W\varphi(x) + \omega(x) \qquad (12)$$


for $x \in \Omega^c$, with $\varphi = u|_\Gamma$ and $\sigma = Tu|_\Gamma$ being the Cauchy
data and


$$\Sigma = \int_\Gamma \sigma\,ds \qquad (13)$$


We remark that (7), (12), and (13) are the basic repre- 
sentations for the interior and exterior problems in the 
direct approach. Alternatively, there is a so-called indirect 
approach; one seeks the solutions of BVPs in the form of 
either simple- or double-layer potentials. Then the unknown 
boundary charges (or densities) are no longer the Cauchy 
data but the jumps of the corresponding Cauchy data across 
the boundary. Nevertheless, one may arrive at similar BIEs 
as in the direct approach but with a simplified given right- 
hand side, which, in contrast to the direct approach, may or 
may not always be in the range of the corresponding bound- 
ary integral operators. Moreover, desired physical quantities 
such as the Cauchy data need to be calculated from the 
density functions in some postprocessing procedure. 


2.2 Basic boundary integral operators 


We begin with the interior problems. Applying the trace
and the traction operator $T$ in (4) to both sides of the
representation formula (7), we obtain the overdetermined
system of BIEs on $\Gamma$

$$\varphi = (\tfrac{1}{2}I - K)\varphi + V\sigma \tag{14}$$
$$\sigma = D\varphi + (\tfrac{1}{2}I + K')\sigma \tag{15}$$

for the Cauchy data $\varphi$, $\sigma$ of the solution $u$. Here, $V$, $K$, $K'$,
and $D$ are the four basic boundary integral operators,
which are generally known respectively as the simple-layer,
double-layer, transpose of the double-layer, and hypersingular
boundary integral operators in potential theory. They
are defined by


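$$V\sigma(x) := \lim_{\Omega \ni z \to x \in \Gamma} V\sigma(z) \tag{16}$$
$$K\varphi(x) := \lim_{\Omega \ni z \to x \in \Gamma} W\varphi(z) + \tfrac{1}{2}\varphi(x) \tag{17}$$
$$K'\sigma(x) := \lim_{\Omega \ni z \to x \in \Gamma} T_z V\sigma(z) - \tfrac{1}{2}\sigma(x) \tag{18}$$
$$D\varphi(x) := -\lim_{\Omega \ni z \to x \in \Gamma} T_z W\varphi(z) \tag{19}$$

for $x \in \Gamma$, where, in the neighborhood of $\Gamma$, the traction
operator $T_z$ is defined by

$$T_z v(z) := [\lambda(\operatorname{div}_z v(z)) + 2\mu(\operatorname{grad}_z v(z))] \cdot n_x + \mu\, n_x \times (\operatorname{curl}_z v(z))$$

for $z \in \Omega$ or $z \in \Omega^c$. From these definitions, we have the
following classical results.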


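Lemma 1. Let $\Gamma \in C^2$ and let $\varphi \in C^\alpha(\Gamma)$, $\sigma \in C^\alpha(\Gamma)$ with
$0 < \alpha < 1$. Then the limits in (16) to (18) exist uniformly
with respect to all $x \in \Gamma$ and all $\varphi$ and $\sigma$ with
$\|\varphi\|_{C^\alpha} \le 1$, $\|\sigma\|_{C^\alpha} \le 1$. These limits can be expressed by

$$V\sigma(x) = \int_{y \in \Gamma \setminus \{x\}} E(x, y)\sigma(y)\, ds_y, \quad x \in \Gamma \tag{20}$$
$$K\varphi(x) = \mathrm{p.v.} \int_{y \in \Gamma \setminus \{x\}} (T_y E(x, y))^\top \varphi(y)\, ds_y, \quad x \in \Gamma \tag{21}$$
$$K'\sigma(x) = \mathrm{p.v.} \int_{y \in \Gamma \setminus \{x\}} (T_x E(x, y))\,\sigma(y)\, ds_y, \quad x \in \Gamma \tag{22}$$

If, in addition, $\varphi$ is Hölder continuously differentiable, then
the limit in (19) exists uniformly with respect to all $x \in \Gamma$
and all $\varphi$ with $\|\varphi\|_{C^{1+\alpha}} \le 1$. This limit can be expressed by

$$\begin{aligned} D\varphi(x) &= -\lim_{\Omega \ni z \to x \in \Gamma} T_z \int_\Gamma (T_y E(z, y))^\top [\varphi(y) - \varphi(x)]\, ds_y \\ &= -\,\mathrm{p.v.} \int_\Gamma T_x (T_y E(x, y))^\top [\varphi(y) - \varphi(x)]\, ds_y \end{aligned} \tag{23}$$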


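Here and in the sequel, we denote by

$$C^{m+\alpha}(\Gamma) := \{\varphi \in C^m(\Gamma) \mid \|\varphi\|_{C^{m+\alpha}(\Gamma)} < \infty\}$$

the Hölder $m$-continuously differentiable function space
equipped with the norm

$$\|\varphi\|_{C^{m+\alpha}(\Gamma)} := \sum_{|\beta| \le m} \sup_{x \in \Gamma} |\partial^\beta \varphi(x)| + \sum_{|\beta| = m} \sup_{x, y \in \Gamma,\ x \ne y} \frac{|\partial^\beta \varphi(x) - \partial^\beta \varphi(y)|}{|x - y|^\alpha}$$

for $m \in \mathbb{N}_0$, the set of nonnegative integers, and $0 < \alpha < 1$,
where $\partial^\beta$ denotes the covariant derivatives

$$\partial^\beta := \partial_1^{\beta_1} \cdots \partial_{n-1}^{\beta_{n-1}}$$

on the $(n - 1)$-dimensional boundary surface $\Gamma$ and
$\beta \in \mathbb{N}_0^{n-1}$ denotes the multi-index with $|\beta| = \beta_1 + \cdots + \beta_{n-1}$
(see Millman and Parker, 1977).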

The operator $V$ has a weakly singular kernel, $K$ and $K'$
have Cauchy-singular kernels, whereas $D$ has a nonintegrable
kernel with singularity of the order $O(|x - y|^{-n})$.
The integral in (20) is a weakly singular improper integral,
whereas the integrals in (21) and (22) as well as in (23)
are to be defined as Cauchy principal value integrals, for
example,

$$\mathrm{p.v.} \int_{y \in \Gamma \setminus \{x\}} (T_y E(x, y))^\top \varphi(y)\, ds_y = \lim_{\varepsilon \to 0} \int_{|y - x| \ge \varepsilon > 0 \,\wedge\, y \in \Gamma} (T_y E(x, y))^\top \varphi(y)\, ds_y$$
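The symmetric exclusion of a small neighborhood of the singularity can be illustrated on a one-dimensional model. The sketch below is not from the source: it replaces the kernel $(T_y E(x, y))^\top$ by the scalar Cauchy kernel $1/t$ on $[-1, 1]$, and the names `pv_integral` and `reference_value` are hypothetical.

```python
import math


def pv_integral(f, eps=1e-6, n=4000):
    """Approximate the principal value of the integral of f(t)/t over [-1, 1].

    The interval (-eps, eps) is excluded symmetrically and a midpoint rule
    is applied on [eps, 1]; the nodes t and -t are paired so that the 1/t
    contributions cancel before they are summed.
    """
    h = (1.0 - eps) / n
    total = 0.0
    for i in range(n):
        t = eps + (i + 0.5) * h
        total += (f(t) - f(-t)) / t * h
    return total


def reference_value(terms=12):
    """2*Shi(1) from the power series of sinh(t)/t: the exact value for f = exp."""
    return 2.0 * sum(1.0 / ((2 * k + 1) * math.factorial(2 * k + 1))
                     for k in range(terms))
```

For a constant density the paired contributions cancel exactly, which is the one-dimensional counterpart of the observation that follows for constant vector fields. An exclusion that is not symmetric about the singularity would in general converge to a different value.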
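If we apply the representation formula to any constant
vector field $a$, which represents a rigid displacement, then
we obtain

$$a = -\int_\Gamma (T_y E(z, y))^\top a\, ds_y \quad \text{for } z \in \Omega$$

which yields for $z \in \Omega$ in the neighborhood of $\Gamma$

$$T_z \int_\Gamma (T_y E(z, y))^\top ds_y\, \varphi(x) = 0$$

Hence,

$$D\varphi(x) = -\lim_{\Omega \ni z \to x \in \Gamma} T_z \int_\Gamma (T_y E(z, y))^\top (\varphi(y) - \varphi(x))\, ds_y$$

from which (23) results. For a proof of Lemma 1, see
Mikhlin (1970). In fact, for $\Gamma \in C^2$ and $0 < \alpha < 1$, a
fixed constant, these boundary integral operators define
continuous mappings between the following spaces:

$$V: C^\alpha(\Gamma) \to C^{1+\alpha}(\Gamma)$$
$$K, K': C^\alpha(\Gamma) \to C^\alpha(\Gamma), \quad C^{1+\alpha}(\Gamma) \to C^{1+\alpha}(\Gamma)$$
$$D: C^{1+\alpha}(\Gamma) \to C^\alpha(\Gamma)$$

(see Mikhlin (1970) and Mikhlin and Prössdorf (1986)).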


We remark that from the overdetermined system of BIEs
(14) and (15) for any solution of (8), the Cauchy data $\varphi$, $\sigma$
on $\Gamma$ are reproduced by the matrix operator on the
right-hand side of (14) and (15), namely,

$$C_\Omega := \begin{pmatrix} \tfrac{1}{2}I - K & V \\ D & \tfrac{1}{2}I + K' \end{pmatrix}$$

This suggests that this matrix operator $C_\Omega$ is a projector.
Indeed, this is the case and the matrix operator $C_\Omega$ is the
so-called Calderon projector. From the mapping properties
of the corresponding boundary integral operators, we have
the following basic result.

Theorem 1. Let $\Gamma \in C^2$. Then, $C_\Omega$ maps $C^{1+\alpha}(\Gamma) \times C^\alpha(\Gamma)$
into itself continuously and $C_\Omega$ is a projector in the sense
that

$$C_\Omega^2 = C_\Omega$$


As a consequence of Theorem 1, we have the following
identities:

$$VD = \tfrac{1}{4}I - K^2, \quad DV = \tfrac{1}{4}I - K'^2, \quad KV = VK', \quad DK = K'D \tag{24}$$

for the four boundary integral operators. We remark that
these relations are extremely valuable from both theoretical
and computational points of view. In particular, $V$ and
$D$ are pseudoinverses to each other and may serve as
preconditioners in the corresponding variational formulations
(see Steinbach and Wendland, 1998). The Calderon projector
leads in a direct manner to the basic BIEs for the BVPs
under consideration.
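The four operator identities are obtained by writing out the projector property $C_\Omega^2 = C_\Omega$ block by block. The computation below is a sketch of that step; it uses only the block form of $C_\Omega$ built from the right-hand sides of (14) and (15).

```latex
C_\Omega^2 =
\begin{pmatrix}
(\tfrac12 I - K)^2 + VD & (\tfrac12 I - K)V + V(\tfrac12 I + K') \\
D(\tfrac12 I - K) + (\tfrac12 I + K')D & DV + (\tfrac12 I + K')^2
\end{pmatrix}
=
\begin{pmatrix}
\tfrac12 I - K & V \\
D & \tfrac12 I + K'
\end{pmatrix}

% Comparing the blocks:
% (1,1): \tfrac14 I - K + K^2 + VD = \tfrac12 I - K   \implies  VD = \tfrac14 I - K^2
% (1,2): V - KV + VK' = V                              \implies  KV = VK'
% (2,1): D - DK + K'D = D                              \implies  DK = K'D
% (2,2): DV + \tfrac14 I + K' + K'^2 = \tfrac12 I + K' \implies  DV = \tfrac14 I - K'^2
```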

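Now for the exterior problems, we begin with the
representation (12) and obtain, in the same manner, the system
of BIEs on $\Gamma$:

$$\varphi = (\tfrac{1}{2}I + K)\varphi - V\sigma + \omega \tag{25}$$
$$\sigma = -D\varphi + (\tfrac{1}{2}I - K')\sigma \tag{26}$$

for the Cauchy data $\varphi$, $\sigma$ of the solution $u$ in $\Omega^c$. Here we
have used the fact that the four basic boundary integral
operators $V$, $K$, $K'$, and $D$ are related to the limits of the
boundary potentials from the exterior domain $\Omega^c$, similar
to (16) to (19), namely,

$$V\sigma(x) = \lim_{\Omega^c \ni z \to x \in \Gamma} V\sigma(z) \tag{27}$$
$$K\varphi(x) = \lim_{\Omega^c \ni z \to x \in \Gamma} W\varphi(z) - \tfrac{1}{2}\varphi(x) \tag{28}$$
$$K'\sigma(x) = \lim_{\Omega^c \ni z \to x \in \Gamma} T_z V\sigma(z) + \tfrac{1}{2}\sigma(x) \tag{29}$$
$$D\varphi(x) = -\lim_{\Omega^c \ni z \to x \in \Gamma} T_z W\varphi(z) \tag{30}$$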
for $x \in \Gamma$. Note that in (28) and (29), the signs at $(1/2)\varphi(x)$
and $(1/2)\sigma(x)$ are different from those in (17) and (18)
respectively and that the latter provide us the so-called jump
relations of the boundary potentials across the boundary.

For any solution $u$ of (11) in $\Omega^c$ with $\omega = 0$, we may
define similarly the Calderon projector $C_{\Omega^c}$ for the exterior
domain $\Omega^c$. There holds the relation

$$C_{\Omega^c} = I - C_\Omega$$

where $I$ stands for the identity matrix operator and
Theorem 1 remains valid for $C_{\Omega^c}$ as well.


2.3 Boundary integral equations 


The interior displacement problem consists of the Lamé
equations (8) in $\Omega$ together with the prescribed Dirichlet
condition (2), namely,

$$u|_\Gamma = \varphi \quad \text{on } \Gamma \tag{31}$$

where we have tacitly replaced $\varphi$ by $\varphi - u_p|_\Gamma$. The
missing Cauchy datum on $\Gamma$ is the boundary traction
$\sigma = Tu|_\Gamma$. For the solution of the interior displacement problem,
we may solve either (14), the Fredholm BIE of the first kind

$$V\sigma = (\tfrac{1}{2}I + K)\varphi \quad \text{on } \Gamma \tag{32}$$

or (15), the Cauchy-singular integral equation (CSIE) of the
second kind

$$(\tfrac{1}{2}I - K')\sigma = D\varphi \quad \text{on } \Gamma \tag{33}$$

Both are BIEs for the unknown $\sigma$.

For $n = 2$, the first-kind integral equation (32) may have
eigensolutions for special choices of $\Gamma$. To circumvent this
difficulty, we can modify (32) by including rigid motions
(10). More precisely, we consider the system

$$V\sigma - \omega = (\tfrac{1}{2}I + K)\varphi \quad \text{on } \Gamma \quad \text{and} \quad \int_\Gamma \sigma\, ds = 0 \tag{34}$$

together with

$$\int_\Gamma (-\sigma_1 x_2 + \sigma_2 x_1)\, ds = 0 \ \text{ for } n = 2 \quad \text{or} \quad \int_\Gamma (\sigma \times (x_1, x_2, x_3)^\top)\, ds = 0 \ \text{ for } n = 3 \tag{35}$$

where $\sigma$ and $\omega$ (see (10)) are to be determined. As was
shown in Hsiao and Wendland (1985), the rotation $b$ in (10)
(a vector for $n = 3$, a scalar for $n = 2$) can be prescribed
as $b = 0$; in this case, the side conditions (35) will not
be needed. In fact, for $n = 3$, many more choices in $\omega$
can be made (see Hsiao and Wendland (1985) for the
details). Now, given $\varphi \in C^{1+\alpha}(\Gamma)$, the modified system
(34), (35) is always uniquely solvable for $\sigma \in C^\alpha(\Gamma)$ in
the Hölder space (for $n = 2$, the analysis is based on, for
example, Muskhelishvili (1953), Fichera (1961), and Hsiao
and MacCamy (1973)).
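The effect of the added unknown and side condition can be seen on a scalar model. The following sketch is not the elasticity system of the source: it assumes the Laplace single-layer operator with kernel $-(1/2\pi)\log|x - y|$ on a circle of radius `radius`, for which $V e^{ik\theta} = \frac{\text{radius}}{2|k|} e^{ik\theta}$ for $k \ne 0$ and $V 1 = -\text{radius}\cdot\log(\text{radius})$, so that constants are eigensolutions on the unit circle. The function name and the use of a generic right-hand side `rhs_samples` in place of $(\tfrac12 I + K)\varphi$ are assumptions for illustration.

```python
import cmath
import math


def solve_modified_first_kind(rhs_samples, radius=1.0):
    """Solve V sigma - omega = rhs with the side condition int sigma ds = 0.

    Scalar Laplace analogue on a circle, diagonalized by Fourier modes.
    rhs_samples are values at n equispaced angles; returns the samples of
    sigma at the same angles and the constant omega.
    """
    n = len(rhs_samples)
    # Discrete Fourier coefficients for |k| < n/2 (higher modes are dropped).
    ks = range(-(n // 2) + 1, n // 2)
    coeff = {}
    for k in ks:
        coeff[k] = sum(p * cmath.exp(-1j * k * 2 * math.pi * j / n)
                       for j, p in enumerate(rhs_samples)) / n
    # k = 0: the side condition forces sigma_0 = 0, hence -omega = rhs_0.
    omega = -coeff[0].real
    # k != 0: invert the eigenvalue radius / (2|k|) mode by mode.
    sigma = []
    for j in range(n):
        theta = 2 * math.pi * j / n
        s = sum(2 * abs(k) / radius * coeff[k] * cmath.exp(1j * k * theta)
                for k in ks if k != 0)
        sigma.append(s.real)
    return sigma, omega
```

On the unit circle the unmodified equation has no solution for a right-hand side with nonzero mean, whereas the modified system absorbs that mean into $\omega$.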

For the special CSIE of the second kind (33), Mikhlin
(1962) showed that the Fredholm alternative (originally
designed for compact operators) remains valid here.
Therefore, (33) admits a unique classical solution
$\sigma \in C^\alpha(\Gamma)$, provided $\varphi \in C^{1+\alpha}(\Gamma)$, $0 < \alpha < 1$; see Kupradze
et al. (1979) and Mikhlin and Prössdorf (1986).

The interior traction problem consists of the Lamé system
(8) in $\Omega$ together with the prescribed Neumann
condition (3):

$$Tu|_\Gamma = \sigma \quad \text{on } \Gamma \tag{36}$$

where again we have replaced $\psi$ by $\sigma = \psi - Tu_p|_\Gamma$. The
missing Cauchy datum is now $\varphi = u|_\Gamma$ in the overdetermined
system (14) and (15). Again, we can solve for $\varphi$
from either (14)

$$(\tfrac{1}{2}I + K)\varphi = V\sigma \quad \text{on } \Gamma \tag{37}$$

or from (15)

$$D\varphi = (\tfrac{1}{2}I - K')\sigma \quad \text{on } \Gamma \tag{38}$$

with given $\sigma$. As is well known, similar to the Neumann
problem for the Laplacian, the given traction $\sigma$ needs to
satisfy equilibrium conditions for a solution of the interior
traction problem to exist. These can be obtained from the
second Green formula, which is known as the Betti formula
in elasticity. The equilibrium condition reads

$$\int_\Gamma \sigma \cdot \omega\, ds = 0 \tag{39}$$

for all the rigid motions given by (10). Or, equivalently,
from the definition of $\omega$ in (10), this means that $\sigma$ should
satisfy

$$a \cdot \int_\Gamma \sigma\, ds + b \int_\Gamma (-\sigma_1 x_2 + \sigma_2 x_1)\, ds = 0 \quad \text{for } n = 2$$

or

$$a \cdot \int_\Gamma \sigma\, ds + b \cdot \int_\Gamma (x_1, x_2, x_3)^\top \times \sigma\, ds = 0 \quad \text{for } n = 3$$

for arbitrary $a \in \mathbb{R}^n$ and arbitrary $b \in \mathbb{R}$ ($n = 2$) or
$b \in \mathbb{R}^3$ ($n = 3$). This condition also turns
out to be sufficient for the existence of $\varphi$ in the classical
Hölder-function spaces. If $\sigma \in C^\alpha(\Gamma)$ with $0 < \alpha < 1$ is
given satisfying (39), then the right-hand side $V\sigma$ in (37)
automatically satisfies the orthogonality conditions from
Fredholm's alternative, and the CSIE (37) admits a solution
$\varphi \in C^{1+\alpha}(\Gamma)$. However, the solution is unique only up to
all rigid motions $\omega$, which are eigensolutions. For further
details, see Kupradze et al. (1979).
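The equilibrium condition for $n = 2$ amounts to vanishing total force and vanishing total moment, and both are simple boundary integrals. The sketch below checks them numerically on a circle; the geometry, the quadrature rule, and the names `resultants_on_circle` and `satisfies_equilibrium` are assumptions chosen for illustration, not part of the source.

```python
import math


def resultants_on_circle(traction, n=400, radius=1.0):
    """Total force and total moment of a traction field on a circle.

    traction(theta) returns (sigma_1, sigma_2) at x = radius*(cos, sin).
    Uses the periodic trapezoidal rule with ds = radius * d(theta).
    """
    h = 2 * math.pi / n
    f1 = f2 = m = 0.0
    for j in range(n):
        th = j * h
        x1, x2 = radius * math.cos(th), radius * math.sin(th)
        s1, s2 = traction(th)
        f1 += s1 * radius * h
        f2 += s2 * radius * h
        m += (-s1 * x2 + s2 * x1) * radius * h  # moment term for n = 2
    return (f1, f2), m


def satisfies_equilibrium(traction, tol=1e-10):
    """True if total force and total moment vanish up to the tolerance."""
    (f1, f2), m = resultants_on_circle(traction)
    return abs(f1) < tol and abs(f2) < tol and abs(m) < tol
```

A uniform normal pressure passes the check, while a uniform tangential traction has zero total force but a nonzero total moment, so it is not admissible Neumann data for the interior problem.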

The hypersingular integral equation of the first kind (38)
also has eigensolutions, which again are given by all rigid
motions (10). However, the classical Fredholm alternative
also holds for (38); and the right-hand side $[(1/2)I - K']\sigma$
in (38) satisfies the corresponding orthogonality conditions,
provided $\sigma \in C^\alpha(\Gamma)$ satisfies the equilibrium conditions
(39). In both cases, the integral equations, together with
appropriate side conditions, can be modified so that the
resulting equations are uniquely solvable.

For the exterior displacement problem, $u$ satisfies the
Lamé system (11) in $\Omega^c$, the Dirichlet condition (31), and
the decay condition (9) at infinity with given total surface
forces $\Sigma$. For simplicity, we consider here only the case in
which the rigid motions $\omega$ are unknown. For other cases,
we refer the interested reader to the work in Hsiao and
Wendland (1985). For the present simple situation, both
the tractions $\sigma = Tu|_\Gamma$ and the rigid motions $\omega$ are now
the unknowns. We may solve the problem by using either
(25) or (26). However, for both equations, in addition to the
given total surface force as in (13), we need to prescribe
additional normalization conditions, for example, the total
moment condition due to boundary traction as in (35), in
order to have the uniqueness of the solution. This yields
the following modified BIE of the first kind from (25):

$$V\sigma - \omega = -(\tfrac{1}{2}I - K)\varphi \quad \text{on } \Gamma \tag{40}$$

$$\int_\Gamma \sigma\, ds = \Sigma \tag{41}$$

together with the additional normalization conditions

$$\int_\Gamma (-\sigma_1 x_2 + \sigma_2 x_1)\, ds = 0 \ \text{ for } n = 2 \quad \text{or} \quad \int_\Gamma (\sigma \times (x_1, x_2, x_3)^\top)\, ds = 0 \ \text{ for } n = 3 \tag{42}$$

where $\varphi \in C^{1+\alpha}(\Gamma)$ and $\Sigma \in \mathbb{R}^n$, $n = 2, 3$ are given, but
$\sigma$ and $\omega$ are to be determined. It can be shown that (40)
together with (41) and (42) always has a unique solution
$\sigma \in C^\alpha(\Gamma)$ and $\omega$, the rigid motion in the form of (10).

On the other hand, from (26) we have the singular
integral equation of the second kind

$$(\tfrac{1}{2}I + K')\sigma = -D\varphi \quad \text{on } \Gamma \tag{43}$$

with the additional equations (41) and (42). Note that the
operator $(1/2)I + K'$ is adjoint to $(1/2)I + K$. Owing to
Mikhlin (1962), for these special operators, the classical
Fredholm alternative is still valid in the space $C^\alpha(\Gamma)$.
Since $((1/2)I + K)\omega = 0$ on $\Gamma$ for all rigid motions $\omega$,
the corresponding homogeneous equation (43) also has a
$3(n - 1)$-dimensional eigenspace. Moreover, $D\omega = 0$ for
all rigid motions; hence, the right-hand side of (43) always
satisfies the orthogonality conditions for any given
$\varphi \in C^{1+\alpha}(\Gamma)$. This implies that equation (43) always admits a
solution $\sigma \in C^\alpha(\Gamma)$. With the prescribed total force (41)
and total moment due to traction (42), additionally, these
algebraic equations determine $\sigma$ uniquely.

Finally, for the exterior traction problem, the Neumann
datum is given by (36) and so are the total forces (41)
and total moment such as (42), although here the total
moment may or may not be equal to zero, depending on
the given traction $\sigma$ on $\Gamma$. The rigid motion $\omega$ is now an
additional parameter, which can be prescribed arbitrarily
according to the special situation. Often, $\omega = 0$ is chosen
in the representation (12) as well as in the decay condition
(9) for the exterior traction problem. Then, from (25) we
have the Cauchy-singular BIE for $\varphi = u|_\Gamma$:

$$(\tfrac{1}{2}I - K)\varphi = -V\sigma + \omega \quad \text{on } \Gamma \tag{44}$$

As is well known, for any given $\sigma \in C^\alpha(\Gamma)$ and given
$\omega$, the equation (44) is always uniquely solvable for
$\varphi \in C^{1+\alpha}(\Gamma)$; we refer the reader to Kupradze (1965) for
details.

We may also solve the exterior traction problem by using
(26). This leads to the hypersingular BIE of the first kind,

$$D\varphi = -(\tfrac{1}{2}I + K')\sigma \quad \text{on } \Gamma \tag{45}$$

It is easily seen that rigid motions $\omega$ on $\Gamma$ are eigensolutions
of (45). Therefore, in order to guarantee unique solvability
of the BIE, we modify (45) by including additional
restrictions and adding more unknowns, for example,

$$D\varphi_0(x) + \sum_{\ell=1}^{3(n-1)} \alpha_\ell m_\ell(x) = -(\tfrac{1}{2}I + K')\sigma(x) \quad \text{for } x \in \Gamma$$

and

$$\int_\Gamma m_\ell(y) \cdot \varphi_0(y)\, ds_y = 0, \quad \ell = 1, \ldots, 3(n-1) \tag{46}$$

where $m_\ell(x)$ denote the basis vectors for the rigid motions
$\omega$ defined by (10). Here the added unknown Lagrangian
multipliers $\alpha_\ell$ are introduced in order to have the same
number of unknowns as that of the equations. It is easy to
see that for the Neumann problem as the exterior traction
problem, they all vanish because of the special form of
the right-hand side of (46). This system is always uniquely
solvable, and for any given $\sigma \in C^\alpha(\Gamma)$, we find exactly one
$\varphi_0 \in C^{1+\alpha}(\Gamma)$. Once $\varphi_0$ is known, the displacement field
$u(x)$ in $\Omega^c$ is given by the representation

$$u(x) = -V\sigma(x) + W\varphi_0(x) \quad \text{for } x \in \Omega^c \tag{47}$$

Note that the actual boundary values of $u|_\Gamma$ may differ from
$\varphi_0$ by a rigid motion and can be expressed via (47) in the
form

$$u(x)|_\Gamma = (\tfrac{1}{2}I + K)\varphi_0(x) - V\sigma(x) \quad \text{for } x \in \Gamma$$

In summary, we see that for each of these fundamental 
problems, the solution may be obtained by solving the BIE 
of either the first kind or the second kind. Moreover, the 
unique solvability of these BIEs can be accomplished by 
including appropriate normalization conditions based on the 
rigid motions. 


2.4 Boundary elements and numerical schemes 


We observe that all the BIEs derived previously can be
formally written in the form

$$Av + B\alpha = q$$
$$\Lambda v = c \tag{48}$$

Here, $v$ and $\alpha$ denote the unknown Cauchy datum and
real vectors, while $q$ and $c$ are the given data on the
boundary and constant constraint vectors respectively, $A$
is a boundary integral operator, $B$ is a matrix of functions,
and $\Lambda$ consists of appropriate functionals. We now discuss
the numerical solutions of (48) by the boundary elements.

For the numerical treatment of BIEs (48), one needs a
discretization of the boundary charge functions $v$. In the BEM,
trial functions are chosen as finite elements on the boundary
$\Gamma$, the boundary elements. In general, the boundary $\Gamma$
is assumed to be given by local parametric representations
such that a family of regular partitions in the parametric
domains is mapped onto corresponding partitions on $\Gamma$. On
the partitions of the parametric domain, we use a regular
family of finite elements as introduced in Babuška and Aziz
(1977). The parametric representation of $\Gamma$ then transfers
the finite elements onto the boundary $\Gamma$ defining a family
of boundary elements. However, in practice, or if the local
parametric representations of $\Gamma$ are not known explicitly,
an approximation $\Gamma_h$ of the boundary surface $\Gamma$ is often
used. The approximated boundary $\Gamma_h$ is composed of
piecewise polynomials associated with a family of triangulations
$\{\tau_\ell\}_{\ell=1}^{L}$ of $\Gamma_h$, where $\Gamma_h = \bigcup_{\ell=1}^{L} \tau_\ell$ and $h := \max_{\ell=1,\ldots,L} \operatorname{diam}(\tau_\ell)$.
The degree of the piecewise polynomials is chosen in
correspondence to the order of the Sobolev spaces for the
solutions of the BIEs. In this connection, we refer the reader
to the work by Nedelec (1976) and also Dautry and Lions
(1990).

There are three main numerical schemes for treating the
BIEs. These are collocation, Galerkin's method, and the
least squares method. For the BIE (48), these methods can
be described as follows. Let $S_h$ denote a family of
finite-dimensional subspaces of the solution space for $v$ based on
the triangulation. Let $\mu_1(x), \ldots, \mu_N(x)$ be a basis of $S_h$
for fixed $h$. We seek an approximate solution pair $\{v_h, \alpha_h\}$
in the form

$$v_h(x) = \sum_{j=1}^{N} \gamma_j \mu_j(x) \tag{49}$$

with unknown coefficients $\gamma_j$ and vector $\alpha_h$, to be
determined by the following system of algebraic equations
generated by these methods:


(1) The collocation method. A suitable set of collocation
points $x_k \in \Gamma$, $k = 1, \ldots, N$ is chosen and the approximate
solution (49) is required to satisfy the collocation equations

$$\sum_{j=1}^{N} \gamma_j A\mu_j(x_k) + B(x_k)\alpha_h = q(x_k), \quad k = 1, \ldots, N$$

$$\sum_{j=1}^{N} \gamma_j \Lambda\mu_j = c \tag{50}$$

(2) The Galerkin method. The approximate solution (49)
is required to satisfy the Galerkin equations

$$\sum_{j=1}^{N} \gamma_j (A\mu_j, \mu_k) + (B\alpha_h, \mu_k) = (q, \mu_k), \quad k = 1, \ldots, N$$

$$\sum_{j=1}^{N} \gamma_j \Lambda\mu_j = c \tag{51}$$

where the brackets $(\cdot, \cdot)$ denote the $L^2(\Gamma)$-inner product,

$$(\varphi, \psi) := \int_\Gamma \varphi(x)\overline{\psi(x)}\, ds$$

where $\overline{\psi}$ denotes the complex conjugate of $\psi$. For the
equations (46), for instance, the equations (51) correspond
to the mixed formulation of saddlepoint problems. As will
be seen, the Galerkin equation (51) represents, in some
sense, the discrete weak formulation of (48) in the
finite-dimensional subspace $S_h$; the precise definition will be
made clear in the next two sections.
(3) The least squares method. The approximate solution
(49) is determined from the condition

$$\Big\| \sum_{j=1}^{N} \gamma_j A\mu_j + B\alpha_h - q \Big\|^2_{L^2(\Gamma)} + \Big| \sum_{j=1}^{N} \gamma_j \Lambda\mu_j - c \Big|^2 = \min!$$

or, equivalently, from the linear system of algebraic
equations

$$\sum_{j=1}^{N} \gamma_j (A\mu_j, A\mu_k) + (B\alpha_h, A\mu_k) + \sum_{j=1}^{N} \gamma_j \Lambda\mu_j \cdot \Lambda\mu_k = (q, A\mu_k) + c \cdot \Lambda\mu_k, \quad k = 1, \ldots, N$$

$$\sum_{j=1}^{N} \gamma_j (A\mu_j, B^\top) + (B\alpha_h, B^\top) = (q, B^\top)$$

This completes the tutorial part of our introduction to the
BEM. Needless to say, the solvability of these linear systems
as well as the error estimates for the approximate
solutions depend heavily on the properties of the boundary
integral operators involved, and all the formal formulations
can be made precise after we introduce appropriate
mathematical tools. We will return to these discussions later.
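The difference between the collocation and the Galerkin equations can be seen on a very small model. The sketch below is not a boundary integral equation from the source: it assumes a scalar second-kind equation $Av = q$ on $(0, 1)$ with $(Av)(x) = v(x) - \int_0^1 xy\,v(y)\,dy$, piecewise-constant trial functions $\mu_j$, and no constraint part ($B$ and $\Lambda$ are omitted). The function names are hypothetical.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            fac = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= fac * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def model_problem(n, method):
    """Piecewise-constant approximation of A v = q with q(x) = x**2 - x/4,
    whose exact solution is v(x) = x**2. Returns (midpoints, coefficients)."""
    h = 1.0 / n
    mid = [(j + 0.5) * h for j in range(n)]
    # (A mu_j)(x) = mu_j(x) - x * h * mid[j] for the indicator mu_j of cell j.
    # For this kernel and basis, collocation at the midpoints and the Galerkin
    # equations divided by h give the same matrix; only the data differ.
    mat = [[(1.0 if j == k else 0.0) - mid[k] * h * mid[j]
            for j in range(n)] for k in range(n)]
    if method == "collocation":
        rhs = [x * x - x / 4.0 for x in mid]            # q(x_k)
    elif method == "galerkin":
        # (q, mu_k) / h is the cell average of q; mean of x**2 is mid**2 + h**2/12.
        rhs = [x * x + h * h / 12.0 - x / 4.0 for x in mid]
    else:
        raise ValueError(method)
    return mid, solve_linear(mat, rhs)
```

Collocation samples the data at points, whereas the Galerkin method tests against the basis functions, so the two coefficient vectors differ slightly even though both converge to the same solution as $h \to 0$.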


3 VARIATIONAL FORMULATIONS 


To discuss the Galerkin-BEM, we need the concept of weak
or variational solutions of the BIEs in the same manner as
that of the PDEs. This section is devoted to this topic.
We use a simple Dirichlet problem for the Laplacian
(instead of the Lamé operator) to illustrate all the essential
features of the variational formulations, proceeding from
the weak solution of the PDEs to that of the BIEs. Our
analysis is general enough and can immediately be applied
to elasticity without any severe modifications (see Hsiao
and Wendland, 1985) (see Chapter 4, this Volume).


3.1 Weak solutions 


To introduce the concept of weak solutions for the BIEs,
let us begin with the variational formulation of the simplest
elliptic BVP, the Dirichlet problem for Poisson's equation,

$$-\Delta u = f \quad \text{in } \Omega \subset \mathbb{R}^3 \quad \text{and} \quad u|_\Gamma = 0 \quad \text{on } \Gamma \tag{52}$$

where $f$ is a given function satisfying certain regularity
conditions to be specified. Multiplying both sides of the
equation $-\Delta u = f$ by a smooth function $v$ and integrating
by parts, we obtain

$$\int_\Omega \nabla u(x) \cdot \nabla v(x)\, dx = \int_\Omega f(x)v(x)\, dx \tag{53}$$

provided $v|_\Gamma = 0$. The integral identity (53) suggests that
one may seek a solution of the Dirichlet problem (52) from
(53) when the classical solution does not exist, as long as
the integrals in (53) are meaningful. Indeed, this leads to
the concept of the variational formulations of BVPs and the
weak solutions for partial differential equations. To make it
more precise, let us introduce the function space


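$$H^1(\Omega) := \{v \in L^2(\Omega) \mid \nabla v \in L^2(\Omega)\}$$

equipped with the norm

$$\|v\|_{H^1(\Omega)} := \Big( \int_\Omega \big(|v(x)|^2 + |\nabla v(x)|^2\big)\, dx \Big)^{1/2}$$

the Sobolev space of the first order, which is a Hilbert space
with the inner product

$$(v, w)_{H^1(\Omega)} := \int_\Omega \{v(x)w(x) + \nabla v(x) \cdot \nabla w(x)\}\, dx$$

We denote by

$$H_0^1(\Omega) := \{v \in H^1(\Omega) \mid v|_\Gamma = 0\}$$

the subspace of $H^1(\Omega)$ with homogeneous Dirichlet boundary
conditions. We denote the dual space of $H^1(\Omega)$ by
$\widetilde{H}^{-1}(\Omega)$, while we denote the dual space of $H_0^1(\Omega)$ by
$H^{-1}(\Omega)$. The corresponding norms are defined by

$$\|f\|_{\widetilde{H}^{-1}(\Omega)} := \sup_{v \in H^1(\Omega),\ v \ne 0} \frac{|\langle f, v \rangle|}{\|v\|_{H^1(\Omega)}}$$

and

$$\|f\|_{H^{-1}(\Omega)} := \sup_{v \in H_0^1(\Omega),\ v \ne 0} \frac{|\langle f, v \rangle|}{\|v\|_{H^1(\Omega)}}$$

$\widetilde{H}^{-1}(\Omega)$ and $H^{-1}(\Omega)$ contain all the bounded linear
functionals on $H^1(\Omega)$ or $H_0^1(\Omega)$ respectively. We adopt the
notation $\langle \cdot, \cdot \rangle$ for the $L^2$-duality pairings between $\widetilde{H}^{-1}(\Omega)$
and $H^1(\Omega)$, and $H^{-1}(\Omega)$ and $H_0^1(\Omega)$ respectively; if
$f \in L^2(\Omega)$, we have simply $\langle f, v \rangle = (f, v)_{L^2(\Omega)}$ for all
$v \in H^1(\Omega)$. With these function spaces available, we may now
give the precise definition of a weak solution of (52). Given
$f \in H^{-1}(\Omega)$, $u \in H_0^1(\Omega)$ is said to be a weak solution of
(52), if it satisfies

$$a_\Omega(u, v) = \ell_f(v) \quad \text{for all } v \in H_0^1(\Omega) \tag{54}$$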
Here, in the formulation, we have introduced the bilinear
form

$$a_\Omega(u, v) := \int_\Omega \nabla u \cdot \nabla v\, dx$$

while for fixed $f \in H^{-1}(\Omega)$, we denote by $\ell_f(v)$ the
antilinear (i.e. conjugate-linear) functional
$\ell_f: H_0^1(\Omega) \ni v \mapsto \ell_f(v) \in \mathbb{R}$ defined by $\ell_f(v) := \langle f, v \rangle$. By definition, the
weak solution of (52) is a function from $H_0^1(\Omega)$, which
thus does not need to have derivatives of the second order
in $\Omega$ (it has only generalized derivatives of the first order, in
general). The concept of weak solutions of the BVP (52) is
thus substantially more general than the concept of classical
solutions of the problem.

It is well known that (54) has a unique solution. In fact,
the existence and uniqueness of the solution of (54) can be
established by the celebrated Lax-Milgram Lemma (see
Lax and Milgram, 1954). In the present example, since the
bilinear form $a_\Omega(\cdot, \cdot)$ is symmetric, that is, $a_\Omega(u, v) = a_\Omega(v, u)$
for all $u, v \in H_0^1(\Omega)$, it can be shown that the solution $u$ of
(54) is the only element in $H_0^1(\Omega)$ that minimizes in the real
Hilbert space $H_0^1(\Omega)$ the following quadratic functional:

$$J(v) := \tfrac{1}{2} a_\Omega(v, v) - \ell_f(v) \tag{55}$$

This is the well-known Dirichlet principle, and the quadratic
functional in (55) is called the energy functional.
Consequently, the solution space $H_0^1(\Omega)$ is often referred to
as the energy space of the Dirichlet problem for equation
(52). Dirichlet's principle provides an alternative means of
approximating the solution of (54) by minimizing $J(\cdot)$ in
finite element subspaces of $H_0^1(\Omega)$. This is known as the
Rayleigh-Ritz method and leads to the same linear system
of algebraic equations as obtained by the Galerkin method
in the present situation. Although both methods lead to the
same results, the basic ideas of these methods are different.
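The coincidence of the Galerkin and Rayleigh-Ritz systems can be observed in one space dimension. The sketch below is an illustrative assumption rather than part of the source: it treats $-u'' = 1$ on $(0, 1)$ with $u(0) = u(1) = 0$, continuous piecewise-linear elements on a uniform mesh, and hypothetical function names.

```python
def ritz_galerkin_1d(n):
    """Linear finite elements for -u'' = 1 on (0, 1), u(0) = u(1) = 0.

    Returns the interior nodes x_i and the coefficients u_i solving K u = b
    with K = (1/h) * tridiag(-1, 2, -1) and b_i = h.
    """
    h = 1.0 / n
    m = n - 1                       # number of interior nodes
    diag = [2.0 / h] * m
    off = -1.0 / h
    rhs = [h] * m
    # Thomas algorithm: forward elimination for the tridiagonal system.
    for i in range(1, m):
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    nodes = [(i + 1) * h for i in range(m)]
    return nodes, u


def energy(u, n):
    """J(v) = 0.5 * a(v, v) - l(v) for the finite element function with
    interior nodal values u and right-hand side f = 1."""
    h = 1.0 / n
    vals = [0.0] + list(u) + [0.0]
    a = sum((vals[i + 1] - vals[i]) ** 2 / h for i in range(n))
    load = sum(h * ui for ui in u)
    return 0.5 * a - load
```

The system $Ku = b$ is the Galerkin system for (54) in this subspace, and it is also the stationarity condition of `energy`, so any perturbation of the computed coefficients increases the energy.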

In general, the reinterpretation of the Dirichlet problem 
(52) in the context of distribution theory, here in the form 
of (54), is often referred to as the weak formulation of the 
problem, whereas a minimization problem in the form of 
(55) is referred to as the variational formulation of (52), and 
the equation (54) is referred to as the variational equation. 
However, these terminologies are not universal. Throughout 
the sequel, these terms will be used interchangeably without 
distinction. 

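For the BIEs, without loss of generality, we will first
modify the problem (52) slightly by considering the
homogeneous equation but nonhomogeneous Dirichlet boundary
condition, namely,

$$\Delta u = 0 \quad \text{in } \Omega \subset \mathbb{R}^3$$
$$u|_\Gamma = \varphi \quad \text{on } \Gamma \tag{56}$$

where $\varphi = \tilde{\varphi}|_\Gamma$ for some given function $\tilde{\varphi} \in H^1(\Omega)$. Here,
$\varphi$ is the given Cauchy datum and the missing one is the
normal derivative of $u$, that is,

$$\sigma := \nabla u \cdot n|_\Gamma = \frac{\partial u}{\partial n}\Big|_\Gamma$$

in terms of our terminology in Section 2. Again, one may
show that there is a unique weak solution $u$, which is now
in $H^1(\Omega)$ such that $u - \tilde{\varphi} \in H_0^1(\Omega)$ and

$$a_\Omega(u, v) = 0 \quad \text{for all } v \in H_0^1(\Omega)$$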


To discuss the weak solution of the BIEs, we again
introduce the simple- and double-layer potentials

$$V\lambda(x) := \int_\Gamma E(x, y)\lambda(y)\, ds_y$$

and

$$W\mu(x) := \int_\Gamma \frac{\partial}{\partial n_y} E(x, y)\mu(y)\, ds_y \quad \text{for } x \in \Omega \tag{57}$$

for density functions $\lambda$ and $\mu$, where $E$ is the fundamental
solution for the Laplacian in $\mathbb{R}^3$ given by

$$E(x, y) := \frac{1}{4\pi} \frac{1}{|x - y|}$$

(see e.g. equation 5). In the indirect approach based on
the layer-ansatz, for smooth density functions $\lambda$ and $\mu$, we
may seek a solution of (56) in the form of a simple-layer
potential $u(x) = V\lambda(x)$, or in the form of a double-layer
potential $u(x) = -W\mu(x)$ for $x \in \Omega$. The former then leads
to an integral equation of the first kind,

$$V\lambda = \varphi \quad \text{on } \Gamma \tag{58}$$

while the latter leads to an integral equation of the second
kind

$$(\tfrac{1}{2}I - K)\mu = \varphi \quad \text{on } \Gamma \tag{59}$$

Here, $V$ is the simple-layer boundary integral operator on
the boundary

$$V\lambda(x) := \int_\Gamma E(x, y)\lambda(y)\, ds_y \quad \text{for } x \in \Gamma \tag{60}$$

where we have kept the same notation $V$ for the boundary
integral operator, while the boundary integral operator $K$,
defined by

$$K\mu(x) := \int_{\Gamma \setminus \{x\}} \frac{\partial}{\partial n_y} E(x, y)\mu(y)\, ds_y \quad \text{for } x \in \Gamma$$
is now the boundary double-layer potential operator. The
operator $V$ has a weakly singular kernel. But in contrast to
the situation in elasticity, for the Laplacian both $K$ and its
transposed

$$K'\lambda(x) := \int_{\Gamma \setminus \{x\}} \frac{\partial}{\partial n_x} E(x, y)\lambda(y)\, ds_y \quad \text{for } x \in \Gamma$$

also have weakly singular kernels, provided $\Gamma$ is smooth
enough (e.g. $\Gamma \in C^2$); then these operators are also well
defined on the classical Hölder spaces. In order to define
weak solutions of the BIEs (58) and (59) in the same
manner as for the PDEs, and to be able to extend our
approach to more general $\Gamma$ such as Lipschitz boundaries, we
need the density functions in appropriate Sobolev spaces. In
other words, we need the right boundary energy spaces for
the BIEs. These should be in some sense closely related to
the 'traces' of the energy spaces from the BVPs with partial
differential equations. Indeed, this can be seen as follows.


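First, observe that for a given smooth density function $\lambda$
(say, in $C^\alpha(\Gamma)$), the potential

$$u(x) = V\lambda(x) := \int_\Gamma E(x, y)\lambda(y)\, ds_y \quad \text{for } x \in \mathbb{R}^3 \setminus \Gamma \tag{61}$$

satisfies $\Delta u = 0$ in $\Omega$ as well as in $\Omega^c$. Moreover, we have
the jump relation

$$\lambda = \frac{\partial u^-}{\partial n} - \frac{\partial u^+}{\partial n} =: \Big[\frac{\partial u}{\partial n}\Big]\Big|_\Gamma \tag{62}$$

where $v^\pm$ denotes the limits of the function $v$ on $\Gamma$ from
$\Omega^c$ and $\Omega$ respectively. From Green's formula, we obtain
the relation

$$\int_\Gamma \lambda(x) V\lambda(x)\, ds = \int_\Gamma \Big[\frac{\partial u}{\partial n}\Big]\Big|_\Gamma u|_\Gamma\, ds = \int_{\Omega \cup \Omega^c} |\nabla u(x)|^2\, dx \tag{63}$$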
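The relations (62) and (63) can be checked by hand in a simple case. The worked example below is our choice, not the source's: $\Gamma$ is the sphere of radius $R$ and $\lambda \equiv 1$, for which the simple-layer potential is the classical potential of a uniformly charged sphere.

```latex
% Potential of the constant density \lambda = 1 on |x| = R, with E = 1/(4\pi|x-y|):
u(x) = V\lambda(x) =
\begin{cases}
R, & |x| \le R \\
R^2/|x|, & |x| \ge R
\end{cases}

% Jump relation (62): the interior normal derivative vanishes, the exterior one is -1:
\frac{\partial u^-}{\partial n} - \frac{\partial u^+}{\partial n} = 0 - (-1) = 1 = \lambda

% Energy identity (63): boundary side
\int_\Gamma \lambda \, V\lambda \, ds = 4\pi R^2 \cdot R = 4\pi R^3

% Energy identity (63): volume side (the gradient vanishes inside the sphere)
\int_{\Omega \cup \Omega^c} |\nabla u|^2 \, dx
= \int_R^\infty \Big(\frac{R^2}{r^2}\Big)^2 4\pi r^2 \, dr
= 4\pi R^4 \int_R^\infty r^{-2} \, dr
= 4\pi R^3
```

Both sides agree and are positive, which is the mechanism behind the positivity of the boundary form associated with $V$.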
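On the other hand, for a given smooth moment function
$\mu$ (say, in $C^{1+\alpha}(\Gamma)$), set

$$u(x) = -W\mu(x) := -\int_\Gamma \frac{\partial}{\partial n_y} E(x, y)\mu(y)\, ds_y \quad \text{for } x \in \mathbb{R}^3 \setminus \Gamma \tag{64}$$

Similarly, $u$ satisfies $\Delta u = 0$ in $\Omega$ as well as in $\Omega^c$ and the
corresponding jump relation reads

$$\mu = u^- - u^+ =: [u]|_\Gamma$$

An application of the Green formula then yields the
relation

$$\int_\Gamma (D\mu)(x)\,\big((\tfrac{1}{2}I - K)\mu\big)(x)\, ds = \int_\Gamma \frac{\partial u}{\partial n}\, u^-\, ds = \int_\Omega |\nabla u(x)|^2\, dx \tag{65}$$

where we have denoted by $D$ the hypersingular boundary
integral operator for (56) defined by

$$D\mu(x) := -\frac{\partial}{\partial n_x} \int_\Gamma \frac{\partial}{\partial n_y} E(x, y)\mu(y)\, ds_y \quad \text{for } x \in \Gamma$$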

In view of the identities (63) and (65), it is not difficult
to see how the exact weak formulations of the BIEs of
the first kind (58) and of the second kind (59) should be
formulated in the Sobolev spaces. Moreover, these relations
also suggest appropriate energy boundary spaces for the
corresponding bilinear forms for the boundary integral
operators under consideration. To make all this precise, we
need the concept of the boundary value of the function
$u \in H^1(\Omega)$ on $\Gamma$.

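Let $L^2(\Gamma)$ be the space of square integrable functions
on $\Gamma$, equipped with the norm

$$\|\varphi\|_{L^2(\Gamma)} := \Big( \int_\Gamma |\varphi(y)|^2\, ds_y \Big)^{1/2}$$

Then, one can show that there is a constant $c$ such that

$$\|\gamma_0 u\|_{L^2(\Gamma)} \le c\, \|u\|_{H^1(\Omega)} \quad \text{for all } u \in C^1(\overline{\Omega})$$

where $\gamma_0 u = u|_\Gamma$ denotes the boundary value of $u$ on $\Gamma$.
By continuity, the mapping $\gamma_0$ defined on $C^1(\overline{\Omega})$ can be
extended to a mapping, still called $\gamma_0$, from $H^1(\Omega)$ into
$L^2(\Gamma)$. By the extension, $\gamma_0 u$ is called the boundary value
of $u$ on $\Gamma$. It can be shown that

$$\operatorname{Ker}(\gamma_0) := \{u \in H^1(\Omega) \mid \gamma_0 u = 0\} = H_0^1(\Omega)$$

and the range of $\gamma_0$ is a proper and dense subspace of $L^2(\Gamma)$,
called $H^{1/2}(\Gamma)$. For $\varphi \in H^{1/2}(\Gamma)$, we define the norm

$$\|\varphi\|_{H^{1/2}(\Gamma)} := \inf_{u \in H^1(\Omega),\ \gamma_0 u = \varphi} \|u\|_{H^1(\Omega)}$$

and $H^{1/2}(\Gamma)$ is a Hilbert space. We denote by $H^{-1/2}(\Gamma)$
the corresponding dual space of $H^{1/2}(\Gamma)$, equipped with
the norm

$$\|\lambda\|_{H^{-1/2}(\Gamma)} := \sup_{\mu \in H^{1/2}(\Gamma),\ \mu \ne 0} \frac{|\langle \lambda, \mu \rangle|}{\|\mu\|_{H^{1/2}(\Gamma)}}$$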
where $\langle \cdot, \cdot \rangle$ now denotes the $L^2(\Gamma)$-duality pairing between
the spaces $H^{-1/2}(\Gamma)$ and $H^{1/2}(\Gamma)$ on $\Gamma$. For $\lambda \in L^2(\Gamma)$,
the duality $\langle \lambda, \mu \rangle$ is simply identified with $(\lambda, \mu)_{L^2(\Gamma)}$.
We remark that the Cauchy data $u|_\Gamma$ and $\partial u/\partial n$ of
the weak solution $u \in H^1(\Omega)$ of (56) are in $H^{1/2}(\Gamma)$ and
its dual $H^{-1/2}(\Gamma)$ respectively, as will be seen from the
generalized first Green theorem to be stated later. We may
define from (63) and (65) the boundary sesquilinear forms

$$a_1(\chi, \lambda) := \langle \chi, V\lambda \rangle \quad \text{for all } \chi, \lambda \in H^{-1/2}(\Gamma) \tag{66}$$

$$a_2(v, \mu) := \langle Dv, (\tfrac{1}{2}I - K)\mu \rangle \quad \text{for all } v, \mu \in H^{1/2}(\Gamma) \tag{67}$$

Note that both the sesquilinear forms $a_1$ and $a_2$ are
symmetric (or Hermitian) since $V$ and $D$ are both
self-adjoint and, moreover, $DK = K'D$ because of the Calderon
projection property. Then the weak formulation of the BIE
of the first kind (58) reads as follows:


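Definition 1. Given $\varphi \in H^{1/2}(\Gamma)$, find a function
$\lambda \in H^{-1/2}(\Gamma)$ such that

$$a_1(\chi, \lambda) = \ell_\varphi(\chi) \quad \text{for all } \chi \in H^{-1/2}(\Gamma) \tag{68}$$

where $\ell_\varphi$ is the linear functional on $H^{-1/2}(\Gamma)$, defined by
$\ell_\varphi(\chi) := \langle \chi, \varphi \rangle$ for $\chi \in H^{-1/2}(\Gamma)$.

For the BIE of the second kind (59), we have the
following weak formulation:

Definition 2. Given $\varphi \in H^{1/2}(\Gamma)$, find a function
$\mu \in H^{1/2}(\Gamma)$ such that

$$a_2(v, \mu) = \ell_{D\varphi}(v) \quad \text{for all } v \in H^{1/2}(\Gamma) \tag{69}$$

where the linear functional is defined by $\ell_{D\varphi}(v) := \langle Dv, \varphi \rangle$
for all $v \in H^{1/2}(\Gamma)$.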


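Also, let us briefly consider the interior Neumann
problem,

$$\Delta u = 0 \quad \text{in } \Omega \subset \mathbb{R}^3 \quad \text{and} \quad \frac{\partial u}{\partial n}\Big|_\Gamma = \sigma \quad \text{on } \Gamma \quad \text{with } \langle \sigma, 1 \rangle = 0 \tag{70}$$

where $\sigma = (\partial/\partial n)\tilde{u}|_\Gamma$ for some given $\tilde{u} \in H^1(\Omega)$. If we
seek the solution

$$u = -W\mu \quad \text{in } \Omega \text{ or } \Omega^c \tag{71}$$

in the form of a double-layer potential, then, for its jump
across $\Gamma$,

$$\mu = u^- - u^+ = [u]|_\Gamma \tag{72}$$

we arrive at the hypersingular integral equation of the first
kind,

$$D\mu = \sigma \quad \text{on } \Gamma \tag{73}$$

Consequently, with $u$ given by (71), we have from (72) the
relation

$$\int_\Gamma \mu\, D\mu\, ds = \int_{\Omega \cup \Omega^c} |\nabla u(x)|^2\, dx$$

If the solution is sought in the form of a simple-layer
potential

$$u = V\lambda \quad \text{in } \Omega \tag{74}$$

then this leads to the integral equation of the second kind

$$(\tfrac{1}{2}I + K')\lambda = \sigma \quad \text{on } \Gamma \tag{75}$$

Here, with $u$ given by (74), we find the relation

$$\int_\Gamma V\lambda(x)\,\big((\tfrac{1}{2}I + K')\lambda\big)(x)\, ds = \int_\Gamma u\, \frac{\partial u^-}{\partial n}\, ds = \int_\Omega |\nabla u(x)|^2\, dx \tag{76}$$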

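Now the respective weak formulations corresponding to
(73) and (75) read as follows:

Definition 3. Given $\sigma \in H_0^{-1/2}(\Gamma) := \{\chi \in H^{-1/2}(\Gamma) \mid \langle \chi, 1 \rangle = 0\}$,
find $\mu \in H^{1/2}(\Gamma)$ such that

$$a_3(v, \mu) := \langle v, D\mu \rangle = \langle v, \sigma \rangle \quad \text{for all } v \in H^{1/2}(\Gamma) \tag{77}$$

Corresponding to (75), the weak form can be formulated as
follows:

Definition 4. Find $\lambda \in H_0^{-1/2}(\Gamma)$ such that

$$a_4(\chi, \lambda) := \langle V\chi, (\tfrac{1}{2}I + K')\lambda \rangle = \langle V\chi, \sigma \rangle \quad \text{for all } \chi \in H_0^{-1/2}(\Gamma) \tag{78}$$

Again, both the sesquilinear forms $a_3$ and $a_4$ are
symmetric (or Hermitian).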

In view of these definitions, we see that the boundary
energy spaces are, respectively, $H^{-1/2}(\Gamma)$ and $H^{1/2}(\Gamma)$ for
the above four variational formulations. Indeed, this is true
and will be justified in the remainder of this section; moreover,
we present the basic results concerning the existence and
uniqueness of the weak solutions of the corresponding
BIEs. It is not difficult to see that for the treatment of the
elasticity problems, one may follow a similar approach as
described here for the Laplacian.


3.2 Basic results


In the weak formulations of BIEs for the model problem
(56) given by (68) and (69), we have tacitly assumed the
continuity properties of the boundary integral operators
involved. In particular, we need that the following operators
(corresponding to (16)-(19) in elasticity) are continuous:

$$V: H^{-1/2}(\Gamma) \to H^{1/2}(\Gamma)$$
$$D: H^{1/2}(\Gamma) \to H^{-1/2}(\Gamma)$$
$$\tfrac{1}{2}I - K: H^{1/2}(\Gamma) \to H^{1/2}(\Gamma)$$
$$\tfrac{1}{2}I + K': H^{-1/2}(\Gamma) \to H^{-1/2}(\Gamma)$$

These properties can be established by the following
theorem.
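Theorem 2. Let $V$ and $W$ be the simple- and double-layer
potentials defined by (61) and (64) respectively. Then, the
following operators are continuous:

$$V: H^{-1/2}(\Gamma) \to H^1(\Omega, \Delta) \otimes H^1_{loc}(\Omega^c, \Delta)$$
$$\gamma_0 V \text{ and } \gamma_{c0} V: H^{-1/2}(\Gamma) \to H^{1/2}(\Gamma)$$
$$\tau V \text{ and } \tau_c V: H^{-1/2}(\Gamma) \to H^{-1/2}(\Gamma)$$
$$W: H^{1/2}(\Gamma) \to H^1(\Omega, \Delta) \otimes H^1_{loc}(\Omega^c, \Delta)$$
$$\gamma_0 W \text{ and } \gamma_{c0} W: H^{1/2}(\Gamma) \to H^{1/2}(\Gamma)$$
$$\tau W \text{ and } \tau_c W: H^{1/2}(\Gamma) \to H^{-1/2}(\Gamma)$$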

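Some explanation of the notation is needed. The function spaces are defined as follows:

\[ H^1(\Omega, \Delta) := \{ u \in H^1(\Omega) \mid \Delta u \in \widetilde{H}^{-1}(\Omega) \} \]

equipped with the graph norm

\[ \|u\|^2_{H^1(\Omega, \Delta)} := \|u\|^2_{H^1(\Omega)} + \|\Delta u\|^2_{\widetilde{H}^{-1}(\Omega)} \]

and the local function space

\[ H^1_{loc}(\Omega^c, \Delta) := \{ u \in H^1_{loc}(\Omega^c) \mid \Delta u \in \widetilde{H}^{-1}_{loc}(\Omega^c) \} \]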


We recall that $u \in H^1_{loc}(\Omega^c)$ (and, respectively, $\Delta u \in \widetilde{H}^{-1}_{loc}(\Omega^c)$) if and only if $\phi u \in H^1(\Omega^c)$ (and $\phi \Delta u \in \widetilde{H}^{-1}(\Omega^c)$) for every $\phi \in C_0^\infty(\mathbb{R}^n)$, where $C_0^\infty(\mathbb{R}^n)$ is the space of all infinitely differentiable functions that have compact support in $\mathbb{R}^n$. The operators $\gamma_0$ and $\gamma_{c0}$ are the so-called trace operators and have been introduced previously; $\gamma_0 u = u|_\Gamma =: u^-$ for $u \in C^0(\overline{\Omega})$, while $\gamma_{c0} u = u|_\Gamma =: u^+$ for $u \in C^0(\overline{\Omega^c})$. For a smooth function, $\tau u$ and $\tau_c u$ coincide with the normal derivatives $(\partial/\partial n)u^-$ and $(\partial/\partial n)u^+$ respectively. These are linear mappings whose extensions will be given more precisely from the generalized first Green formula later. In terms of the trace operator $\gamma_0$ and the linear mapping $\tau$, we have the relations

\[
\gamma_0 V = V, \qquad -\gamma_0 W = \tfrac12 I - K, \qquad -\tau W = D, \qquad \tau V = \tfrac12 I + K'
\]

and Theorem 2 provides the continuity of these boundary integral operators. The proof of Theorem 2 will be deferred to the end of this section, after we have introduced more machinery.


As a consequence of the mapping properties, we have 
the following jump relations. 


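Lemma 2. Given $(\sigma, \varphi) \in H^{-1/2}(\Gamma) \times H^{1/2}(\Gamma)$, then the following jump relations hold:

\[
[\gamma_0 V\sigma]|_\Gamma = 0, \qquad [\tau V\sigma]|_\Gamma = \sigma, \qquad
[\gamma_0 W\varphi]|_\Gamma = \varphi, \qquad [\tau W\varphi]|_\Gamma = 0
\]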


Again, the proof will be given later. With respect to the 
existence of solutions of the variational equations (68) and 
(69), we have the following results: 


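Theorem 3. (a) The bilinear form $a_1$ defined by (68) is $H^{-1/2}(\Gamma)$-elliptic, that is, there exists a constant $\alpha_1 > 0$ such that the inequality

\[ a_1(\chi, \chi) \ge \alpha_1 \|\chi\|^2_{H^{-1/2}(\Gamma)} \qquad (79) \]

holds for all $\chi \in H^{-1/2}(\Gamma)$. (For $n = 2$, the two-dimensional problems, one needs for (79) an appropriate scaling of $\mathbb{R}^2$.)

(b) The bilinear forms $a_j$, $j = 2, 3, 4$, defined by (69), (77), and (78) satisfy Gårding inequalities of the form

\[ \mathrm{Re}\{a_j(v, v) + \langle C_j v, v \rangle\} \ge \alpha_j \|v\|^2_{H^{1/2}(\Gamma)} \quad \text{for } j = 2, 3 \text{ and all } v \in H^{1/2}(\Gamma) \]
\[ \mathrm{Re}\{a_4(\lambda, \lambda) + \langle C_4 \lambda, \lambda \rangle\} \ge \alpha_4 \|\lambda\|^2_{H^{-1/2}(\Gamma)} \quad \text{for all } \lambda \in H^{-1/2}(\Gamma) \]

where

\[ C_2, C_3 : H^{1/2}(\Gamma) \to H^{-1/2}(\Gamma) \quad \text{and} \quad C_4 : H^{-1/2}(\Gamma) \to H^{1/2}(\Gamma) \]

are compact linear operators, $\alpha_j > 0$ are constants, and $\langle \cdot, \cdot \rangle$ denotes the duality pairing between $H^{-1/2}(\Gamma)$ and $H^{1/2}(\Gamma)$.

In addition, the bilinear forms $a_2$ and $a_3$ are $H_0^{1/2}(\Gamma)$-elliptic, that is,

\[ a_j(\mu, \mu) \ge \alpha_j \|\mu\|^2_{H^{1/2}(\Gamma)} \qquad (80) \]

for all $\mu \in H_0^{1/2}(\Gamma) := \{ v \in H^{1/2}(\Gamma) \mid \langle v, 1 \rangle = 0 \}$ and $j = 2, 3$; and $a_4$ is $H_0^{-1/2}(\Gamma)$-elliptic, that is,

\[ a_4(\lambda, \lambda) \ge \alpha_4 \|\lambda\|^2_{H^{-1/2}(\Gamma)} \quad \text{for all } \lambda \in H_0^{-1/2}(\Gamma) \]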

Compact operators play an important role in our analysis. 
We include here the basic definition of a linear compact 
operator. 


Definition 5. A linear operator $C$ between two Hilbert spaces $X$ and $Y$ is compact (or completely continuous) iff the image $CB$ of any bounded set $B \subset X$ is a relatively compact subset of $Y$ (i.e. the closure $\overline{CB}$ is compact in $Y$).


The well-known Lax-Milgram theorem provides a constructive existence proof for any bilinear form that is elliptic, such as $a_1$. In view of Theorem 3(a), we conclude that the variational equation (68) has a unique solution $\lambda \in H^{-1/2}(\Gamma)$. On the other hand, for bilinear forms such as $a_j$, $j = 2, 3, 4$, which satisfy Gårding inequalities, the Fredholm theorem gives the necessary and sufficient conditions for the solvability of the corresponding variational equations. It is not difficult to see that the homogeneous equations of (69) and (77) have nontrivial solutions, $\mu = $ constant, although the corresponding homogeneous equation (59) has only the trivial solution. As in Section 2.3, we may augment (69) and (77) by adding a normalization condition and consider a modified system such as

\[ a_j(v, \mu) + \omega \langle v, 1 \rangle = \ell(v) \quad \text{for all } v \in H^{1/2}(\Gamma) \quad \text{and} \quad \langle 1, \mu \rangle = 0, \quad j = 2 \text{ or } 3 \qquad (81) \]

where $\omega \in \mathbb{R}$ is now viewed as an unknown constant. In FEM, this is familiar as a so-called mixed formulation; see Hsiao (2000) and Steinbach (2002).
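Other equivalent formulations use stabilized symmetric coercive variational sesquilinear forms:

\[ a_j(v, \mu) + \langle v, 1 \rangle \langle 1, \mu \rangle = \ell(v) \quad \text{for all } v \in H^{1/2}(\Gamma), \text{ where } j = 2 \text{ or } 3 \qquad (82) \]
\[ a_4(\chi, \lambda) + \langle \chi, 1 \rangle \langle 1, \lambda \rangle = \langle \chi, V\sigma \rangle \quad \text{for all } \chi \in H^{-1/2}(\Gamma) \qquad (83) \]

(see Fischer et al. (1985) and Kuhn and Steinbach (2002)).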
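
The relation between the augmented (mixed) system and the stabilized form can be illustrated in finite dimensions. The following is only a sketch under stated assumptions: the matrix `A` is a hypothetical stand-in (a path-graph Laplacian) for a symmetric positive semidefinite discretization whose kernel consists of the constants, and the function names are illustrative and not taken from the source.

```python
import numpy as np

def path_laplacian(n):
    """Symmetric positive semidefinite matrix whose kernel is spanned by the constant vector."""
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    A[0, 0] = A[-1, -1] = 1.0
    return A

def solve_augmented(A, rhs):
    """Mixed form: A mu + omega * 1 = rhs together with the side condition 1^T mu = 0."""
    n = A.shape[0]
    ones = np.ones(n)
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = A
    K[:n, n] = ones
    K[n, :n] = ones
    sol = np.linalg.solve(K, np.append(rhs, 0.0))
    return sol[:n], sol[n]

def solve_stabilized(A, rhs):
    """Stabilized form: (A + 1 1^T) mu = rhs, the discrete analog of adding <v,1><1,mu>."""
    ones = np.ones(A.shape[0])
    return np.linalg.solve(A + np.outer(ones, ones), rhs)

n = 6
A = path_laplacian(n)
rhs = np.arange(n, dtype=float)
rhs -= rhs.mean()          # compatible right-hand side: orthogonal to the constants

mu_aug, omega = solve_augmented(A, rhs)
mu_stab = solve_stabilized(A, rhs)
```

For a compatible right-hand side, both variants return the same mean-free vector and the multiplier `omega` vanishes, which mirrors the role of the unknown constant in the augmented system.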
We note that the variational equation of (69) and those of (81) and (82) are equivalent. We now summarize our results.


Theorem 4. (a) Equations (68) and (83) as well as equation (82) for $j = 2$ and $3$ have unique solutions $\lambda \in H^{-1/2}(\Gamma)$, $\mu \in H^{1/2}(\Gamma)$, respectively. (b) The systems (81) for $j = 2, 3$ have unique solution pairs $(\mu, \omega) \in H^{1/2}(\Gamma) \times \mathbb{R}$. Consequently, (59) admits a unique solution of the form

\[ \mu = \mu_0 + c \]

where $\mu_0$ is the unique solution of (81) or (82) for $j = 2$, respectively, and $c$ is the constant defined by

\[ c = \frac{1}{m(\Gamma)} \int_\Gamma (\varphi + K\mu_0) \, ds \quad \text{with} \quad m(\Gamma) = \int_\Gamma ds \]

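
As a brief check of the formula for the constant $c$, assume that (59) is the second-kind equation $(\tfrac12 I - K)\mu = \varphi$. This is an assumption inferred from (84), since (59) itself is not restated in this section. Using the standard identity $K1 = -\tfrac12$ for the interior double-layer operator, so that $(\tfrac12 I - K)c = c$ for a constant $c$, one obtains

\[
\begin{aligned}
(\tfrac12 I - K)(\mu_0 + c) = \varphi \;&\Longrightarrow\; c = \varphi - \tfrac12 \mu_0 + K\mu_0, \\
c \, m(\Gamma) = \int_\Gamma c \, ds \;&=\; \int_\Gamma (\varphi + K\mu_0) \, ds - \tfrac12 \langle 1, \mu_0 \rangle \;=\; \int_\Gamma (\varphi + K\mu_0) \, ds,
\end{aligned}
\]

where the last step uses the side condition $\langle 1, \mu_0 \rangle = 0$.
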
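We remark that the constant in the theorem plays the same role as an integration constant in the Fichera method (see e.g. Hsiao and MacCamy, 1973), since one may rewrite the variational equation of (81) in the form

\[ \langle D(\tfrac12 I - K)\mu, v \rangle + \omega \langle 1, v \rangle = \langle D\varphi, v \rangle \quad \text{for all } v \in H^{1/2}(\Gamma) \qquad (84) \]

and both $\varphi$ and $\varphi + c$ lead to the same right-hand side of (69), (81), and (82). Similarly, for the Neumann problem, the augmented variational system corresponding to (77),

\[ a_3(v, \mu) + \omega \langle v, 1 \rangle = \langle v, \sigma \rangle \quad \text{for all } v \in H^{1/2}(\Gamma) \quad \text{together with} \quad \langle 1, \mu \rangle = 0 \qquad (85) \]

or the one corresponding to (78),

\[ a_4(\chi, \lambda) + \omega \langle \chi, 1 \rangle = \langle \chi, V\sigma \rangle \quad \text{for all } \chi \in H^{-1/2}(\Gamma) \quad \text{with} \quad \langle 1, \lambda \rangle = 0 \qquad (86) \]

have unique solution pairs $(\mu, \omega) \in H^{1/2}(\Gamma) \times \mathbb{R}$ and $(\lambda, \omega) \in H^{-1/2}(\Gamma) \times \mathbb{R}$, correspondingly; and the associated stabilized symmetric variational equations have unique solutions as well. These will be particular solutions of (73) or (75) respectively.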

The proof of Theorem 3 will be given after we collect 
some relevant mathematical tools. The proof of Theorem 4 
is a consequence of Theorem 3 together with Theorem 6 
and Theorem 7 below. 

Since $V$ is $H^{-1/2}(\Gamma)$-elliptic, we may introduce the corresponding energy norm on this space defined by

\[ \|\lambda\|_{E_V} := \langle V\lambda, \lambda \rangle^{1/2} \]

which is equivalent to the $H^{-1/2}(\Gamma)$-norm, and its $L_2$-dual norm defined by

\[ \|\mu\|_{E_D} := \sup_{0 \ne \lambda \in H^{-1/2}(\Gamma)} \frac{|\langle \lambda, \mu \rangle|}{\|\lambda\|_{E_V}} \]

Then one finds that $\|\mu\|_{E_D} = \langle V^{-1}\mu, \mu \rangle^{1/2}$ is equivalent to the $H^{1/2}(\Gamma)$-norm of $\mu$. The following theorem was shown by Steinbach and Wendland (2001) for a Lipschitz boundary and also for the equations of elasticity in Section 2.
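
The identity between the dual norm and $\langle V^{-1}\mu, \mu \rangle^{1/2}$ can be checked in a finite-dimensional analog. The sketch below assumes a randomly generated symmetric positive definite matrix `V` as a hypothetical stand-in for a discretized single-layer operator; it is an illustration only, not a boundary element computation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
V = B @ B.T + n * np.eye(n)        # symmetric positive definite stand-in for V

def energy_norm(lam):
    """Energy norm <V lam, lam>^(1/2)."""
    return np.sqrt(lam @ V @ lam)

def dual_norm(mu):
    """Closed form of the dual norm: <V^{-1} mu, mu>^(1/2)."""
    return np.sqrt(mu @ np.linalg.solve(V, mu))

mu = rng.standard_normal(n)

# Sample the quotient |<lam, mu>| / ||lam||_{E_V} over random directions lam.
samples = rng.standard_normal((2000, n))
ratios = np.abs(samples @ mu) / np.sqrt(np.einsum("ij,jk,ik->i", samples, V, samples))

# The supremum is attained at lam = V^{-1} mu.
lam_star = np.linalg.solve(V, mu)
ratio_star = abs(lam_star @ mu) / energy_norm(lam_star)
```

No sampled quotient exceeds `dual_norm(mu)`, and the maximizer `lam_star` reproduces it, which is the Cauchy-Schwarz argument behind the stated identity.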


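Theorem 5. The operators $\tfrac12 I \pm K$ and $\tfrac12 I \pm K'$ are contractions in the energy spaces, and

\[ \|(\tfrac12 I \pm K)\mu\|_{E_D} \le c_K \|\mu\|_{E_D} \quad \text{for all } \mu \in \begin{cases} H^{1/2}(\Gamma) \\ H_0^{1/2}(\Gamma) \end{cases} \]
\[ \|(\tfrac12 I \pm K')\lambda\|_{E_V} \le c_K \|\lambda\|_{E_V} \quad \text{for all } \lambda \in \begin{cases} H_0^{-1/2}(\Gamma) \\ H^{-1/2}(\Gamma) \end{cases} \]

where $c_K = \tfrac12 (1 + \sqrt{1 - 4\alpha_1 \alpha_3}) < 1$ and where the respective upper cases correspond to the $+$ signs and the lower ones to the $-$ signs.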


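This theorem implies that the classical integral equations of the second kind, (33) and (37), can be solved by employing Carl Neumann's classical iterations:

\[ \sigma^{(\ell+1)} = (\tfrac12 I + K')\sigma^{(\ell)} + D\varphi \quad \text{in } H^{-1/2}(\Gamma) \]

and

\[ \varphi^{(\ell+1)} = (\tfrac12 I - K)\varphi^{(\ell)} + V\sigma \quad \text{in } H^{1/2}(\Gamma) \]

which converge with respect to the corresponding energy norms $\|\cdot\|_{E_V}$ and $\|\cdot\|_{E_D}$ respectively.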
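
The mechanism behind such an iteration is the fixed-point scheme $x^{(\ell+1)} = T x^{(\ell)} + f$ for a contraction $T$. The following sketch shows it for a generic matrix; `T` is a hypothetical symmetric matrix with spectral norm 0.8, chosen only to stand in for a contractive second-kind operator, and the function name is illustrative.

```python
import numpy as np

def neumann_iteration(T, f, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration x <- T x + f; converges when T is a contraction."""
    x = np.zeros_like(f)
    for it in range(1, max_iter + 1):
        x_next = T @ x + f
        if np.linalg.norm(x_next - x) <= tol:
            return x_next, it
        x = x_next
    raise RuntimeError("iteration did not converge")

rng = np.random.default_rng(1)
n = 8
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
T = Q @ np.diag(np.linspace(0.1, 0.8, n)) @ Q.T   # symmetric, spectral norm 0.8
f = rng.standard_normal(n)

x, iterations = neumann_iteration(T, f)
x_direct = np.linalg.solve(np.eye(n) - T, f)      # reference: solve (I - T) x = f directly
```

The iterate agrees with the direct solution of $(I - T)x = f$; the number of steps grows as the contraction constant approaches 1.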


3.3 Mathematical ingredients 


In order to establish the theorems stated above, we need 
some mathematical tools. In this subsection, we intend 
to collect the most basic results that we need. For ease 
of reading, we try to keep the notation simple and the 
presentation straightforward, although some of the material 
included may not be in the most general form. 

First, we observe that the variational formulations of BVPs for PDEs as well as for BIEs all lead to an abstract variational problem in a Hilbert space $H$ of the form

\[ \text{Find an element } u \in H \text{ such that } a(v, u) = \ell(v) \quad \text{for all } v \in H \qquad (87) \]

Here, $a(v, u)$ is a continuous sesquilinear form on $H$ and $\ell(v)$ is a given continuous linear functional on $H$. In order to obtain existence results, one needs to show that $a(\cdot, \cdot)$ satisfies a Gårding inequality in the form

\[ \mathrm{Re}\{a(v, v) + (Cv, v)_H\} \ge \alpha_0 \|v\|_H^2 \qquad (88) \]

for all $v \in H$, where $\alpha_0 > 0$ is a constant and $C: H \to H$ is a compact linear operator. In the most ideal case, when the compact operator vanishes, $C = 0$, the sesquilinear form $a(\cdot, \cdot)$ is said to be $H$-elliptic. In this case, we then have the celebrated Lax-Milgram lemma available for the existence proof of the solution, although in most of the cases $C \ne 0$. However, Gårding's inequality implies the validity of the Fredholm alternative, which means that uniqueness implies existence. Since these results are so fundamental, we will state them for the convenience of the reader. We begin with the definition of a sesquilinear form.


Definition 6. A map $a(\cdot, \cdot): H \times H \to \mathbb{C}$ is called a sesquilinear form if it is linear in the first variable and antilinear (i.e., conjugate-linear) in the second. The sesquilinear form $a(\cdot, \cdot)$ is said to be continuous if it satisfies the inequality

\[ |a(u, v)| \le M \|u\|_H \|v\|_H \quad \text{for all } u, v \in H \]


We now state the Lax-Milgram Lemma (see Lax and Milgram, 1954), which, for symmetric $a(\cdot, \cdot)$, is a slight generalization of the Riesz representation theorem using sesquilinear forms instead of the scalar product.

Theorem 6 (The Lax-Milgram Lemma) Let $a(\cdot, \cdot)$ be a continuous $H$-elliptic sesquilinear form. Then, to every bounded linear functional $\ell(\cdot)$ on $H$, there exists a unique solution $u \in H$ of the variational equation (87).


In the context of variational problems, if the sesquilinear form $a(\cdot, \cdot)$ satisfies the Gårding inequality (88), then from the Riesz-Schauder Theorem, one may establish the Fredholm alternative for the variational equation (87), which generalizes Theorem 6 (the Lax-Milgram Lemma). In order to state the result, we need the formulations of the homogeneous variational problem and its corresponding adjoint problem for the sesquilinear form $a(\cdot, \cdot)$ on $H$. The homogeneous variational problem is to find a solution $u_0 \in H$ such that

\[ a(v, u_0) = 0 \quad \text{for all } v \in H \qquad (89) \]

whereas the corresponding adjoint homogeneous variational problem is to find a solution $v_0 \in H$ satisfying

\[ a^*(u, v_0) := \overline{a(v_0, u)} = 0 \quad \text{for all } u \in H \qquad (90) \]

We also need the adjoint nonhomogeneous problem to (87): Find $v \in H$ such that

\[ a^*(u, v) = \overline{a(v, u)} = \ell^*(u) \quad \text{for all } u \in H \qquad (91) \]


Theorem 7 (Fredholm's alternative) For the variational problem (87), if the sesquilinear form $a(\cdot, \cdot)$ is continuous and satisfies the Gårding inequality (88), then there holds the alternative: either (87) has exactly one solution $u \in H$ for every given continuous linear functional $\ell$ on $H$, or the homogeneous variational problems (89) and (90) have finite-dimensional eigenspaces of the same dimension. In the latter case, the nonhomogeneous variational problem (87) and its adjoint problem (91) have solutions iff the following orthogonality conditions

\[ \ell(v_0) = 0 \quad \text{and} \quad \ell^*(u_0) = 0 \]

hold for all the eigensolutions, $v_0$ of (90) and $u_0$ of (89) respectively.


For a proof of Theorem 7, see Bers et al. (1964) and 
Hildebrandt and Wienholtz (1964). 


Lemma 3. Let the sesquilinear form $a(\cdot, \cdot)$ satisfy the Gårding inequality (88) on $H$, and, in addition,

\[ \mathrm{Re}\, a(v, v) > 0 \quad \text{for all } v \in H \setminus \{0\} \]

Then, $a(\cdot, \cdot)$ is $H$-elliptic.

A proof of the lemma by contradiction follows from standard functional analytic arguments.

From our model problems (56) and (70), we see that there is an intimate relation between the sesquilinear forms of the BIEs and that of the underlying PDEs through Green's formula. Indeed, in particular, we see that the linear mappings $\tau$ and $\tau_c$ employed in Theorem 2 will be defined from the following so-called generalized first Green formulas.


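Lemma 4 (Generalized First Green's Formula) For fixed $u \in H^1(\Omega, \Delta)$, the mapping

\[ v \mapsto \langle \tau u, v \rangle_\Gamma := a_\Omega(u, Zv) + \int_\Omega (\Delta u) \overline{Zv} \, dx \]

is a continuous antilinear functional $\tau u$ on $v \in H^{1/2}(\Gamma)$ that coincides for $u \in H^2(\Omega)$ with $\partial u/\partial n$, that is, $\tau u = \partial u/\partial n$. The linear mapping $\tau: H^1(\Omega, \Delta) \to H^{-1/2}(\Gamma)$ with $u \mapsto \tau u$ is continuous. Here, $Z$ is a right inverse to the trace operator $\gamma_0$. Thus, there holds the generalized first Green's formula

\[ -\langle \Delta u, v \rangle_\Omega = -\int_\Omega (\Delta u) \overline{v} \, dx = a_\Omega(u, v) - \langle \tau u, \gamma_0 v \rangle_\Gamma \qquad (92) \]

for $u \in H^1(\Omega, \Delta)$ and $v \in H^1(\Omega)$. Here, $a_\Omega(u, v) := \int_\Omega \nabla u \cdot \nabla \overline{v} \, dx$ is a sesquilinear form and $\langle \cdot, \cdot \rangle_\Gamma$ denotes the duality pairing such that

\[ \langle \tau u, \gamma_0 v \rangle_\Gamma = (\tau u, \gamma_0 v)_{L^2(\Gamma)} = \int_\Gamma (\tau u) \overline{\gamma_0 v} \, ds \]

provided $\tau u \in H^{-1/2}(\Gamma)$ and $\gamma_0 v \in H^{1/2}(\Gamma)$.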


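Lemma 5 (Exterior Generalized First Green's Formula) Let $u \in H^1_{loc}(\Omega^c, \Delta)$ be given. Then, the linear functional $\tau_c u$ defined by the mapping

\[ \langle \tau_c u, \varphi \rangle_\Gamma := -a_{\Omega^c}(u, Z_c \varphi) - \int_{\Omega^c} (\Delta u) \overline{Z_c \varphi} \, dx \quad \text{for all } \varphi \in H^{1/2}(\Gamma) \]

belongs to $H^{-1/2}(\Gamma)$ and coincides with $\partial u/\partial n$, provided $u \in C^2(\overline{\Omega^c})$. Moreover, $\tau_c: u \mapsto \tau_c u$ is a linear continuous mapping from $H^1_{loc}(\Omega^c, \Delta)$ into $H^{-1/2}(\Gamma)$, and there holds the exterior generalized first Green's formula

\[ -\int_{\Omega^c} (\Delta u) \overline{v} \, dx = a_{\Omega^c}(u, v) + \langle \tau_c u, \gamma_{c0} v \rangle_\Gamma \qquad (93) \]

for $u \in H^1_{loc}(\Omega^c, \Delta)$ and for every $v \in H^1_{comp}(\Omega^c)$.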


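In Lemma 5, the function space $H^1_{comp}(\Omega^c)$ is defined by

\[ H^1_{comp}(\Omega^c) := \{ v \in H^1_{loc}(\Omega^c) \mid v \text{ has compact support in } \mathbb{R}^3 \} \]

The operator $Z_c$ denotes the right inverse to $\gamma_{c0}$, which maps $v \in H^{1/2}(\Gamma)$ into $Z_c v \in H^1_{comp}(\Omega^c)$ with $\mathrm{supp}(Z_c v)$ contained in some fixed compact set $\Omega_R^c \subset \mathbb{R}^3$ containing $\Gamma$. The sesquilinear form $a_{\Omega^c}(u, v)$ and the duality pairing $\langle \cdot, \cdot \rangle_\Gamma$ are defined similarly to those in Lemma 4.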

Generalized first Green's formulas give the relations between the solution spaces of weak solutions of the PDEs and the corresponding boundary integral equations. We also need the generalized formula for deriving the representation formula for the weak solutions. This is based on the generalized second Green formula.

Lemma 6 (Generalized Second Green's Formula) For every pair $u, v \in H^1(\Omega, \Delta)$, there holds the formula

\[ \int_\Omega \{ (\Delta u) \overline{v} - u \overline{\Delta v} \} \, dx = \langle \tau u, \gamma_0 v \rangle_\Gamma - \langle \gamma_0 u, \tau v \rangle_\Gamma \]


For the proofs of generalized Green formulas and for the 
systematic study of Sobolev spaces, the reader is referred 
to Nečas (1967) or Lions and Magenes (1972), and also 
Aubin (1972). 


3.4 Proofs of basic results 


For the proofs of Theorem 2 and Lemma 2, we need a representation formula for the variational solution of the transmission problem for

\[ -\Delta u = 0 \quad \text{in } \mathbb{R}^3 \setminus \Gamma \qquad (94) \]

satisfying the decay condition $u = O(|x|^{-1})$ as $|x| \to \infty$.


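Theorem 8 (Generalized Representation Formula) Let $u$ be a variational solution for the transmission problem (94) with $u|_\Omega \in H^1(\Omega, \Delta)$ and $u|_{\Omega^c} \in H^1_{loc}(\Omega^c, \Delta)$. Then, $u(x)$ admits the representation

\[ u(x) = \langle [\tau u]_\Gamma, E(x, \cdot) \rangle_\Gamma - \Big\langle \frac{\partial}{\partial n} E(x, \cdot), [\gamma_0 u]_\Gamma \Big\rangle_\Gamma \quad \text{for } x \in \mathbb{R}^3 \setminus \Gamma \qquad (95) \]

where $[\gamma_0 u]_\Gamma$ and $[\tau u]_\Gamma$ denote the jumps of $\gamma_0 u$ and $\tau u$ across $\Gamma$ respectively.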


For problems in the plane, n = 2, this representation 
formula is to be modified slightly because of the logarithmic 
behavior of the simple-layer potential. 


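Proof. For the proof, we shall use the following estimate:

\[ |\langle [\tau u]_\Gamma, E(x, \cdot) \rangle_\Gamma| + \Big| \Big\langle \frac{\partial}{\partial n} E(x, \cdot), [\gamma_0 u]_\Gamma \Big\rangle_\Gamma \Big| \le c_{x,R} \{ \|u\|_{H^1(\Omega, \Delta)} + \|u\|_{H^1(\Omega_R^c, \Delta)} \} \qquad (96) \]

where $R > 0$ is sufficiently large so that the ball $\{ y \in \mathbb{R}^3 \mid |y| < R \}$ contains $\overline{\Omega}$, and $\Omega_R^c := \Omega^c \cap \{ y \in \mathbb{R}^3 \mid |y| < R \}$. Since $E(x, \cdot)$ for fixed $x \notin \Gamma$ is a smooth function on the boundary $\Gamma$, the estimate (96) follows from Lemmas 4 and 5 together with the continuity of the duality pairing $\langle \cdot, \cdot \rangle_\Gamma$.

With the estimate (96), the representation (95) can be established by the usual completion procedure. More precisely, if $u$ is a variational solution with the required properties, we can approximate $u$ by a sequence of functions $u_k$ with

\[ u_k|_\Omega \in C^\infty(\overline{\Omega}) \quad \text{and} \quad u_k|_{\Omega^c} \in C^\infty_{comp}(\overline{\Omega^c}) \]

satisfying the classical representation formula,

\[ u_k(x) = \int_\Gamma E(x, y) \Big[ \frac{\partial u_k}{\partial n} \Big]_\Gamma ds_y - \int_\Gamma \Big( \frac{\partial}{\partial n_y} E(x, y) \Big) [u_k]_\Gamma \, ds_y \quad \text{for } x \in \mathbb{R}^3 \setminus \Gamma \qquad (97) \]

so that

\[ \|u - u_k\|_{H^1(\Omega, \Delta)} + \|u - u_k\|_{H^1(\Omega_R^c, \Delta)} \to 0 \quad \text{as } k \to \infty \]

Then, because of (96), for any $x \notin \Gamma$, the two boundary potentials in (97) generated by $u_k$ will converge to the corresponding boundary potentials in (95). This completes the proof. □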


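Observe that we may rewrite the representation (95) in the form

\[ u(x) = V[\tau u]_\Gamma(x) - W[\gamma_0 u]_\Gamma(x) \quad \text{for } x \in \mathbb{R}^3 \setminus \Gamma \qquad (98) \]

where $V\sigma(x)$ is the simple-layer potential and $W\varphi(x)$ is the double-layer potential, namely,

\[ V\sigma(x) := \langle E(x, \cdot), \sigma \rangle_\Gamma, \qquad W\varphi(x) := \Big\langle \frac{\partial}{\partial n} E(x, \cdot), \varphi \Big\rangle_\Gamma \qquad (99) \]

From this formula (98), we may now establish the mapping properties in Theorem 2.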


Proof of Theorem 2 For the continuity of $V$, we consider the transmission problem:

Find $u \in H^1(\Omega, \Delta)$ and $u \in H^1_{loc}(\Omega^c, \Delta)$ satisfying the differential equation

\[ -\Delta u = 0 \quad \text{in } \Omega \text{ and } \Omega^c \]

together with the transmission conditions

\[ [\gamma_0 u]_\Gamma = 0 \quad \text{and} \quad [\tau u]_\Gamma = \sigma \]

with given $\sigma \in H^{-1/2}(\Gamma)$ and the decay condition $u = O(|x|^{-1})$ as $|x| \to \infty$.

This variational transmission problem has a unique solution $u$. Moreover, $u$ depends on $\sigma$ continuously. The latter implies that the mappings $\sigma \mapsto u$ from $H^{-1/2}(\Gamma)$ to $H^1(\Omega, \Delta)$ and to $H^1_{loc}(\Omega^c, \Delta)$, respectively, are continuous. By the representation (99), it follows that $\sigma \mapsto u = V\sigma$ is continuous in the corresponding spaces. This, together with the continuity of the trace operators

\[ \gamma_0: H^1(\Omega, \Delta) \to H^{1/2}(\Gamma) \quad \text{and} \quad \gamma_{c0}: H^1_{loc}(\Omega^c, \Delta) \to H^{1/2}(\Gamma) \]

and of the linear mappings (from Lemmas 4 and 5)

\[ \tau: H^1(\Omega, \Delta) \to H^{-1/2}(\Gamma) \quad \text{and} \quad \tau_c: H^1_{loc}(\Omega^c, \Delta) \to H^{-1/2}(\Gamma) \]

implies the continuity properties of the operators associated with $V$.

Similarly, the solution of the transmission problem:

Find $u \in H^1(\Omega, \Delta)$ and $u \in H^1_{loc}(\Omega^c, \Delta)$ satisfying

\[ -\Delta u = 0 \quad \text{in } \Omega \text{ and in } \Omega^c \]

with the transmission conditions

\[ [\gamma_0 u]_\Gamma = \varphi \in H^{1/2}(\Gamma) \quad \text{and} \quad [\tau u]_\Gamma = 0 \]

together with the representation (99),

\[ u(x) = -W\varphi(x) \quad \text{for } x \in \mathbb{R}^3 \setminus \Gamma \]

provides the desired continuity properties for the mappings associated with $W$.


Proof of Lemma 2 We see from the representation (98) that $u(x) = V\sigma(x)$ for $x \in \mathbb{R}^3 \setminus \Gamma$ is the solution of the transmission problem for (94) with the transmission conditions

\[ [\gamma_0 u]_\Gamma = 0 \quad \text{and} \quad \sigma = [\tau u]_\Gamma \]

Inserting $u = V\sigma$ gives the jump relations involving $V$. Likewise, $u(x) = -W\varphi(x)$ is the solution of the transmission problem of (94) satisfying

\[ [\gamma_0 u]_\Gamma = \varphi \quad \text{and} \quad [\tau u]_\Gamma = 0 \]

which gives the desired jump relations involving $W$. This completes the proof of Lemma 2.


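Remark 1. In accordance with the classical formulations in the Hölder spaces, we now introduce the boundary integral operators on $\Gamma$ for given $(\sigma, \varphi) \in H^{-1/2}(\Gamma) \times H^{1/2}(\Gamma)$ defined by

\[
V\sigma := \gamma_0 V\sigma, \qquad K\varphi := \gamma_0 W\varphi + \tfrac12 \varphi, \qquad
K'\sigma := \tau V\sigma - \tfrac12 \sigma, \qquad D\varphi := -\tau W\varphi
\]

Clearly, Theorem 2 provides us with the continuity of these boundary integral operators. The corresponding Calderón projectors can now be defined in a weak sense by

\[
C_\Omega := \begin{pmatrix} \tfrac12 I - K & V \\ D & \tfrac12 I + K' \end{pmatrix}
\quad \text{and} \quad C_{\Omega^c} := I - C_\Omega
\]

which are continuous mappings on $H^{1/2}(\Gamma) \times H^{-1/2}(\Gamma)$.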
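
The projector property can be checked in a simple two-dimensional analog, which is an illustration chosen here and not part of the source. On the unit circle, with the Laplace kernel $-\frac{1}{2\pi}\log|x - y|$, the boundary integral operators act diagonally on the Fourier modes $e^{ik\theta}$, $k \ne 0$, with the standard symbols $V_k = 1/(2|k|)$, $D_k = |k|/2$, and $K_k = K'_k = 0$. The sketch below assumes these symbols; the mode $k = 0$ is excluded because $V$ degenerates there on the unit circle, which is the scaling issue mentioned for $n = 2$. The function name is hypothetical.

```python
import numpy as np

def calderon_symbol(k):
    """2x2 Fourier symbol of the interior Calderon projector on the unit circle, mode k != 0."""
    if k == 0:
        raise ValueError("mode k = 0 is excluded (V degenerates on the unit circle)")
    V = 1.0 / (2.0 * abs(k))   # single-layer symbol
    D = abs(k) / 2.0           # hypersingular symbol
    K = 0.0                    # double-layer symbol for k != 0
    K_adj = 0.0                # adjoint double-layer symbol for k != 0
    return np.array([[0.5 - K, V], [D, 0.5 + K_adj]])

modes = [k for k in range(-5, 6) if k != 0]
symbols = {k: calderon_symbol(k) for k in modes}

# Idempotency defect ||C^2 - C|| for each mode; it should vanish up to rounding.
defects = {k: np.linalg.norm(C @ C - C) for k, C in symbols.items()}
```

Each symbol satisfies $C^2 = C$ and has trace 1, so it is a rank-one projector, and $I - C$ is the complementary projector.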

To establish Theorem 3, we need Lemmas 4, 5, and 2 together with the Gårding inequality for the sesquilinear form for the underlying partial differential operator. For the model problem with homogeneous Dirichlet conditions, we have

\[ a_\Omega(u, v) = \int_\Omega \nabla u \cdot \nabla \overline{v} \, dx \quad \text{for all } u, v \in H^1(\Omega) \qquad (100) \]

and there holds a Gårding inequality in the form

\[ \mathrm{Re}\{a_\Omega(v, v)\} \ge \|v\|^2_{H^1(\Omega)} - \|v\|^2_{L^2(\Omega)} \qquad (101) \]

for all $v \in H^1(\Omega)$. The latter implies that there is a compact linear operator $C$ defined by

\[ (Cu, v)_{H^1(\Omega)} := \int_\Omega u \overline{v} \, dx \]

so that (101) can be rewritten as

\[ \mathrm{Re}\{a_\Omega(v, v) + (Cv, v)_{H^1(\Omega)}\} \ge \alpha_0 \|v\|^2_{H^1(\Omega)} \qquad (102) \]

with some constant $\alpha_0 > 0$. The compactness of $C$ can be seen as follows. Since $(Cu, v)_{H^1(\Omega)} = (u, C^*v)_{H^1(\Omega)}$ and from the estimate

\[ |(u, C^*v)_{H^1(\Omega)}| = |(u, v)_{L^2(\Omega)}| \le \|u\|_{H^1(\Omega)} \|v\|_{L^2(\Omega)} \]

it follows that $C^*$ maps $L^2(\Omega)$ into $H^1(\Omega)$ continuously. Then the Sobolev imbedding theorem implies that $C^*: H^1(\Omega) \to H^1(\Omega)$ is compact and hence $C$ is compact (see Adams, 1975).


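Proof of Theorem 3 The proof follows Hsiao and Wendland (1977) and Costabel and Wendland (1986). For the sesquilinear form $a_1(\cdot, \cdot)$, we first show that it satisfies a Gårding inequality of the form

\[ \mathrm{Re}\{a_1(\sigma, \sigma) + \langle \sigma, C_V \sigma \rangle_\Gamma\} \ge \alpha_1 \|\sigma\|^2_{H^{-1/2}(\Gamma)} \quad \text{for all } \sigma \in H^{-1/2}(\Gamma) \qquad (103) \]

where $C_V: H^{-1/2}(\Gamma) \to H^{1/2}(\Gamma)$ is compact. Then the $H^{-1/2}(\Gamma)$-ellipticity of $a_1(\cdot, \cdot)$ follows immediately from Lemma 3, since

\[ \mathrm{Re}\, a_1(\sigma, \sigma) = \mathrm{Re}\langle \sigma, V\sigma \rangle \ge 0 \quad \text{for all } \sigma \in H^{-1/2}(\Gamma) \]

and

\[ \mathrm{Re}\langle \sigma, V\sigma \rangle = 0 \quad \text{implies} \quad \sigma = 0 \]

as will be shown below. For any $\sigma \in H^{-1/2}(\Gamma)$, let

\[ v(x) := \int_\Gamma E(x, y) \sigma(y) \, ds_y \quad \text{for } x \in \mathbb{R}^3 \setminus \Gamma \]

Then, $v \in H^1(\Omega, \Delta)$ and $v \in H^1_{loc}(\Omega^c, \Delta)$ respectively. Moreover, Lemma 2 yields

\[ [\gamma_0 v]_\Gamma = 0 \quad \text{and} \quad [\tau v]_\Gamma = \sigma \]

By adding the generalized Green formulas (92) and (93), we obtain

\[ \langle \sigma, V\sigma \rangle = \int_\Omega |\nabla v|^2 \, dx + \int_{\Omega^c} |\nabla v|^2 \, dx = a_\Omega(v, v) + a_{\Omega^c}(v, v) \qquad (104) \]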


The right-hand side of (104) contains two sesquilinear forms for the corresponding partial differential operator $-\Delta$, one in $\Omega$ and the other one in the exterior domain $\Omega^c$. It is this relation that, in some sense, intimately connects the Gårding inequality of $V$ with that of $-\Delta$. We see from (100) that

\[ a_\Omega(v, v) = \|v\|^2_{H^1(\Omega)} - (Cv, v)_{H^1(\Omega)} \]

where $C$ is a compact operator from $H^1(\Omega)$ into itself. This takes care of the contribution to the Gårding inequality for $V$ from the interior domain $\Omega$. We would expect, of course, to have a similar result for the exterior domain $\Omega^c$. However, there is a technical difficulty, since $v$ defined by the simple-layer potential is not in $L^2(\Omega^c)$. In order to apply similar arguments to $\Omega^c$, following Costabel and Wendland (1986), we introduce a fixed $C_0^\infty(\mathbb{R}^3)$ cut-off function $\phi$ with $\phi|_{\overline{\Omega}} = 1$ and $\mathrm{dist}(\{x \in \mathbb{R}^3 \mid \phi(x) \ne 1\}, \overline{\Omega}) =: d_0 > 0$, since the exterior generalized first Green formula in Lemma 5 is valid for $v_c := \phi v$ having compact support.
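With this modification, (104) becomes

\[ \langle \sigma, V\sigma \rangle + \int_{\Omega^c} (-\Delta v_c) \overline{v_c} \, dx = \int_\Omega |\nabla v|^2 \, dx + \int_{\Omega^c} |\nabla v_c|^2 \, dx = a_\Omega(v, v) + a_{\Omega^c}(v_c, v_c) \qquad (105) \]

On the other hand, from Lemmas 4 and 5, the mappings $\tau: H^1(\Omega, \Delta) \to H^{-1/2}(\Gamma)$ and $\tau_c: H^1_{loc}(\Omega^c, \Delta) \to H^{-1/2}(\Gamma)$ are continuous. Thus, we have

\[
\|\sigma\|^2_{H^{-1/2}(\Gamma)} = \|[\tau v]_\Gamma\|^2_{H^{-1/2}(\Gamma)} \le 2\|\tau v\|^2_{H^{-1/2}(\Gamma)} + 2\|\tau_c v_c\|^2_{H^{-1/2}(\Gamma)}
\le c \{ \|v\|^2_{H^1(\Omega)} + \|v_c\|^2_{H^1(\Omega')} + \|\Delta v_c\|^2_{\widetilde{H}^{-1}(\Omega')} \}
\]

where $\Omega' := \Omega^c \cap \mathrm{supp}\, \phi$. Then Gårding's inequality (102) for $a_\Omega$ in the domain $\Omega$ and for $a_{\Omega^c}$ in $\Omega'$ implies

\[ \alpha_0 \{ \|v\|^2_{H^1(\Omega)} + \|v_c\|^2_{H^1(\Omega')} \} \le \mathrm{Re}\{ a_\Omega(v, v) + a_{\Omega^c}(v_c, v_c) + (Cv, v)_{H^1(\Omega)} + (C'v_c, v_c)_{H^1(\Omega')} \} \]

with a positive constant $\alpha_0$ and compact operators $C$ and $C'$ depending also on $\Omega$ and $\Omega'$. Collecting the inequalities, we get

\[ \alpha_1 \|\sigma\|^2_{H^{-1/2}(\Gamma)} \le \mathrm{Re}\{ a_\Omega(v, v) + a_{\Omega^c}(v_c, v_c) + c_1 (Cv, v)_{H^1(\Omega)} + c_2 (C'v_c, v_c)_{H^1(\Omega')} \} + c_3 \|\Delta v_c\|^2_{\widetilde{H}^{-1}(\Omega')} \qquad (106) \]

Then, from (105) and (106), we finally obtain the inequality

\[ \alpha_1 \|\sigma\|^2_{H^{-1/2}(\Gamma)} \le \mathrm{Re}\{ a_1(\sigma, \sigma) + \langle \sigma, C_V \sigma \rangle_\Gamma \} \]

where the operator $C_V: H^{-1/2}(\Gamma) \to H^{1/2}(\Gamma)$ is defined by the sesquilinear form

\[
\langle \chi, C_V \sigma \rangle_\Gamma = c(\chi, \sigma) := c_1 (V\chi, CV\sigma)_{H^1(\Omega)} + c_2 (\phi V\chi, C'\phi V\sigma)_{H^1(\Omega')} + c_3 (\Delta(\phi V\chi), \Delta(\phi V\sigma))_{\widetilde{H}^{-1}(\Omega')} + (-\Delta(\phi V\chi), \phi V\sigma)_{L^2(\Omega')}
\]

We note that the operator $C_V$ is well defined, since each term on the right-hand side is a bounded sesquilinear form on $\chi, \sigma \in H^{-1/2}(\Gamma)$. Hence, by the Riesz representation theorem, there exists a linear mapping $jC_V: H^{-1/2}(\Gamma) \to H^{-1/2}(\Gamma)$ such that

\[ c(\chi, \sigma) = (\chi, jC_V \sigma)_{H^{-1/2}(\Gamma)} \]

Since $j^{-1}: H^{-1/2}(\Gamma) \to (H^{-1/2}(\Gamma))' = H^{1/2}(\Gamma)$ is continuous, this representation can be written in the desired form

\[ \langle \chi, C_V \sigma \rangle_\Gamma = c(\chi, \sigma) \quad \text{for all } \chi, \sigma \in H^{-1/2}(\Gamma) \]

where $C_V = j^{-1} jC_V$. It remains to be shown that $C_V$ is compact. This follows from the standard arguments based on the Sobolev imbedding theorem and the compactness of the operators $C$ and $C'$. We refer the reader to Hsiao and Wendland (2005) for details.
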
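To establish the Gårding inequality for the sesquilinear form

\[ a_2(v, \mu) := \langle Dv, (\tfrac12 I - K)\mu \rangle \]

we set for fixed $\mu \in H^{1/2}(\Gamma)$,

\[ w(x) := -\int_\Gamma \Big( \frac{\partial}{\partial n_y} E(x, y) \Big) \mu(y) \, ds_y \quad \text{for } x \in \mathbb{R}^3 \setminus \Gamma \]

It follows from Theorem 2, Lemma 2, and the generalized first Green formula in Lemma 4 that we arrive at the integral relation

\[ \langle D\mu, (\tfrac12 I - K)\mu \rangle_\Gamma = a_\Omega(w, w) \ge 0 \qquad (107) \]

Thus, from (102), we have

\[ \mathrm{Re}\{ \langle D\mu, (\tfrac12 I - K)\mu \rangle_\Gamma + (Cw, w)_{H^1(\Omega)} \} \ge \alpha_0 \|w\|^2_{H^1(\Omega)} \ge \alpha_0' \|\tau w\|^2_{H^{-1/2}(\Gamma)} = \alpha_0' \|D\mu\|^2_{H^{-1/2}(\Gamma)} \qquad (108) \]

where $\alpha_0' > 0$, by using the continuity of the linear mapping in Lemma 4. Now let us assume that we have the Gårding inequality for the operator $D$ in the form

\[ \mathrm{Re}\{ \langle \mu, D\mu \rangle_\Gamma + \langle \mu, C_D \mu \rangle \} \ge \alpha_D \|\mu\|^2_{H^{1/2}(\Gamma)} \qquad (109) \]

where $C_D: H^{1/2}(\Gamma) \to H^{-1/2}(\Gamma)$ is a compact operator. Then, we have the estimate

\[ \|D\mu\|_{H^{-1/2}(\Gamma)} \ge \alpha_D \|\mu\|_{H^{1/2}(\Gamma)} - \|C_D \mu\|_{H^{-1/2}(\Gamma)} \]

This together with (108) implies that

\[ \mathrm{Re}\{ \langle D\mu, (\tfrac12 I - K)\mu \rangle_\Gamma + c(\mu, \mu)_\Gamma \} \ge \alpha_2 \|\mu\|^2_{H^{1/2}(\Gamma)} \]

where

\[ c(v, \mu)_\Gamma := \alpha_0' (C_D v, C_D \mu)_{H^{-1/2}(\Gamma)} + (Wv, CW\mu)_{H^1(\Omega)} \]

the compactness of which follows from the arguments as before.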
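
The step from (108) and (109) to the last inequality can be spelled out as follows. This is a sketch only: here $\alpha_0'$ denotes the constant in (108), $\alpha_D$ the constant in (109), norms without explicit space are $H^{-1/2}(\Gamma)$-norms, and the resulting value $\alpha_2 = \tfrac12 \alpha_0' \alpha_D^2$ is one admissible choice rather than a constant stated by the source.

\[
\begin{aligned}
\alpha_D \|\mu\|^2_{H^{1/2}(\Gamma)} &\le \mathrm{Re}\langle \mu, D\mu + C_D\mu \rangle_\Gamma \le \|\mu\|_{H^{1/2}(\Gamma)} \big( \|D\mu\| + \|C_D\mu\| \big), \\
\|D\mu\|^2 + \|C_D\mu\|^2 &\ge \tfrac12 \big( \|D\mu\| + \|C_D\mu\| \big)^2 \ge \tfrac12 \alpha_D^2 \|\mu\|^2_{H^{1/2}(\Gamma)}, \\
\mathrm{Re}\{ \langle D\mu, (\tfrac12 I - K)\mu \rangle_\Gamma + (Cw, w)_{H^1(\Omega)} \} + \alpha_0' \|C_D\mu\|^2 &\ge \alpha_0' \big( \|D\mu\|^2 + \|C_D\mu\|^2 \big) \ge \tfrac12 \alpha_0' \alpha_D^2 \|\mu\|^2_{H^{1/2}(\Gamma)}.
\end{aligned}
\]

The first line is (109) combined with the boundedness of the duality pairing, and the third line adds $\alpha_0' \|C_D\mu\|^2$ to both sides of (108).
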


To complete the proof, it remains to establish the Gårding inequality (109). The proof is based on the estimate

\[ \|\mu\|_{H^{1/2}(\Gamma)} = \|[\gamma_0 w]_\Gamma\|_{H^{1/2}(\Gamma)} \le c \{ \|w\|_{H^1(\Omega)} + \|\phi w\|_{H^1(\Omega')} \} \]

which follows from the trace theorem. The remaining arguments of the proof are again the same as in the proof for the operator $V$ in (105). In fact, (109) is Gårding's inequality for $a_3$. Because of (76), Gårding's inequality for $a_4$ follows in the same way as that for $a_2$.

For the proof of (80) for $j = 2$, we employ Lemma 3, since $a_2(\mu_0, \mu_0) = 0$ with (107) implies $w_0 = $ constant in $\Omega$ and, with (6) for $w_0$ and uniqueness for (59), one finds $\mu_0 = w_0$ on $\Gamma$ and $\mu_0 = 0$ if $\mu_0 \in H_0^{1/2}(\Gamma)$.

The proof of Theorem 5 rests on the Calderón projection properties and the relations between norms of dualities. For details, we refer the reader to Steinbach and Wendland (2001).


Remark 2. We notice that (104) implies the $H^{-1/2}(\Gamma)$-ellipticity of $V$ immediately if weighted Sobolev spaces were used as in the French school (see Nedelec and Planchard, 1973; LeRoux, 1977). However, we choose not to do so by introducing the cut-off function $\phi$. It is also worth mentioning that the Gårding inequality for $a_2(\cdot, \cdot)$ is established without using compactness of the operator $K$. In particular, we see that in (108), we may obtain the estimate

\[ \alpha_0 \|w\|^2_{H^1(\Omega)} \ge c \|\gamma_0 w\|^2_{H^{1/2}(\Gamma)} = c \|(\tfrac12 I - K)\mu\|^2_{H^{1/2}(\Gamma)} \]

by the trace theorem; if $K$ is compact, the proof will be more direct and shorter without employing (109). However, we deliberately avoid the above estimate, since our approach here can also be applied to elasticity as well as to a Lipschitz boundary, in which case, also for the Laplacian, $K$ is no longer compact.


4 THE GALERKIN-BEM 


This section is devoted to the Galerkin-BEM for the same 
model problems as in the previous section. From the for- 
mulation of the Galerkin system to the basic results of error 
estimates, it is a section that describes the Galerkin-BEM 
and provides detailed proofs of basic results for typical 
BIEs. Needless to say, the approach is general enough and 
can be adapted to other cases including elasticity with slight 
or even without any modifications. 


4.1 Galerkin equations and Céa’s lemma 


To simplify the presentation, we begin with the variational equation (68) for the BIE of the first kind (58). The boundary energy space is $H^{-1/2}(\Gamma)$. The Galerkin method consists of seeking an approximate solution of (68) in a finite-dimensional subspace $S_h$ of the space $H^{-1/2}(\Gamma)$ consisting of admissible functions, rather than in the whole space. More precisely, let $S_h$ be a family of finite-dimensional subspaces that approximate $H^{-1/2}(\Gamma)$, that is,

\[ \text{for every } \lambda \in H^{-1/2}(\Gamma), \text{ there exists a sequence } \chi_h \in S_h \subset H^{-1/2}(\Gamma) \text{ such that } \|\chi_h - \lambda\|_{H^{-1/2}(\Gamma)} \to 0 \text{ as } h \to 0 \qquad (110) \]

that is, as the number of degrees of freedom $N \to \infty$. Then the Galerkin approximation of the solution of (68) is a function $\lambda_h \in S_h$ satisfying the Galerkin equations

\[ a_1(\chi_h, \lambda_h) = \ell_\varphi(\chi_h) \quad \text{for all } \chi_h \in S_h \subset H^{-1/2}(\Gamma) \qquad (111) \]
The Galerkin solution $\lambda_h$ is in some sense the discrete weak solution of (58). Alternatively, since $S_h$ is finite-dimensional, if $\{\mu_j(x)\}_{j=1}^N$ is a basis of $S_h$, then we may seek a solution $\lambda_h$ in the form

\[ \lambda_h := \sum_{j=1}^N \gamma_j \mu_j(x) \]

where the unknown coefficients are now required to satisfy the linear system

\[ \sum_{j=1}^N a_1(\mu_i, \mu_j) \gamma_j = \ell_\varphi(\mu_i), \quad i = 1, \ldots, N \qquad (112) \]

As a consequence of the $H^{-1/2}(\Gamma)$-ellipticity of $a_1(\cdot, \cdot)$, it is easy to see that the Galerkin equations (111) or (112) are uniquely solvable, and the essential properties concerning the Galerkin approximation $\lambda_h$ can be stated in the following theorem without specifying the subspace $S_h$ in any particular form.


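Theorem 9. There exists an $h_0 > 0$ such that the corresponding Galerkin equations (111) (or (112)) admit a unique solution $\lambda_h \in S_h \subset H^{-1/2}(\Gamma)$ for every $h \le h_0$. Moreover, the Galerkin projections $G_{hV}$ defined by

\[ G_{hV}: H^{-1/2}(\Gamma) \ni \lambda \mapsto \lambda_h \in S_h \subset H^{-1/2}(\Gamma) \]

are uniformly bounded, that is,

\[ |||G_{hV}||| := \sup_{\|\lambda\|_{H^{-1/2}(\Gamma)} \le 1} \|G_{hV} \lambda\|_{H^{-1/2}(\Gamma)} \le c \qquad (113) \]

for all $h \le h_0$, where $c = c(h_0)$. Consequently, we have the estimate

\[ \|\lambda - \lambda_h\|_{H^{-1/2}(\Gamma)} \le (1 + c) \inf_{\chi_h \in S_h} \|\lambda - \chi_h\|_{H^{-1/2}(\Gamma)} \to 0 \quad \text{as } h \to 0 \qquad (114) \]

The corresponding result also holds for the Galerkin scheme for $a_4$ in (83) and (84).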


The estimate (114) is usually known as Céa's Lemma in finite element analysis (see also Ciarlet, 1978). Moreover, (114) provides the basic inequality for obtaining convergence results in norms other than the energy norm for the integral operator $V$. As in the case of PDEs, this simple yet crucial estimate, (114), shows that the problem of estimating the error between the solution $\lambda$ and its Galerkin approximation $\lambda_h$ is reduced to a problem in approximation theory. In particular, if we assume that the finite-dimensional subspaces $S_h$ are regular boundary element spaces as introduced in Babuška and Aziz (1977), then one may obtain convergence results with respect to a whole range of Sobolev space norms, including superconvergence results, by using the Aubin-Nitsche lemma for BIEs as in Hsiao and Wendland (1981a,b). We will discuss the details in the next two subsections.

To prove Theorem 9, we first notice that by the $H^{-1/2}(\Gamma)$-ellipticity of $V$ (79), we have

\[ a_1(\lambda_h, \lambda_h) = \mathrm{Re}\langle \lambda_h, V\lambda_h \rangle \ge \alpha_1 \|\lambda_h\|^2_{H^{-1/2}(\Gamma)} \quad \text{for all } \lambda_h \in S_h \qquad (115) \]

with some constant $\alpha_1 > 0$, independent of $h$. This implies that the homogeneous equations of (111) have only the trivial solution in $S_h$ and, hence, (111) has a unique solution. Now we introduce a family of projections $P_h: H^{-1/2}(\Gamma) \to S_h \subset H^{-1/2}(\Gamma)$ that is uniformly bounded, that is, there exists a constant $c_p$ such that

\[ |||P_h||| \le c_p \quad \text{for all } 0 < h \le h_0 \qquad (116) \]

with some fixed $h_0 > 0$. If $P_h$ is chosen as the $L^2$-orthogonal projection onto $S_h$, as in Hsiao and Wendland (1977), then (116) is a requirement for the finite element spaces on $\Gamma$, as analyzed by Steinbach (2001), which is fulfilled if these provide approximation as well as inverse properties; see also Aubin (1972) and Babuška and Aziz (1977). Now the Galerkin equations (111) are equivalent to the operator equation

\[ P_h^* V P_h \lambda_h = P_h^* \ell_\varphi = P_h^* V \lambda \qquad (117) \]

where $P_h^*: H^{1/2}(\Gamma) \to S_h$ denotes the adjoint of $P_h$. Various realizations of projections $P_h$ and $P_h^*$ are analyzed by Steinbach (2002).

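We have shown that this equation has a unique solution, from which one may define the Galerkin projection $G_{hV}: \lambda \mapsto \lambda_h$ such that

\[ \lambda_h = G_{hV} \lambda := (P_h^* V P_h)^{-1} P_h^* V \lambda \qquad (118) \]

Observe that for $\chi_h \in S_h$, $\chi_h = P_h \chi_h$, and hence $G_{hV} \chi_h = \chi_h$, that is, $G_{hV}|_{S_h} = I|_{S_h}$. Moreover, from (115), we see that

\[ \alpha_1 \|\lambda_h\|^2_{H^{-1/2}(\Gamma)} \le |\langle \lambda_h, V\lambda_h \rangle| = |\langle \lambda_h, P_h^* V P_h \lambda_h \rangle| = |\langle \lambda_h, P_h^* V \lambda \rangle| = |\langle \lambda_h, V\lambda \rangle| \quad \text{for all } \lambda \in H^{-1/2}(\Gamma) \]

and the continuity of $V$ and (79) imply that the estimates

\[ \alpha_1 \|\lambda_h\|^2_{H^{-1/2}(\Gamma)} = \alpha_1 \|G_{hV} \lambda\|^2_{H^{-1/2}(\Gamma)} \le M \|G_{hV} \lambda\|_{H^{-1/2}(\Gamma)} \|\lambda\|_{H^{-1/2}(\Gamma)} \]

hold for all $\lambda \in H^{-1/2}(\Gamma)$, where $M$ and $\alpha_1$ are constants independent of $h$. Then it follows that the Galerkin projection $G_{hV}$ is uniformly bounded, that is, (113) holds with $c := M/\alpha_1$, independent of $h$ and $\lambda$ for all $h \le h_0$.

To complete the proof, it remains to establish the estimate (114). This is a consequence of (113) together with the definition (118). This can be seen as follows:

\[ \|\lambda - \lambda_h\|_{H^{-1/2}(\Gamma)} = \|(\lambda - \chi_h) + G_{hV}(\chi_h - \lambda)\|_{H^{-1/2}(\Gamma)} \le (1 + c) \|\chi_h - \lambda\|_{H^{-1/2}(\Gamma)} \quad \text{for every } \chi_h \in S_h \]

from which (114) results.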
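
The structure of this argument can be reproduced in a finite-dimensional analog. The sketch below is an illustration under stated assumptions: `V` is a randomly generated symmetric positive definite matrix standing in for the discretized operator, the columns of `Phi` play the role of a basis of $S_h$, and all names are hypothetical. It assembles the Galerkin system in the manner of (112), forms the Galerkin projection in the manner of (118), and compares the Galerkin error with that of another element of the subspace.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 12, 4
B = rng.standard_normal((n, n))
V = B @ B.T + np.eye(n)             # symmetric positive definite stand-in for V
Phi = rng.standard_normal((n, m))   # columns: basis of the subspace S_h

def galerkin_solve(V, Phi, rhs):
    """Solve the Galerkin system (Phi^T V Phi) gamma = Phi^T rhs and return Phi gamma."""
    stiffness = Phi.T @ V @ Phi
    coeffs = np.linalg.solve(stiffness, Phi.T @ rhs)
    return Phi @ coeffs

def energy_norm(x):
    """Energy norm induced by V."""
    return np.sqrt(x @ V @ x)

lam = rng.standard_normal(n)              # "exact" solution
lam_h = galerkin_solve(V, Phi, V @ lam)   # Galerkin approximation for the data V lam

# Galerkin projection as a matrix: G = Phi (Phi^T V Phi)^{-1} Phi^T V.
G = Phi @ np.linalg.solve(Phi.T @ V @ Phi, Phi.T @ V)

# Competitor from the subspace: Euclidean best approximation of lam.
chi_h = Phi @ np.linalg.lstsq(Phi, lam, rcond=None)[0]

cea_bound = (1.0 + np.linalg.norm(G, 2)) * np.linalg.norm(lam - chi_h)
```

Here `G` reproduces elements of the subspace, `G @ lam` coincides with `lam_h`, and the Euclidean error of `lam_h` stays below `cea_bound`, which is the discrete counterpart of the quasi-optimality estimate; in the energy norm the Galerkin solution is the best approximation from the subspace.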

Because of the limitation of space, we omit the proof of the corresponding results for $a_4$.

Now we consider the Galerkin approximation of the 
solution (u, œ) € H'/*(P) x R to the augmented system 
(85) by using a family of finite-dimensional subspaces 
2, C H"?(T) that approximate H1/?(P), that is, 


for every v € H"? (T), there exists a sequence 
up € E, C AVP) 
such that lu, — vlag +0 ash—>O 


Again, we introduce a family of projections Q,: H'/7(P) 
-> E, that satisfies the uniform boundedness condition 


IQI <c, forall O<h<hy (119) 


For the Galerkin method to (85) with given $\sigma \in H_0^{-1/2}(\Gamma)$ (hence $\omega = 0$), we have two variational formulations, (81) and (82). In the latter case, we can use the full finite-dimensional space $E_h$, and the Galerkin equations read

Find $\mu_h \in E_h$ satisfying

$$\tilde{a}_3(v_h, \mu_h) := a_3(v_h, \mu_h) + \langle v_h, 1\rangle \langle 1, \mu_h\rangle = \langle v_h, \sigma\rangle \quad \text{for all } v_h \in E_h \qquad (120)$$


Since $\tilde{a}_3(v, \mu)$ is $H^{1/2}(\Gamma)$-elliptic, we now obtain results that are completely analogous to those in Theorem 9, which will be summarized below.

One may also incorporate the side condition $\langle 1, \mu\rangle = 0$ into the family of finite-dimensional subspaces

$$E_{0h} := \{ v_h \in E_h \mid \langle v_h, 1\rangle = 0 \} \subset H_0^{1/2}(\Gamma)$$

and execute the Galerkin method for (120) or (127) on the subspace $E_{0h} \subset H_0^{1/2}(\Gamma)$. Again, we obtain Céa's estimate since $a_3$ is $H_0^{1/2}(\Gamma)$-elliptic.

Now we consider $a_2$ and suppose for a moment that the composed operator $DK$ is available on $\Gamma$ in explicit form. Then the Galerkin equations for (69) read

Find $\mu_h \in E_{0h} \subset H_0^{1/2}(\Gamma)$ from

$$a_2(v_h, \mu_h) = \langle v_h, (B - C)\mu_h\rangle = \langle v_h, (B - C)\mu\rangle \quad \text{for all } v_h \in E_{0h} \qquad (121)$$


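where $\mu \in H_0^{1/2}(\Gamma)$ is the unique solution of (69) in the subspace $H_0^{1/2}(\Gamma)$. Here, $B$ is the $H_0^{1/2}(\Gamma)$-elliptic operator part of the bilinear form (67) satisfying

$$\langle Bv, v\rangle \ge \alpha_2 \|v\|^2_{H^{1/2}(\Gamma)} \quad \text{for all } v \in H_0^{1/2}(\Gamma) \text{ with } \alpha_2 > 0 \qquad (122)$$

and $C: H_0^{1/2}(\Gamma) \to H_0^{-1/2}(\Gamma)$ is compact. Hence, (121) can also be written as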


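$$Q_h^* B Q_h \{ I - (Q_h^* B Q_h)^{-1} Q_h^* C \} \mu_h = Q_h^* (B - C)\mu$$

With the Galerkin projection $G_{hB} := (Q_h^* B Q_h)^{-1} Q_h^* B$ of $B$, which is uniformly bounded because of (122), and with the operators

$$L := I - B^{-1}C \quad \text{and} \quad L_h := I - G_{hB} B^{-1} C$$

we find the equation

$$L_h \mu_h = G_{hB} L \mu.$$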

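Since $\lim_{h \to 0} \|(G_{hB} - I)v\|_{H^{1/2}(\Gamma)} = 0$ for every $v \in H_0^{1/2}(\Gamma)$ and since $B^{-1}C: H_0^{1/2}(\Gamma) \to H_0^{1/2}(\Gamma)$ is compact, we obtain

$$|||L_h - L||| = |||(I - G_{hB})B^{-1}C||| \to 0 \quad \text{as } h \to 0$$

with respect to the operator norm $|||\cdot|||$.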

Moreover, $L^{-1} = (B - C)^{-1}B$ exists; hence, there are constants $h_0 > 0$ and $c_0$ such that $L_h^{-1}$ exists and

$$|||L_h^{-1}||| \le c_0 \quad \text{for all } 0 < h \le h_0$$

(see Proposition 1.8 and Theorem 1.11 in Anselone (1971) and Kantorovich and Akilov (1964)). Therefore,

$$\mu_h = L_h^{-1} G_{hB} L \mu =: G_{h(B-C)}\mu$$

and the Galerkin projection $G_{h(B-C)}$ to $B - C$ in (121) is uniformly bounded on $H_0^{1/2}(\Gamma)$. Therefore, Céa's estimate holds for (121) as well as for the stabilized version (82).

In practice, however, the Galerkin weights

$$a_2(\mu_j, \mu_k) = \langle D\mu_j, (\tfrac{1}{2}I - K)\mu_k\rangle$$

belonging to a basis $\{\mu_j\}$ can only be computed accurately enough if the outer numerical integrations are sufficiently accurate or if an intermediate space $S_h$ is used in combination with the duality as in Steinbach (2002). Here, we circumvent these difficulties by replacing (121) by the following system of Galerkin equations, which, in fact, corresponds to some preconditioning.
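For given $\varphi \in H^{1/2}(\Gamma)$, compute $\varphi_h \in E_{0h}$ from

$$a_3(v_h, \varphi_h) = \langle v_h, D\varphi\rangle \quad \text{for all } v_h \in E_{0h} \qquad (123)$$

With $\varphi_h$ obtained, compute $\mu_h \in E_{0h}$ from

$$\langle v_h, (\tfrac{1}{2}I - K)\mu_h\rangle = \langle v_h, \varphi_h\rangle \quad \text{for all } v_h \in E_{0h} \qquad (124)$$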
For (123), we have a unique solution satisfying 

lorla < lolang < cllellgvcy 
where u € Hy! 2(r) is the unique solution to (69). In fact, 
equation (124) is equivalent to the integral equation of the 
second kind 
py = (H + K)un +o in lT) c RT) 

and Theorem 5 implies unique solvability. Moreover, we 
obtain for the Galerkin projection p, =: G,p associated 


with the system (123), (124) the uniform estimate 


Gy pbllytcry = |b, laio Ss Mallen 


= 


Palen $ eztela £ elella 
l—cry 
where c, is independent of A. Consequently, Céa’s estimate 
is also valid for the system (123), (124), where only the 
Galerkin weights of D and K are needed. 

In rather the same manner, the Galerkin methods can be 
employed and analyzed for the Fredholm integral equation 
of the second kind (75) and its variational version (78). 


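For (81), respectively (85), with the additional unknown $\omega \in \mathbb{R}$, we now define the bilinear form

$$\tilde{a}_3((v, \kappa); (\mu, \omega)) := a_3(v, \mu) + \omega\langle v, 1\rangle + \langle 1, \mu\rangle\kappa \qquad (125)$$

on the augmented space $\mathcal{H} := (H^{1/2}(\Gamma) \times \mathbb{R})$, where $\tilde{a}_3$ satisfies Gårding's inequality

$$\tilde{a}_3((v, \kappa); (v, \kappa)) + c((v, \kappa); (v, \kappa)) \ge \alpha_3 \left( \|v\|^2_{H^{1/2}(\Gamma)} + |\kappa|^2 \right) \quad \text{on } \mathcal{H} = (H^{1/2}(\Gamma) \times \mathbb{R}) \qquad (126)$$

where $c((v, \kappa); (\mu, \omega)) = 2\kappa\omega + 2\langle v, 1\rangle\langle 1, \mu\rangle$ is a compact bilinear form. Then the Galerkin equations for $\tilde{a}_3$ corresponding to (85) now read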


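Find $(\mu_h, \omega_h) \in \mathcal{H}_h := (E_h \times \mathbb{R})$ satisfying

$$\tilde{a}_3((v_h, \kappa); (\mu_h, \omega_h)) = a_3(v_h, \mu_h) + \omega_h \langle v_h, 1\rangle + \langle 1, \mu_h\rangle\kappa = \langle v_h, \sigma\rangle \qquad (127)$$

for all $(v_h, \kappa) \in \mathcal{H}_h = (E_h \times \mathbb{R})$.

Since the solution to (85) is unique in $\mathcal{H}$, from Gårding's inequality (126), it follows in the same manner as before for (121) that the Galerkin projection $G_h: \mathcal{H} \to \mathcal{H}_h$ is uniformly bounded and Céa's estimate is again valid.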

We summarize all these results in the following theorem. 


Theorem 10. Let $\mu_h \in E_h$ be the unique solution of (120) or $\mu_h \in E_{0h}$ the unique solution of (121) or of (123), (124), and let $(\mu_h, \omega_h) \in \mathcal{H}_h = (E_h \times \mathbb{R})$ be the unique solution pair of (127). Then, we have Céa's estimates

$$\|\mu - \mu_h\|_{H^{1/2}(\Gamma)} \le c \inf_{v_h \in E_h} \|\mu - v_h\|_{H^{1/2}(\Gamma)}$$

and

$$\|(\mu, \omega) - (\mu_h, \omega_h)\|_{\mathcal{H}} \le c_1 \inf_{(v_h, \kappa) \in \mathcal{H}_h} \|(v_h, \kappa) - (\mu, \omega)\|_{\mathcal{H}} = c_1 \inf_{v_h \in E_h} \|v_h - \mu\|_{H^{1/2}(\Gamma)}$$

with some constants $c$, $c_1$, independent of $E_h$, $E_{0h}$, or $\mathcal{H}_h$, respectively.

Clearly, the corresponding Galerkin projections are uniformly bounded:

$G_h: H^{1/2}(\Gamma) \to E_h$ defined by (120) satisfies $|||G_h||| \le c$,

$G_h: H_0^{1/2}(\Gamma) \to E_{0h}$ defined by (121), or by (123), (124), satisfies $|||G_h||| \le c$,

and

$G_h: \mathcal{H} \to \mathcal{H}_h$ defined by (127) satisfies $|||G_h||| \le c_1$.


4.2 Optimal order of convergence 


The estimates in Theorems 9 and 10 are basic abstract error estimates that provide sufficient conditions for the convergence of Galerkin solutions, if the family $\mathcal{H}_h$ of subspaces of the energy space $\mathcal{H}$ has the approximation property (ap)

$$\lim_{h \to 0} \inf_{(v_h, \kappa) \in \mathcal{H}_h} \|(v_h, \kappa) - (\mu, \omega)\|_{\mathcal{H}} = 0$$

However, in order to know the precise order of convergence, one needs more specific properties from approximation theory concerning the approximating subspaces, together with regularity results for the exact solutions to the BIEs under consideration.

We first need the notion of higher-order Sobolev spaces other than those introduced in the beginning of this section.


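(1) Sobolev space $H^m(\Omega)$, $m \in \mathbb{N}_0$. We start with the function space

$$C_*^m(\Omega) := \{u \in C^m(\Omega) \mid \|u\|_{H^m(\Omega)} < \infty\}$$

with

$$\|u\|_{H^m(\Omega)} := \left\{ \sum_{|\alpha| \le m} \int_\Omega |D^\alpha u|^2 \, dx \right\}^{1/2}, \qquad D^\alpha u := \frac{\partial^{|\alpha|} u}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}$$

where $\alpha \in \mathbb{N}_0^n$ and $|\alpha| = \alpha_1 + \cdots + \alpha_n$. Then, $H^m(\Omega)$, the Sobolev space of order $m$, is defined as the completion of $C_*^m(\Omega)$ with respect to the norm $\|\cdot\|_{H^m(\Omega)}$. By this we mean that for every $u \in H^m(\Omega)$ there exists a sequence $\{u_k\}_{k \in \mathbb{N}} \subset C_*^m(\Omega)$ such that

$$\lim_{k \to \infty} \|u - u_k\|_{H^m(\Omega)} = 0 \qquad (128)$$

We recall that two Cauchy sequences $\{u_k\}$ and $\{v_k\}$ in $C_*^m(\Omega)$ are said to be equivalent if and only if $\lim_{k \to \infty} \|u_k - v_k\|_{H^m(\Omega)} = 0$. This implies that $H^m(\Omega)$, in fact, consists of all equivalence classes of Cauchy sequences and that the limit $u$ in (128) is just a representative for the class of equivalent Cauchy sequences $\{u_k\}$. The space $H^m(\Omega)$ is a Hilbert space with the inner product defined by

$$(u, v)_{H^m(\Omega)} := \sum_{|\alpha| \le m} \int_\Omega D^\alpha u \, D^\alpha v \, dx$$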
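(2) Sobolev space $H^s(\Omega)$ for $0 < s \in \mathbb{R}$. By setting

$$s = m + \sigma \quad \text{with } m \in \mathbb{N}_0 \text{ and } 0 < \sigma < 1$$

the Sobolev space $H^s(\Omega)$ of order $s$ is defined as the completion of

$$C_*^s(\Omega) := \{u \in C_*^m(\Omega) \mid \|u\|_{H^s(\Omega)} < \infty\}$$

with respect to the norm

$$\|u\|_{H^s(\Omega)} := \left\{ \|u\|^2_{H^m(\Omega)} + \sum_{|\alpha| = m} \int_\Omega \int_\Omega \frac{|D^\alpha u(x) - D^\alpha u(y)|^2}{|x - y|^{n + 2\sigma}} \, dx \, dy \right\}^{1/2} \qquad (129)$$

the so-called Slobodetskii norm. Note that the second part in the definition (129) of the norm $\|\cdot\|_{H^s(\Omega)}$ gives the $L^2$-version of fractional differentiability, which is compatible with the pointwise version in $C^{m+\sigma}(\Omega)$, the space of Hölder $m$-continuously differentiable functions (cf. Section 2.2). The space $H^s(\Omega)$ is again a Hilbert space with the inner product defined by

$$(u, v)_{H^s(\Omega)} := (u, v)_{H^m(\Omega)} + \sum_{|\alpha| = m} \int_\Omega \int_\Omega \frac{(D^\alpha u(x) - D^\alpha u(y))(D^\alpha v(x) - D^\alpha v(y))}{|x - y|^{n + 2\sigma}} \, dx \, dy$$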


We remark that the Sobolev spaces $H^m(\Omega)$ and $H^s(\Omega)$ can also be defined by using the theory of distributions (see e.g. Adams, 1975). Owing to Meyers and Serrin (1964), these two approaches will lead to spaces that are equivalent. We believe that the definition of Sobolev spaces based on the completion is more intuitive.


(3) Sobolev spaces $H^s(\Gamma)$ and $H^{-s}(\Gamma)$ for $0 \le s \in \mathbb{R}$. For $s = 0$, we set $H^0(\Gamma) := L^2(\Gamma)$. For $s > 0$, the simplest way to define the trace spaces on $\Gamma$ is to use extensions of functions defined on $\Gamma$ to functions in Sobolev spaces defined in $\Omega$. More precisely, let

$$C_*^s(\Gamma) := \{u \in C^0(\Gamma) \mid \text{to } u \text{ there exists } \tilde{u} \in H^{s+1/2}(\Omega) \text{ such that } \gamma_0 \tilde{u} := \tilde{u}|_\Gamma = u \text{ on } \Gamma\}$$

Then the natural trace space $H^s(\Gamma)$ is defined by

$$H^s(\Gamma) := \overline{C_*^s(\Gamma)}^{\|\cdot\|_{H^s(\Gamma)}}$$

the completion of $C_*^s(\Gamma)$ with respect to the norm

$$\|u\|_{H^s(\Gamma)} := \inf_{\gamma_0 \tilde{u} = u} \|\tilde{u}\|_{H^{s+1/2}(\Omega)} \qquad (130)$$

We note that with this definition the inner product cannot be deduced from (130), although it can be shown that $H^s(\Gamma)$ is a Hilbert space. On the other hand, the trace theorem holds by definition, namely,

$$\|\gamma_0 \tilde{u}\|_{H^s(\Gamma)} \le \|\tilde{u}\|_{H^{s+1/2}(\Omega)} \quad \text{for every } \tilde{u} \in H^{s+1/2}(\Omega)$$

For $s < 0$, we can define the space $H^s(\Gamma)$ as the dual of $H^{-s}(\Gamma)$ with respect to the $L^2(\Gamma)$-duality $\langle \cdot, \cdot \rangle$; that is, the completion of $L^2(\Gamma)$ with respect to the norm

$$\|u\|_{H^s(\Gamma)} := \sup_{\|v\|_{H^{-s}(\Gamma)} = 1} |\langle v, u\rangle_{L^2(\Gamma)}| \qquad (131)$$

These are the boundary spaces of negative orders. On the other hand, the boundary $\Gamma$ is usually identified with $\mathbb{R}^{n-1}$ by means of local parametric representations of the boundary, and hence the trace spaces are defined to be isomorphic to the Sobolev space $H^s(\mathbb{R}^{n-1})$. This means that the spaces $H^s(\Gamma)$ behave like the spaces $H^s(\mathbb{R}^{n-1})$. For further discussion of this latter approach, see, for example, Nečas (1967) and Aubin (1972). However, for our purpose, the above definitions of trace spaces based on (130) and (131) will be sufficient.
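The duality definition (131) becomes concrete in a simple model. The sketch below assumes, purely for illustration, that $\Gamma$ is the unit circle and that the $H^s(\Gamma)$-norm is replaced by the Fourier-weighted norm $(\sum_k (1+k^2)^s |\hat{u}_k|^2)^{1/2}$, which is an equivalent norm in that setting rather than the trace norm (130). The helper names `hs_norm`, `l2_pairing`, and `dual_maximizer` are hypothetical. The code checks that the supremum in (131) reproduces the weighted norm of negative order and that it is attained at $\hat{v}_k = (1+k^2)^{-s}\hat{u}_k$.

```python
import numpy as np


def hs_norm(coeffs, freqs, s):
    """Fourier-weighted H^s norm on the unit circle (model assumption)."""
    weights = (1.0 + freqs.astype(float) ** 2) ** s
    return float(np.sqrt(np.sum(weights * np.abs(coeffs) ** 2)))


def l2_pairing(u_hat, v_hat):
    """L^2 duality pairing expressed through Fourier coefficients (Parseval)."""
    return complex(np.sum(u_hat * np.conj(v_hat)))


def dual_maximizer(u_hat, freqs, s):
    """Element of H^s at which the supremum defining the H^{-s} norm is attained."""
    return (1.0 + freqs.astype(float) ** 2) ** (-s) * u_hat


N = 32
freqs = np.arange(-N, N + 1)
rng = np.random.default_rng(1)
u_hat = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
s = 0.5

v_star = dual_maximizer(u_hat, freqs, s)
dual_by_sup = abs(l2_pairing(u_hat, v_star)) / hs_norm(v_star, freqs, s)
dual_by_weights = hs_norm(u_hat, freqs, -s)

# Any other trial function gives a smaller quotient (Cauchy-Schwarz inequality).
w_hat = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
trial_value = abs(l2_pairing(u_hat, w_hat)) / hs_norm(w_hat, freqs, s)

print(dual_by_sup, dual_by_weights, trial_value)
```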

We are now in a position to specify the approximation spaces $S_h$ by using boundary elements. In particular, we assume that $S_h = S_h^{\ell,m}$ is a regular boundary element space as introduced in Babuška and Aziz (1977). That is, $S_h^{\ell,m}$ with $\ell, m \in \mathbb{N}_0$ and $m + 1 \le \ell$ has the following properties:

Definition 7 (Approximation property) Let $t \le s \le \ell$ and $t < m + (1/2)$ for $n = 2$ or $t \le m$ for $n = 3$. Then, there exists a constant $c$ such that for any $v \in H^s(\Gamma)$ a sequence $\chi_h \in S_h^{\ell,m}$ exists and satisfies the estimate

$$\|v - \chi_h\|_{H^t(\Gamma)} \le c h^{s-t} \|v\|_{H^s(\Gamma)} \qquad (132)$$

Definition 8 (Inverse property) For $t \le s < m + (1/2)$ for $n = 2$ or $t \le s \le m$ for $n = 3$, there exists a constant $M$ such that for all $\chi_h \in S_h^{\ell,m}$,

$$\|\chi_h\|_{H^s(\Gamma)} \le M h^{t-s} \|\chi_h\|_{H^t(\Gamma)} \qquad (133)$$

It can be shown that the approximation and inverse properties together imply for the $L^2$-projections $P_h$ and $Q_h$ the respective uniform bounds (116) and (119). For nonregular grids, in general, the inverse property will not be true anymore in this form, but might be replaced by (116) and (119), respectively. For the Galerkin solution $\lambda_h$ of (111) in $S_h^{\ell,m}$ with the properties (132) and (133), we then have the following error estimate.


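Theorem 11. For $-1/2 \le t \le s \le \ell$, $t \le m$, we have the asymptotic error estimate of optimal order

$$\|\lambda - \lambda_h\|_{H^t(\Gamma)} \le c h^{s-t} \|\lambda\|_{H^s(\Gamma)} \qquad (134)$$

for the Galerkin solution $\lambda_h$ of (111), provided the exact solution $\lambda \in H^s(\Gamma)$.

Proof. For $t = -1/2$, the estimate (134) follows immediately from Céa's lemma (114) and the approximation property (132). That is,

$$\|\lambda - \lambda_h\|_{H^{-1/2}(\Gamma)} \le c h^{s+1/2} \|\lambda\|_{H^s(\Gamma)} \qquad (135)$$

For $t > -1/2$, we need the inverse property (133) in addition to the approximation property (132) and the established estimate (135):

$$\begin{aligned}
\|\lambda - \lambda_h\|_{H^t(\Gamma)} &\le \|\lambda - \chi_h\|_{H^t(\Gamma)} + \|\chi_h - \lambda_h\|_{H^t(\Gamma)} && \text{with } \chi_h \text{ for } \lambda \text{ as in (132)} \\
&\le c h^{s-t} \|\lambda\|_{H^s(\Gamma)} + M h^{-1/2-t} \|\chi_h - \lambda_h\|_{H^{-1/2}(\Gamma)} && \text{from (132) and (133)} \\
&\le c h^{s-t} \|\lambda\|_{H^s(\Gamma)} + M h^{-1/2-t} \left( \|\chi_h - \lambda\|_{H^{-1/2}(\Gamma)} + \|\lambda - \lambda_h\|_{H^{-1/2}(\Gamma)} \right) \\
&\le c' h^{s-t} \|\lambda\|_{H^s(\Gamma)}
\end{aligned}$$

The last step follows from the approximation property (132) and the estimate (135) in the energy space $H^{-1/2}(\Gamma)$. □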
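The rate $h^{s-t}$ in (134) can be observed in a model case. The sketch below is an illustration under assumptions that are not those of the theorem: $\Gamma$ is taken to be the unit circle, the boundary element space is replaced by trigonometric polynomials of degree at most $N$ with $h := 1/N$, the operator is assumed to be diagonal in the Fourier basis (so that the Galerkin solution is the truncated Fourier series), and Sobolev norms are the Fourier-weighted ones. The functions `hs_norm` and `truncation_error` are hypothetical helpers. With these choices the error bound holds with constant $c = 1$.

```python
import numpy as np


def hs_norm(coeffs, freqs, s):
    """Fourier-weighted H^s norm on the unit circle (model assumption)."""
    weights = (1.0 + freqs.astype(float) ** 2) ** s
    return float(np.sqrt(np.sum(weights * np.abs(coeffs) ** 2)))


def truncation_error(coeffs, freqs, n_modes, t):
    """H^t norm of lambda - lambda_h when lambda_h keeps the modes |k| <= n_modes."""
    tail = np.abs(freqs) > n_modes
    return hs_norm(coeffs[tail], freqs[tail], t)


K = 4096                                              # cut-off of the model "exact" solution
freqs = np.arange(-K, K + 1)
coeffs = (1.0 + freqs.astype(float) ** 2) ** (-1.0)   # decay like |k|^{-2}, so lambda is in H^1
s, t = 1.0, -0.5

results = []
for n_modes in (8, 16, 32, 64):
    h = 1.0 / n_modes
    err = truncation_error(coeffs, freqs, n_modes, t)
    bound = h ** (s - t) * hs_norm(coeffs, freqs, s)  # right-hand side of (134) with c = 1
    results.append((n_modes, err, bound))
    print(f"N = {n_modes:3d}  error = {err:.3e}  h^(s-t) * ||lambda||_s = {bound:.3e}")
```

The bound follows because every discarded mode satisfies $(1+k^2)^t \le (1+N^2)^{t-s}(1+k^2)^s \le h^{2(s-t)}(1+k^2)^s$ for $|k| > N$ and $t \le s$.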


For the estimate of the Galerkin solution pair $(\mu_h, \omega_h)$ of the equation (127), we note that the approach will be exactly the same if we modify the approximation property (132) and the inverse property (133), respectively, as

$$\|v - \chi_h\|_{H^t(\Gamma)} + |\omega - \kappa| \le c h^{s-t} \|v\|_{H^s(\Gamma)} \qquad (136)$$

$$\|\chi_h\|_{H^s(\Gamma)} + |\kappa| \le M' h^{t-s} \{ \|\chi_h\|_{H^t(\Gamma)} + |\kappa| \} \qquad (137)$$

with $M' := \max\{1, M\}$, where $M$ is the constant in (133). We have the estimate in the energy space ($\mathcal{H} = H^{1/2}(\Gamma) \times \mathbb{R}$ in this case) from Céa's lemma (Theorem 10) by making use of the approximation property. Then for estimates in stronger norms, the inverse property will be used. We summarize our results for the Galerkin solution of (127) in the following theorem.


Theorem 12. Let $\mathcal{H}_h := S_h^{\ell,m} \times \mathbb{R}$ have the approximation property (136) and the inverse property (137). Then the Galerkin solution pair $(\mu_h, \omega_h)$ for the equation (127) converges for $1/2 \le t \le s \le \ell$ and $t \le m$ asymptotically as

$$\|\mu - \mu_h\|_{H^t(\Gamma)} + |\omega - \omega_h| \le c h^{s-t} \{ \|\mu\|_{H^s(\Gamma)} + |\omega| \} \qquad (138)$$


4.3 Aubin—Nitsche lemma 


The estimates (134) and (138) are optimal with respect to the $H^t(\Gamma)$-norm for $\alpha \le t$, where $H^\alpha(\Gamma)$ with $\alpha = -1/2$ or $1/2$ are the respective energy spaces. In order to obtain an optimal rate of convergence for $t < \alpha$ also, we need the so-called Aubin–Nitsche lemma for BIEs; see Hsiao and Wendland (1981a) and Costabel and Stephan (1988). The Aubin–Nitsche lemma is often referred to as Nitsche's trick when one uses FEMs to solve BVPs for PDEs. By applying this lemma, one also obtains the so-called superapproximation of the approximate solution; see Nitsche (1968) and also Ciarlet (1978). The proof of the Aubin–Nitsche lemma is based on duality arguments. However, for this purpose, one needs some additional mapping properties of the boundary integral operators involved.

To be more specific, let us return to the Galerkin equations (111) associated with the simple-layer boundary integral operator $V$ defined by (60). What we need is the following mapping property in addition to the Gårding inequality (103). The mapping

$$V: H^t(\Gamma) \to H^{t+1}(\Gamma) \qquad (139)$$

is a continuous isomorphism for any $t \in \mathbb{R}$. By an isomorphism, we mean that $V$ is one-to-one and onto. Hence, by the continuity of $V$, the inverse $V^{-1}$ is also continuous. We have shown (139) for $t = -1/2$. In fact, this property is true and has been shown in Hsiao and Wendland (1977) in the plane and can be easily established by using the theory of pseudodifferential operators; see, for example, Hsiao and Wendland (2005). Now, under the condition (139), the adjoint $V^* = V$ is also a continuous isomorphism of the same order (see Taylor, 1981), that is,

$$V = V^*: H^{-t-1}(\Gamma) \to H^{-t}(\Gamma)$$

is a continuous isomorphism for any $t \in \mathbb{R}$ (see Figure 2).


Figure 2. Continuous isomorphisms: $V: H^t(\Gamma) \to H^{t+1}(\Gamma)$ and, by duality, $V^*: H^{-t-1}(\Gamma) \to H^{-t}(\Gamma)$.


Hence, in particular, we have the estimate

$$\|\chi\|_{H^{-t-1}(\Gamma)} \le c \|V^* \chi\|_{H^{-t}(\Gamma)} \quad \text{for every } t \in \mathbb{R} \qquad (140)$$


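Now let $e_h := \lambda - \lambda_h$ denote the error of the Galerkin solution of (111). Then for $\chi \in H^{-t-1}(\Gamma)$ with $t < -1/2$, we have

$$|\langle e_h, V^* \chi\rangle| = |\langle V e_h, \chi\rangle| = |\langle V e_h, \chi - \chi_h\rangle + \langle V e_h, \chi_h\rangle| = |\langle V e_h, \chi - \chi_h\rangle|$$

following from the Galerkin orthogonality $\langle V e_h, \chi_h\rangle = 0$ for all $\chi_h \in S_h^{\ell,m}$. The latter implies that

$$\begin{aligned}
|\langle e_h, V^* \chi\rangle| &\le \inf_{\chi_h \in S_h^{\ell,m}} |\langle V e_h, \chi - \chi_h\rangle| \\
&\le c \|e_h\|_{H^{-1/2}(\Gamma)} \inf_{\chi_h \in S_h^{\ell,m}} \|\chi - \chi_h\|_{H^{-1/2}(\Gamma)} \\
&\le c \|e_h\|_{H^{-1/2}(\Gamma)} \, h^{-t-1/2} \|\chi\|_{H^{-t-1}(\Gamma)}
\end{aligned}$$

for $-t - 1 \le \ell$ from the approximation property (132). The estimates (135) and (140) imply that

$$|\langle e_h, V^* \chi\rangle| \le c h^{s-t} \|\lambda\|_{H^s(\Gamma)} \|V^* \chi\|_{H^{-t}(\Gamma)}$$

from which the estimate

$$\|e_h\|_{H^t(\Gamma)} = \sup_{\|v\|_{H^{-t}(\Gamma)} \le 1} |\langle e_h, v\rangle| \le c h^{s-t} \|\lambda\|_{H^s(\Gamma)}$$

follows for $-1 - \ell \le t < -1/2$. Thus, we have proved the following lemma.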


Lemma 7. The error estimate (134) remains valid also for $-1 - \ell \le t < -1/2$, where we do not need the inverse property.


Now, for the equation (127), in order to extend the results of (138) for $t < 1/2$, we need the regularity results for the operator $A$ defined by the sesquilinear form in (125) and an estimate similar to (140), but for the operator $A^*$ adjoint to $A$, namely, the estimate

$$\|(v, \kappa)\|_{H^{1-t}(\Gamma) \times \mathbb{R}} \le c \|A^*(v, \kappa)\|_{H^{-t}(\Gamma) \times \mathbb{R}} \quad \text{for } t \in \mathbb{R} \qquad (141)$$

These results are indeed known, and for a proof, see Hsiao and Wendland (2005). The counterpart of Lemma 7 now reads:

Lemma 8. The asymptotic error estimate in Theorem 12 can be extended to $1 - \ell \le t < 1/2 \le s$ and $1/2 \le m$ with the additional regularity condition (141).

The proof of Lemma 8 proceeds identically to that of Lemma 7.

One would like to push the error estimates in Sobolev spaces of lower order as far as one can. The purpose of doing so is that in this way one may obtain some kind of superconvergence. For instance, if we substitute the Galerkin solution $\lambda_h$ of (111) into the boundary potential in (57),

$$u(x) = \int_\Gamma E(x, y) \lambda(y) \, ds_y \quad \text{for } x \in \Omega$$

then both $u := V\lambda$ and $u_h := V\lambda_h$ satisfy the PDE in (56). Moreover, we see that for any compact subset $\Omega'$ of $\Omega$, we have

$$\begin{aligned}
|u(x) - u_h(x)| &= |\langle \lambda - \lambda_h, E(x, \cdot)\rangle| \\
&\le \|\lambda - \lambda_h\|_{H^{-1-\ell}(\Gamma)} \|E(x, \cdot)\|_{H^{1+\ell}(\Gamma)} \\
&\le c(d) \, h^{2\ell+1} \|\lambda\|_{H^\ell(\Gamma)} \quad \text{for } x \in \Omega' \Subset \Omega
\end{aligned}$$

where $c(d)$ is a constant depending on $d := \sup\{|x - y| \text{ for } x \in \Omega' \text{ and } y \in \Gamma\}$. By the same approach, it is easy to see that similar estimates hold for the derivatives as well. That is, we have the superconvergence results

$$|D^\beta u(x) - D^\beta u_h(x)| = O(h^{2\ell - 2\alpha}), \quad x \in \Omega' \Subset \Omega$$

for $\alpha = -1/2$ and $\alpha = 1/2$, respectively, for the corresponding boundary potentials $u$ in terms of $\lambda$ and $\mu$ and their Galerkin approximations $u_h$ (see e.g. Hsiao and Wendland, 1981b).
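The bookkeeping of exponents in Theorems 11 and 12 and Lemmas 7 and 8 can be collected in a small helper. This is only a sketch of the arithmetic: the function names are hypothetical, and the restrictions involving the regularity index $m$ of the boundary element space are deliberately not checked.

```python
def galerkin_rate(s, t, ell, alpha):
    """Exponent s - t of h in estimates of the type (134) and (138).

    Admissible range used here: 2*alpha - ell <= t <= s <= ell, which combines
    the theorems with their Aubin-Nitsche extensions. Conditions involving m
    are not checked in this sketch.
    """
    if not (2 * alpha - ell <= t <= s <= ell):
        raise ValueError("need 2*alpha - ell <= t <= s <= ell")
    return s - t


def interior_superconvergence_order(ell, alpha):
    """Best exponent, obtained for s = ell and the lowest norm t = 2*alpha - ell."""
    return galerkin_rate(s=ell, t=2 * alpha - ell, ell=ell, alpha=alpha)


# Simple-layer equation (alpha = -1/2) with ell = 1:
print(galerkin_rate(s=1, t=-0.5, ell=1, alpha=-0.5))       # energy norm: 1.5
print(interior_superconvergence_order(ell=1, alpha=-0.5))  # potential in the interior: 3.0
```

For $\alpha = -1/2$ the second function returns $2\ell + 1$ and for $\alpha = 1/2$ it returns $2\ell - 1$, in agreement with the order $2\ell - 2\alpha$ stated above.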


4.4 Stability and Ill-posedness


As is well known, an integral equation of the first kind such as (58) is not well posed in the sense that small $L^2$ disturbances in the data $\varphi$ may produce arbitrarily large discrepancies in the solutions. On the other hand, ill-posed problems are frequently treated by the regularization method as in Tikhonov and Arsenin (1977) and Natterer (1986). In order to see the connections, we allow the possibility of imprecise data and replace $\varphi$ in (56) by its perturbation $\varphi_\varepsilon$. We assume that

$$\|\varphi - \varphi_\varepsilon\|_{L^2(\Gamma)} \le \varepsilon \qquad (142)$$

holds, where $\varepsilon$ is a small parameter. Now we denote by $\lambda_h^\varepsilon$ the corresponding Galerkin solution of (111) with $\varphi$ replaced by $\varphi_\varepsilon$. Then, it follows from (117) that


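$$\lambda - \lambda_h^\varepsilon = (\lambda - \lambda_h) + (P_h^* V P_h)^{-1} P_h^* (\varphi - \varphi_\varepsilon)$$

where $P_h$ is now the $L^2$-projection satisfying (116). Consequently, we obtain the estimate

$$\|\lambda - \lambda_h^\varepsilon\|_{H^{-1/2}(\Gamma)} \le \|\lambda - \lambda_h\|_{H^{-1/2}(\Gamma)} + c \|P_h^* (\varphi - \varphi_\varepsilon)\|_{H^{1/2}(\Gamma)}$$

from the uniform boundedness of $(P_h^* V P_h)^{-1}$. We have obtained an estimate for the first term on the right-hand side. In order to use the information of (142), we need to employ the inverse property (133) to dominate the second term on the right-hand side by the $L^2$-norm. This leads to the abstract error estimate in the form of (114), that is,

$$\|\lambda - \lambda_h^\varepsilon\|_{H^{-1/2}(\Gamma)} \le c \left\{ \inf_{\chi_h \in S_h} \|\lambda - \chi_h\|_{H^{-1/2}(\Gamma)} + h^{-1/2} \varepsilon \right\}$$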


from which general estimates can be obtained as before. Of particular interest is the $L^2$ estimate

$$\|\lambda - \lambda_h^\varepsilon\|_{L^2(\Gamma)} \le c \{ h^s \|\lambda\|_{H^s(\Gamma)} + h^{-1} \varepsilon \} \qquad (143)$$

for $\lambda \in H^s(\Gamma)$. The estimate (143) provides us some guidance concerning the choice of the mesh size $h$ in numerical computations (see Hsiao, 1986). From (143), it is easy to see that for given $\varepsilon$, there is an optimal choice of $h$,

$$h_{\mathrm{opt}} = \varepsilon^{1/(s+1)}$$

With this choice of $h$, we have

$$\|\lambda - \lambda_h^\varepsilon\|_{L^2(\Gamma)} = O(\varepsilon^{s/(s+1)}) \quad \text{as } \varepsilon \to 0^+$$

which coincides with the result obtained by the Tikhonov regularization method as in Tikhonov and Arsenin (1977) and Natterer (1986), if the regularization parameter there is chosen optimally. Hence, for this type of problem, the discretization via Galerkin's method is already an optimal regularization.
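The balance between discretization error and data error in (143) can be made explicit numerically. The following sketch assumes a bound of the form $h^s \|\lambda\|_{H^s(\Gamma)} + h^{-1}\varepsilon$ with the generic constant $c$ set to one; the function names and the parameter `norm_lambda` (standing for $\|\lambda\|_{H^s(\Gamma)}$) are hypothetical. Setting the derivative with respect to $h$ to zero gives $h^{s+1} = \varepsilon / (s \|\lambda\|_{H^s(\Gamma)})$, which has the $\varepsilon^{1/(s+1)}$ scaling of $h_{\mathrm{opt}}$.

```python
import numpy as np


def error_bound(h, eps, s, norm_lambda=1.0):
    """Model right-hand side of (143) with the generic constant set to 1."""
    return norm_lambda * h**s + eps / h


def optimal_mesh_size(eps, s, norm_lambda=1.0):
    """Stationary point of h -> norm_lambda * h**s + eps / h."""
    return (eps / (s * norm_lambda)) ** (1.0 / (s + 1.0))


eps, s = 1e-6, 1.0
h_opt = optimal_mesh_size(eps, s)

# Cross-check the closed form against a brute-force search on a logarithmic grid.
grid = np.logspace(-6, 0, 20001)
h_grid = grid[np.argmin(error_bound(grid, eps, s))]

print(f"h_opt (closed form) = {h_opt:.4e}, h_opt (grid search) = {h_grid:.4e}")
print(f"bound at h_opt = {error_bound(h_opt, eps, s):.4e}")
```

Refining the mesh beyond `h_opt` increases the bound again, which is the practical content of the statement that the Galerkin discretization acts as a regularization.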

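It is also known that the $L^2$-condition number of the Galerkin equation (111) is unbounded. This can be seen as follows. Following Hsiao and Wendland (1977), Wendland (1983, Corollary 2.9), and Hsiao (1987), we may write from (118),

$$\lambda_h = (P_h^* V P_h)^{-1} P_h^* \varphi = (P_h^* V P_h)^{-1} P_h^* V V^{-1} (P_h^* \varphi) = G_{hV} V^{-1} (P_h^* \varphi)$$

where $P_h$ and $G_{hV}$ are the projection and the Galerkin operator, respectively. Applying Theorem 11 with $\lambda$ replaced by $V^{-1}(P_h^* \varphi)$ and $s = t = 0$, we obtain the estimate

$$\|V^{-1}(P_h^* \varphi) - \lambda_h\|_{L^2(\Gamma)} \le c \|V^{-1} P_h^* \varphi\|_{L^2(\Gamma)} \le c \|P_h^* \varphi\|_{H^1(\Gamma)} \le c M h^{-1} \|P_h^* \varphi\|_{L^2(\Gamma)}$$

from the inverse property (133). Then we have

$$\|\lambda_h\|_{L^2(\Gamma)} \le \|V^{-1}(P_h^* \varphi) - \lambda_h\|_{L^2(\Gamma)} + \|V^{-1} P_h^* \varphi\|_{L^2(\Gamma)} \le c' h^{-1} \|P_h^* \varphi\|_{L^2(\Gamma)}$$

for some constant $c'$, independent of $h$. The latter implies that

$$\|A_h^{-1}\| \le c' h^{-1}$$

where $A_h := P_h^* V P_h$ denotes the coefficient matrix of the Galerkin equation (111) and $\|\cdot\|$ is the spectral norm here. On the other hand, we have from the continuity of $V$

$$\|P_h^* V P_h \lambda_h\|_{L^2(\Gamma)} \le c \|\lambda_h\|_{H^{-1}(\Gamma)} \le c \|\lambda_h\|_{L^2(\Gamma)}$$

which implies that

$$\|A_h\| \le c$$

Hence the $L^2$-condition number satisfies

$$\mathrm{cond}_{L^2}(A_h) = \|A_h\| \, \|A_h^{-1}\| = O(h^{-1})$$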


Similarly, we arrive at the estimates for the Galerkin 
solution of (127): 


uy, ©, )Mlzzqyxe £ Crh + eA, as Odile 


HA; (ys Ozzy ee < CAT My, O) 


where A, now denotes the stiffness matrix of the Galerkin 
equations of (127). This gives the estimate for the condition 
number of the Galerkin equations for (125) and, in the same 
manner, also for the Galerkin equations to (120) to (82); 
(121) or (123), (124) to (69) and those to (78) and (86); 
that is, 


$$\mathrm{cond}_{L^2(\Gamma)}(A_h) = O(h^{-1})$$


In all these cases, we see that the condition numbers will be unbounded as $h \to 0^+$. This is not surprising. We note that although (59) is a BIE of the second kind, its variational formulation (84) with the help of the operator $D$ actually corresponds to the weak formulation of a BIE of the first kind, according to our classification in the next section. As will be seen, the results presented here for (58) and for (59) (or rather for its weak form (84)) are just special cases of a general result for BIEs of the first kind. We will return to this discussion in the next section.
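The growth of the condition number can be reproduced in a diagonal model. The sketch below assumes, for illustration only, an operator of order $2\alpha$ on the unit circle whose action on the Fourier mode $k$ is multiplication by $(1 + k^2)^{\alpha}$, discretized by a Galerkin method with an $L^2$-orthonormal trigonometric basis of degree at most $N$, so that the Galerkin matrix is diagonal and $h := 1/N$. The function name `model_condition_number` is hypothetical. Orders $2\alpha = \mp 1$ mimic the first-kind equations discussed above, and $2\alpha = 0$ mimics a second-kind equation.

```python
import numpy as np


def model_condition_number(n_modes, two_alpha):
    """Spectral condition number of diag((1 + k^2)^alpha), |k| <= n_modes."""
    k = np.arange(-n_modes, n_modes + 1, dtype=float)
    symbol = (1.0 + k**2) ** (two_alpha / 2.0)
    return symbol.max() / symbol.min()


for two_alpha in (-1, 0, 1):
    coarse = model_condition_number(64, two_alpha)
    fine = model_condition_number(128, two_alpha)
    print(f"order {two_alpha:+d}: cond(N=64) = {coarse:8.2f}, "
          f"cond(N=128) = {fine:8.2f}, ratio = {fine / coarse:.3f}")
```

Halving $h$ doubles the condition number for the orders $\mp 1$, consistent with $O(h^{-1})$, while the order-zero model stays at condition number one.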

However, if $\Gamma$ is sufficiently smooth, as, for example, in $C^2$ (or at least Lyapunov), then, in case of the Laplacian or the Lamé system, the integral operators of the second kind in (59) and in (75) are bounded in $L^2(\Gamma)$ and satisfy Gårding inequalities there of the form

$$\mathrm{Re} \, \langle \mu, (\tfrac{1}{2}I \pm K)\mu + C\mu \rangle_{L^2(\Gamma)} \ge \alpha_0 \|\mu\|^2_{L^2(\Gamma)} \qquad (144)$$

for all $\mu \in L^2(\Gamma)$. Then one may apply the Galerkin method for (59) or (75) directly in $L^2(\Gamma)$, and the corresponding condition numbers will be uniformly bounded on appropriate boundary element spaces, which is very advantageous for fast iterative solution methods. Therefore, based on the identities (24), one tries to find preconditioning operators for the BIEs of the first kind, as in (34) or (46), to be converted into integral equations of the second kind (see Steinbach and Wendland, 1998). Unfortunately, for general Lipschitz boundaries $\Gamma$, inequalities of the type (144) are not known and might not even be true anymore.


5 THE ROLE OF SOBOLEV INDEX 


This section is the heart of the chapter. After presenting 
the concrete examples in the previous sections, it is the 
purpose of this section to discuss the BEM from a more gen- 
eral point of view. Fundamental concepts and underlying 
mathematical principles, which may have already appeared 
in different forms in the previous special model problems, 
will be addressed once more. General results, which contain 
those obtained in the previous examples as special cases, 
for the Galerkin-BEM will be collected in this section. 


5.1 Order and classifications 


We begin with the classification of BIEs based on the mapping properties of the BIE operators involved. Let us consider a BIE of the general form

$$A\sigma = f \quad \text{on } \Gamma \qquad (145)$$

Here, $A$ is the operator on $\Gamma$, $\sigma$ is the unknown boundary charge (unknown density or moment function), and $f$ is the given data function on the boundary $\Gamma$. We assume that $f \in H^{s-\alpha}(\Gamma)$, $s \in \mathbb{R}$, where $2\alpha$ is a fixed constant. (It is assumed that the boundary manifold $\Gamma$ is sufficiently smooth for the corresponding $s$ and $\alpha$ to be specified.)


Definition 9. We say that the order of the BIE operator $A$ is $2\alpha$ if the mapping

$$A: H^{s+\alpha}(\Gamma) \to H^{s-\alpha}(\Gamma)$$

for any $s \in \mathbb{R}$ with $|s| \le s_0$ is continuous, where $s_0$ is some fixed positive constant.


For example, if the direct method is used for solving the interior and exterior Dirichlet and Neumann problems for the Lamé equations (8) and (11), we will arrive at the following typical boundary integral operators:

$$2\alpha = -1, \quad A = V$$
$$2\alpha = 0, \quad A = \tfrac{1}{2}I \pm K \ \text{ or } \ A = \tfrac{1}{2}I \mp K'$$
$$2\alpha = +1, \quad A = D$$

Here, $V$, $K$, $K'$, and $D$ are the four basic BIE operators introduced in (20), (21), (22), and (23) respectively. Boundary integral equations such as (145) are classified according to the order $2\alpha$ of the boundary integral operator $A$.
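The notion of order can be illustrated by Fourier symbols. The sketch below relies on a standard model case that is not taken from this chapter: for the Laplacian on the unit circle with fundamental solution $-(1/2\pi)\log|x - y|$, the modes $e^{ik\theta}$ with $k \ne 0$ are eigenfunctions of $V$ with eigenvalue $1/(2|k|)$ and of $D$ with eigenvalue $|k|/2$, while $K$ vanishes on these modes, so that $\tfrac{1}{2}I \pm K$ acts as multiplication by $1/2$. The function names are hypothetical. The order $2\alpha$ is estimated as the slope of $\log|\text{symbol}|$ against $\log|k|$.

```python
import numpy as np


def single_layer_symbol(k):
    """Eigenvalues of V on the unit circle for k != 0 (model case)."""
    return 1.0 / (2.0 * np.abs(k))


def second_kind_symbol(k):
    """Eigenvalues of (1/2) I +/- K on the nonconstant modes of the unit circle."""
    return 0.5 * np.ones_like(k)


def hypersingular_symbol(k):
    """Eigenvalues of D on the unit circle (model case)."""
    return np.abs(k) / 2.0


def estimated_order(symbol, k_min=16, k_max=1024):
    """Least-squares slope of log|symbol(k)| versus log k, an estimate of 2*alpha."""
    k = np.arange(k_min, k_max + 1, dtype=float)
    slope, _intercept = np.polyfit(np.log(k), np.log(np.abs(symbol(k))), 1)
    return slope


for name, symbol in (("V", single_layer_symbol),
                     ("I/2 +- K", second_kind_symbol),
                     ("D", hypersingular_symbol)):
    print(f"{name:9s} estimated order 2*alpha = {estimated_order(symbol):+.3f}")
```

In this model the product of the symbols of $V$ and $D$ equals $1/4$ on every nonconstant mode, which reflects the fact that the two operators have opposite orders.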

We call (145) a first-kind BIE if $2\alpha < 0$. If $2\alpha = 0$, the operator $A$ is of the form $aI + K$, where $K$ is either a Cauchy-singular integral operator or $K$ is compact and $a \ne 0$. The latter defines a Fredholm integral equation of the second kind, while the former defines a CSIE. In case $2\alpha > 0$, $A = L + K$, where $L$ is a differential operator and $K$ a possibly hypersingular integral operator. If the order of $L$ is equal to $2\alpha > 0$, then $A$ defines an integrodifferential equation. Otherwise, if the order of $L$ is less than $2\alpha > 0$, we have a so-called hypersingular integral equation of the first kind. In elasticity, (33), (38), (43), and (44) are all CSIEs. The equation (59) is a genuine Fredholm integral equation of the second kind for smooth boundary $\Gamma$ with a compact operator $K$. Table 1 gives a quick view of the classification of BIEs based on the mapping properties of $A$ as well as applications of BIEs in various fields.

Weak formulations for the BIEs are generally different for the first- and second-kind equations. In the former, the boundary sesquilinear forms are connected with domain sesquilinear forms for the PDEs in the interior as well as in the exterior domain, while in the latter, they are connected only with the sesquilinear form either for the interior or for the exterior domain, but not both, depending on the direct or indirect approach. Also, as we have seen in the model problem, for the second-kind BIEs, a premultiplied operator, as in Gatica and Hsiao (1994), is needed in order to give the appropriate duality pairing in the variational formulations for the BIEs (see (69)). As we have seen from the model problems, for the BIE (145), whose sesquilinear form coincides with the variational sesquilinear form of the BVP, the strong ellipticity of the boundary integral operators in the form of Gårding inequalities in the trace space on the boundary manifold will be a consequence of the strong ellipticity of the original BVPs (see Costabel and Wendland, 1986).

Table 1. Classifications and applications of BIEs.

| Classifications | Applications |
|---|---|
| Fredholm IE of the second kind, 2α = 0 (e.g. 59) | Potential flow problems, viscous flows, acoustics, Darcy flows, electromagnetic fields, ... |
| CSIE, 2α = 0 (e.g. 33 and 37) | Elasticity and thermoelasticity, geodesy, subsonic compressible flows, ... |
| Fredholm IE of the first kind, 2α < 0 (e.g. 32 with 2α = −1) | Conformal mappings, viscous flows, acoustics, electromagnetic fields, elasticity and thermoelasticity, ... |
| Hypersingular IE, 2α > 0 (e.g. 38 with 2α = 1) | Acoustics, elasticity and thermoelasticity, wings, coupling of BEM and FEM, crack problems, ... |


5.2 Consistency, stability, and convergence 


Consistency, stability, and convergence are the basic con- 
cepts in any numerical approximating scheme. The well- 
known general principle, known as the Lax equivalence 
theorem, states that 


consistency + stability ===> convergence 


which applies to BEMs without any exception. In fact, 
Céa’s lemma for the Galerkin-BEM is indeed a classical 
convergence theorem based on the complementary con- 
cepts of consistency and stability. In the following, let 
us examine the Galerkin method for the boundary integral equation (145).

Let $\mathcal{H} = H^\alpha(\Gamma)$ denote the solution space of (145), and $\mathcal{H}_h \subset \mathcal{H}$ be a one-parameter family of finite-dimensional subspaces of $\mathcal{H}$. For convenience, we formulate the Galerkin method for solving (145) in the following form: For $\sigma \in \mathcal{H}$, find $\sigma_h \in \mathcal{H}_h$ such that

$$a_\Gamma(\sigma_h, \chi_h) := \langle A\sigma_h, \chi_h\rangle_{L^2(\Gamma)} = \langle A\sigma, \chi_h\rangle_{L^2(\Gamma)} \quad \text{for all } \chi_h \in \mathcal{H}_h \qquad (146)$$

The formulation (146) is of course equivalent to the standard one if $\sigma \in \mathcal{H}$ is the exact solution of (145). For the time being, we assume that the following conditions hold:


(1) Consistency: Let $A_h: \mathcal{H} \to \mathcal{H}'$ be a family of continuous mappings approximating the operator $A$. The operators $A_h$ are said to be consistent with $A$ if for every $v \in \mathcal{H}$ there holds

$$\|A_h v - A v\|_{\mathcal{H}'} \to 0 \quad \text{as } h \to 0$$

(2) A priori bound: For all $0 < h \le h_0$, there exists a constant $c_0 = c_0(h_0)$ independent of $\sigma$ such that

$$\|\sigma_h\|_{\mathcal{H}} \le c_0 \|\sigma\|_{\mathcal{H}}$$


For our Galerkin method (146), we have $A_h = P_h^* A P_h$. Hence, consistency condition (1) is a consequence of the approximation property (110) of the sequence, while (2) is a stability condition for the family of approximate solutions. From condition (2), we see that if $\sigma = 0$, then $\sigma_h = 0$. This means that the homogeneous equation

$$\langle A\sigma_h, \chi_h\rangle_{L^2(\Gamma)} = 0 \quad \text{for all } \chi_h \in \mathcal{H}_h \qquad (147)$$

has only the trivial solution. Since (147) is equivalent to a square system of linear equations in terms of a basis of $\mathcal{H}_h$, this implies the unique solvability of the inhomogeneous equation (146) for every $h$ with $0 < h \le h_0$. Condition (2) also implies that there is a mapping

$$G_h: \mathcal{H} \ni \sigma \mapsto \sigma_h \in \mathcal{H}_h \subset \mathcal{H}$$

such that $G_h$ is uniformly bounded, that is,

$$|||G_h||| \le c_0 \qquad (148)$$

Moreover, we see that $G_h^2 \sigma = G_h \sigma_h = \sigma_h = G_h \sigma$, the second equality following from the unique solvability of (146). Hence, $G_h$ is the Galerkin projection introduced in Section 4.


Now, from (148), we see that

$$A_h^{-1} = G_h A^{-1}$$

is uniformly bounded, provided $A^{-1}$ is bounded. Consequently,

$$\|\sigma - \sigma_h\|_{\mathcal{H}} \le c \|A_h \sigma - A_h \sigma_h\|_{\mathcal{H}'} = c \|A_h \sigma - A\sigma\|_{\mathcal{H}'} \to 0 \quad \text{as } h \to 0 \qquad (149)$$

as expected under Condition (1). Hence, as usual, the stability condition (2) plays a fundamental role in the abstract error estimates. We will show that stability condition (2) for the Galerkin-BEM (146) can be replaced by the well-known Ladyzhenskaya–Babuška–Brezzi condition (BBL-condition), also called inf-sup condition, a condition that plays a fundamental role in the study of elliptic BVPs with constraints as well as in the analysis of convergence and stability of FEMs and is most familiar to researchers in FEM analysis; see Nečas (1962) and Babuška and Aziz (1977).

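We recall that a sesquilinear form $B(\cdot, \cdot): \mathcal{H}_1 \times \mathcal{H}_2 \to \mathbb{C}$ on Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$ is said to satisfy the BBL-condition or inf-sup condition if there exists a constant $\gamma_0 > 0$ such that

$$\inf_{0 \ne u \in \mathcal{H}_1} \sup_{0 \ne v \in \mathcal{H}_2} \frac{|B(u, v)|}{\|u\|_{\mathcal{H}_1} \|v\|_{\mathcal{H}_2}} \ge \gamma_0$$

For our purpose, we consider the special discrete form of the BBL-condition with both $\mathcal{H}_1$ and $\mathcal{H}_2$ replaced by $\mathcal{H}_h$ and the sesquilinear form $B(\cdot, \cdot)$ by the boundary sesquilinear form $a_\Gamma(\cdot, \cdot)$.

Definition 10 (The BBL-condition) There exists a constant $\gamma_0 > 0$ such that

$$\sup_{0 \ne \chi_h \in \mathcal{H}_h} \frac{|a_\Gamma(v_h, \chi_h)|}{\|\chi_h\|_{\mathcal{H}}} \ge \gamma_0 \|v_h\|_{\mathcal{H}} \quad \text{for all } v_h \in \mathcal{H}_h \qquad (150)$$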


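Theorem 13. If the BBL-condition holds, then the Galerkin equations (146) are uniquely solvable for each $\sigma \in \mathcal{H}$, and we have the quasioptimal error estimate

$$\|\sigma - \sigma_h\|_{\mathcal{H}} \le c \inf_{\chi_h \in \mathcal{H}_h} \|\sigma - \chi_h\|_{\mathcal{H}}$$

where the constant $c$ is independent of $\sigma$ and $h$.

Proof. If $a_\Gamma(\sigma_h, \chi_h) = 0$ for all $\chi_h \in \mathcal{H}_h$, then the BBL-condition (150) applied to (146) with $v_h = \sigma_h$ yields $\sigma_h = 0$. Then uniqueness implies the unique solvability of (146) for every $h > 0$, since (146) is equivalent to a square system of linear equations. Consequently, the mapping $\sigma \mapsto \sigma_h =: G_h \sigma$ is well defined, and furthermore, $G_h$ is a projection from $\mathcal{H}$ onto $\mathcal{H}_h \subset \mathcal{H}$ by the same argument as before. It remains now to show that $G_h$ is uniformly bounded. This proceeds as follows: For any $\sigma \in \mathcal{H}$, let $\sigma_h$ be the corresponding Galerkin solution of (146). Then we see that

$$\begin{aligned}
\|G_h \sigma\|_{\mathcal{H}} = \|\sigma_h\|_{\mathcal{H}} &\le \frac{1}{\gamma_0} \sup_{0 \ne \chi_h \in \mathcal{H}_h} \frac{|a_\Gamma(\sigma_h, \chi_h)|}{\|\chi_h\|_{\mathcal{H}}} \\
&= \frac{1}{\gamma_0} |a_\Gamma(\sigma_h, \chi_h^*)| \\
&= \frac{1}{\gamma_0} |a_\Gamma(\sigma, \chi_h^*)| \le c \|\sigma\|_{\mathcal{H}}
\end{aligned}$$

by using the continuity of $A$. Here $\chi_h^* \in \mathcal{H}_h$ with $\|\chi_h^*\|_{\mathcal{H}} = 1$ denotes an element at which the supremum on the finite-dimensional unit sphere is attained. The latter implies

$$|||G_h||| := \sup_{0 \ne \sigma \in \mathcal{H}} \frac{\|G_h \sigma\|_{\mathcal{H}}}{\|\sigma\|_{\mathcal{H}}} \le c$$

that is, (148), which implies (149) (see Chapter 9, this Volume). □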
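The statement of Theorem 13 can be examined in a finite-dimensional model. The sketch below assumes a nonsymmetric matrix `A` on $\mathbb{R}^n$ with the Euclidean norm as a stand-in for the operator $A$ on $\mathcal{H}$, and a subspace spanned by the orthonormal columns of `Phi` as a stand-in for $\mathcal{H}_h$; these objects are hypothetical and chosen only for illustration. In this setting the discrete inf-sup constant $\gamma_0$ in (150) is the smallest singular value of the Galerkin matrix, and the bounds $|||G_h||| \le \|A\|/\gamma_0$ and $\|\sigma - \sigma_h\| \le (1 + \|A\|/\gamma_0) \inf \|\sigma - \chi_h\|$ can be verified directly.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 30, 6

# Nonsymmetric model operator: identity plus a moderate random perturbation.
A = np.eye(n) + 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)
Phi, _ = np.linalg.qr(rng.standard_normal((n, m)))   # orthonormal basis of the model H_h

A_h = Phi.T @ A @ Phi                                # Galerkin matrix
gamma_0 = np.linalg.svd(A_h, compute_uv=False).min() # discrete inf-sup constant

# Galerkin projection: sigma -> sigma_h with Phi^T A (sigma_h - sigma) = 0.
G = Phi @ np.linalg.solve(A_h, Phi.T @ A)
norm_A = np.linalg.norm(A, 2)
norm_G = np.linalg.norm(G, 2)

sigma = rng.standard_normal(n)
galerkin_error = np.linalg.norm(sigma - G @ sigma)
best_error = np.linalg.norm(sigma - Phi @ (Phi.T @ sigma))

print(f"gamma_0 = {gamma_0:.3f}, |||G_h||| = {norm_G:.3f} <= {norm_A / gamma_0:.3f}")
print(f"error = {galerkin_error:.3f} <= {(1 + norm_A / gamma_0) * best_error:.3f}")
```

The second bound uses only that $G_h$ is a bounded projection onto the subspace, so that $\sigma - G_h\sigma = (I - G_h)(\sigma - \chi_h)$ for every $\chi_h$ in the subspace.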


5.3 The BBL-condition and Gårding’s inequality 
We have seen that the BBL-condition also plays an important role in the analysis of convergence and stability for the BEM. In the following, we would like to show that

Gårding's inequality + uniqueness + (ap) ⟹ BBL-condition

We need the definition of the Gårding inequality for the boundary integral operator $A$ of (145) in the form of (88).


Definition 11 (The Gårding inequality) The boundary
integral operator $A$ is said to satisfy a Gårding inequality
if there exist a compact operator $C : H \to H'$ and a positive
constant $\gamma$ such that the inequality

$$\operatorname{Re}\{a_\Gamma(v, v) + \langle Cv, v\rangle_\Gamma\} \ge \gamma \|v\|_H^2 \tag{151}$$

holds for all $v \in H$, where $H'$ denotes the dual of $H$.


Theorem 14. Suppose that the boundary sesquilinear form
$a_\Gamma(\cdot,\cdot)$ satisfies Gårding's inequality and

$$\operatorname{Ker}(a_\Gamma) := \{\sigma_0 \in H \mid a_\Gamma(\sigma_0, \chi) = 0 \text{ for all } \chi \in H\} = \{0\}$$

Then $a_\Gamma$ satisfies the BBL-condition, provided $H_h$ satisfies
the approximation property (ap).


Proof. Our proof follows that in Wendland (1987) (see
also Wendland, 1990). From the definition of the Gårding
inequality (151), if we let $B := A + C$, then $B$ will be $H$-elliptic.
We consider two cases:

(i) $C = 0$ in (151). Then $A$ is $H$-elliptic. If we let
$\chi_h = v_h$, then

$$|a_\Gamma(v_h, \chi_h)| = |a_\Gamma(v_h, v_h)| \ge \operatorname{Re} a_\Gamma(v_h, v_h) \ge \gamma\|v_h\|_H^2 = \gamma\|v_h\|_H \|\chi_h\|_H$$

This implies the BBL-condition

$$\sup_{0 \neq \chi_h \in H_h} \frac{|a_\Gamma(v_h, \chi_h)|}{\|\chi_h\|_H} \ge \gamma\|v_h\|_H$$


(ii) $C \neq 0$ in (151). Let $G_{hB}$ be the Galerkin projection
corresponding to the operator $B$, which is $H$-elliptic. Then,
$G_{hB} \to I$ elementwise follows from the approximation
property (ap). Now let

$$L_h := I - G_{hB}B^{-1}C \quad \text{and} \quad L := I - B^{-1}C = B^{-1}A$$

Then we have

$$L - L_h = (G_{hB} - I)B^{-1}C$$

and

$$|||L - L_h||| \to 0 \quad \text{for } h \to 0$$

since $B^{-1}$ exists and is bounded, and $B^{-1}C$ is compact.

Since $L^{-1} = A^{-1}B$ exists, the latter implies that $L_h^{-1}$ exists and,
moreover, there exists a constant $c_0$ independent of $h$ for
$0 < h \le h_0$ such that

$$|||L_h^{-1}||| \le c_0$$

that is, $L_h^{-1}$ is uniformly bounded. Now, since $A = BL$, a simple
manipulation yields

$$\operatorname{Re} a_\Gamma(v_h, \chi_h) = \operatorname{Re}\langle Av_h, \chi_h\rangle = \operatorname{Re}\langle BL_hv_h, \chi_h\rangle + \operatorname{Re}\langle B(L - L_h)v_h, \chi_h\rangle$$

from which we obtain

$$|a_\Gamma(v_h, \chi_h)| + |\langle B(L - L_h)v_h, \chi_h\rangle| \ge \operatorname{Re}\langle BL_hv_h, \chi_h\rangle \tag{152}$$

Since $B$ is $H$-elliptic, if we put $\chi_h = L_hv_h$, the right-hand
side of (152) is bounded as shown below:

$$\operatorname{Re}\langle BL_hv_h, \chi_h\rangle \ge \gamma_0\|L_hv_h\|_H^2 = \gamma_0\|L_hv_h\|_H\|\chi_h\|_H \ge \frac{\gamma_0}{c_0}\|v_h\|_H\|\chi_h\|_H$$

the last step following from the uniform boundedness of
$L_h^{-1}$. The second term on the left-hand side of (152) is
dominated by

$$|\langle B(L - L_h)v_h, \chi_h\rangle| \le |||B|||\,|||L - L_h|||\,\|v_h\|_H\|\chi_h\|_H$$

Collecting the terms and substituting into (152), we arrive
at the estimate

$$|a_\Gamma(v_h, \chi_h)| \ge \left\{\frac{\gamma_0}{c_0} - |||B|||\,|||L - L_h|||\right\}\|v_h\|_H\|\chi_h\|_H \tag{153}$$

Since $|||L - L_h||| \to 0$ for $h \to 0$, there are constants $h_0 > 0$
and $\gamma > 0$, independent of $h$ for $0 < h \le h_0$, such that

$$\left\{\frac{\gamma_0}{c_0} - |||B|||\,|||L - L_h|||\right\} \ge \gamma > 0$$

The BBL-condition then follows immediately from (153).
This completes the proof. □


If the BEM is used for approximating the Dirichlet to 
Neumann map (or its pseudoinverse), then one needs to 
approximate all the Cauchy data. Then one also needs BBL- 
conditions between these approximating spaces in some 
cases. For details, see the book by Steinbach (2002). 


5.4 Asymptotic error estimates 


In this section, we collect some general results concerning 
the error estimates for the approximate solutions of (145) 
by the Galerkin-BEM obtained by the authors over the 
years. These results contain those presented in the previous 
sections for the model problems as special cases. We 
consider the boundary integral operator $A$ of order $2\alpha$,
as in (145), and assume that the following assumptions
hold:

(1) The boundary integral operator

$$A : H^{s+\alpha}(\Gamma) \to H^{s-\alpha}(\Gamma)$$

defines a continuous isomorphism for any $s \in \mathbb{R}$ with
$|s| \le s_0$, where $s_0 > 0$, that is, $A$ is bijective and both $A$ and
$A^{-1}$ are continuous.

(2) The operator $A$ satisfies a Gårding inequality of the
form (151) with $H = H^{\alpha}(\Gamma)$ being the energy space
for the operator $A$.

(3) Let $H_h = S_h^{\ell,m} \subset H$ with $\ell, m \in \mathbb{N}_0$ and $m \le \ell - 1$,
that is, a regular boundary element space with
approximation property and inverse property given by
(132) and (133) respectively.


The following results have been established in Hsiao and 
Wendland (1977, 1985). 


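Theorem 15. Under the above assumptions, let $m > \alpha - 1/2$
for $n = 2$ or $m > \alpha$ for $n = 3$ and $s_0 \ge \max\{\ell, |2\alpha - \ell|\}$.
Then we have the asymptotic error estimate of optimal
order

$$\|\sigma - \sigma_h\|_{H^t(\Gamma)} \le c\,h^{s-t}\,\|\sigma\|_{H^s(\Gamma)} \tag{154}$$

for $2\alpha - \ell \le t \le s \le \ell$, $t < m + 1/2$ for $n = 2$ or $t \le m$ for
$n = 3$, and $\alpha \le s$. Moreover, the condition number of the
Galerkin equations (146) is of order $O(h^{-2|\alpha|})$.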


We note that to establish the estimates in (154) for
$2\alpha - \ell \le t \le \alpha \le s \le \ell$, we only need the approximation
property of $H_h$ together with the duality arguments. On
the other hand, for $\alpha \le t \le s \le \ell$, we need, in addition
to the approximation property, the inverse property of $H_h$
also.

For regular boundary element subspaces, (154) indicates
that the rate of convergence is given by the exponent
$s - t$, which is restricted by two fixed indices $\ell$ and $m$.
The former indicates the degree of the complete polynomials
chosen for the boundary element basis functions,
while the latter governs the Sobolev index for the regularity
of the solution $\sigma$. Hence, for a smooth boundary, if
we have a smooth datum $f \in H^{s-2\alpha}(\Gamma)$, then we have
$\sigma \in H^{s}(\Gamma)$ from the regularity theory of pseudodifferential
operators. This means that if the solution $\sigma$ is sufficiently
regular, we can always increase the rate of convergence
by increasing the degree of the polynomials in
the boundary element basis functions. On the other hand,
for a nonsmooth boundary, the regularity of the solution is
restricted, even for given smooth data. In this case, the
rate of convergence is completely unaffected, no matter
how large the degree of the polynomials used in the boundary
element basis. Therefore, one needs more general function
spaces together with graded meshes, which are not considered
here.
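In practice, the exponent $s - t$ in an estimate of the form (154) is usually checked by computing an observed order of convergence from errors on successively refined meshes. The following is a minimal sketch; the function name and the synthetic error values are illustrative assumptions, not data from the chapter.

```python
import math

def observed_orders(hs, errors):
    """Estimate p in error ~ c * h**p from consecutive (h, error) pairs."""
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(hs[i] / hs[i + 1])
        for i in range(len(hs) - 1)
    ]

# Synthetic errors manufactured to behave like c * h**(s - t) with s - t = 1.5.
hs = [0.2, 0.1, 0.05, 0.025]
errors = [3.0 * h ** 1.5 for h in hs]
orders = observed_orders(hs, errors)
```

If the solution has limited regularity, as for nonsmooth boundaries, the observed order stalls at the value dictated by the regularity index no matter how high the polynomial degree is chosen.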


Theorem 16. Under the same conditions as in Theorem 15,
if the datum $f$ is replaced by its $L^2$-perturbation
$f_\varepsilon$, then for $\alpha < 0$, we have the modified error estimate

$$\|\sigma - \sigma_h^\varepsilon\|_{L^2(\Gamma)} \le c\left\{h^{s}\|\sigma\|_{H^s(\Gamma)} + h^{2\alpha}\|f - f_\varepsilon\|_{L^2(\Gamma)}\right\}$$

If $\|f - f_\varepsilon\|_{L^2(\Gamma)} \le \varepsilon$, then the choice of $h$ given by

$$h^{s + |2\alpha|} = \varepsilon$$

yields the optimal rate of convergence:

$$\|\sigma - \sigma_h^\varepsilon\|_{L^2(\Gamma)} = O\big(\varepsilon^{s/(s+|2\alpha|)}\big) \quad \text{as } \varepsilon \to 0^+ \tag{155}$$

We remark that the results in (155) are in agreement
with those obtained by the Tikhonov-regularization technique;
see Tikhonov and Arsenin (1977) and Natterer
(1986).
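The balancing argument behind this kind of parameter choice can be sketched numerically. The code below assumes an error bound of the form $c\,(h^{s}\|\sigma\| + h^{-|2\alpha|}\varepsilon)$ for an operator of negative order; this form, the function names, and the constants are assumptions made for illustration only. Equating the two terms gives $h = \varepsilon^{1/(s+|2\alpha|)}$ and a bound proportional to $\varepsilon^{s/(s+|2\alpha|)}$.

```python
def balanced_mesh_size(eps, s, alpha):
    """Mesh size that equilibrates discretization and data-error terms."""
    return eps ** (1.0 / (s + abs(2.0 * alpha)))

def error_bound(h, eps, s, alpha, c=1.0, sigma_norm=1.0):
    """Assumed bound c * (h**s * ||sigma|| + h**(-|2 alpha|) * eps)."""
    return c * (h ** s * sigma_norm + h ** (-abs(2.0 * alpha)) * eps)

s, alpha, eps = 2.0, -0.5, 1e-6          # hypothetical regularity, order, noise
h_star = balanced_mesh_size(eps, s, alpha)
bound_at_h_star = error_bound(h_star, eps, s, alpha)
predicted = 2.0 * eps ** (s / (s + abs(2.0 * alpha)))

# Refining far beyond h_star amplifies the data error instead of helping.
bound_too_fine = error_bound(h_star / 100.0, eps, s, alpha)
```

The last line illustrates the ill-posedness for negative order: once the mesh is much finer than the balanced value, the second term of the assumed bound dominates and the bound grows.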

Finally, we note that as long as the order of the boundary
operator is not zero, the condition numbers of the discrete
equations are unbounded, independent of the sign
of $\alpha$. Hence, in order to use iteration schemes for solving
the discrete equations, a suitable preconditioner must
be employed. On the other hand, for operators of negative
orders, although the equations are ill posed, the convergence
of the approximate solutions can still be achieved in
view of (155).
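The growth of the condition number can be seen in a simple model, which is not the boundary element setting of the chapter: a diagonal operator of order $2\alpha$ on the unit circle acting on Fourier modes $k = 1, \dots, N$ with the assumed symbol $|k|^{2\alpha}$. With $h \sim 1/N$, the ratio of the largest to the smallest eigenvalue is $N^{2|\alpha|} \sim h^{-2|\alpha|}$ for either sign of $\alpha$.

```python
import numpy as np

def model_condition_number(N, alpha):
    """Condition number of diag(|k|**(2*alpha)), k = 1..N (model symbol)."""
    k = np.arange(1, N + 1, dtype=float)
    symbol = k ** (2.0 * alpha)
    return symbol.max() / symbol.min()

cond_negative_order = model_condition_number(64, -0.5)   # order -1
cond_positive_order = model_condition_number(64, 0.5)    # order +1
cond_zero_order = model_condition_number(64, 0.0)        # order 0
```

In this model a diagonal scaling by $|k|^{-2\alpha}$ is a perfect preconditioner; for boundary element matrices the preconditioner has to be constructed by other means.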


6 CONCLUDING REMARKS 


This chapter gives an overview of the Galerkin-BEM 
procedure as indicated in Figure 1 by using elementary 
model problems. Because of the limitation of the chap- 
ter length, we have confined ourselves to smooth closed 
boundary manifolds and omitted numerical experiments 
as originally planned. However, typical numerical exper- 
iments are available in Hsiao, Kopp and Wendland (1980, 
1984) for illustrating the efficiency of the scheme. For 
nonsmooth boundary and open surfaces, see, for exam- 
ple, Costabel and Stephan (1985), Stephan (1987), Costabel 
(1988), Stephan and Wendland (1990), Hsiao, Stephan and 
Wendland (1991), Hsiao, Schnack and Wendland (2000), 
and Steinbach (2002). General error estimates for the 
collocation-BEM (for n = 2) can be found in the funda- 
mental paper by Arnold and Wendland (1983); see also the 
books by Prössdorf and Silbermann (1991) and by Saranen
and Vainikko (2002). For applications, see, for example, 
Wendland (1997). 

Most of the material presented in this chapter is based 
on general results in a monograph, which is presently being 
prepared by the authors; see Hsiao and Wendland (2005) 
and an earlier survey by Wendland (1987). To conclude 
the chapter, we remark that BIEs can also be understood
to be standard pseudodifferential operators on the compact
boundary manifold $\Gamma$ (see Seeley, 1969). Then the
Gårding inequality is equivalent to the positive definiteness
of the principal symbol of pseudodifferential operators.
For particularly chosen BIEs such as the ones for our
model problems, the Gårding inequality can be obtained
from the coerciveness of the original strongly elliptic BVP.
For the convergence of the Galerkin and the collocation
BEMs, we only require the principal symbol to be positive
definite modulo multiplications by regular functions or
matrices. This is the definition of strong ellipticity introduced
in Stephan and Wendland (1976) for systems
of pseudodifferential equations. Details of these concepts
and relations will be available in the monograph Hsiao and
Wendland (2005), which also contains rather complete theoretical
results for a class of boundary integral operators
having symbols of rational type (see also McLean, 2000).
The latter cover almost all boundary integral operators in
applications.


1 INTRODUCTION


The coupling of boundary element methods (BEM) with 
finite element methods (FEM) has become a very pow- 
erful and popular method in applications, and there exist 
detailed descriptions of implementations of such methods 
(see Brebbia, Telles and Wrobel (1984) and the references 
given there). Within engineering computations, the BEM
and the FEM are well-established tools for the numeri- 
cal approximation of real-life problems in which analytical 
solutions are mostly unknown or available under unrealistic 
modeling only. 

Both methods are to some extent complementary: the FEM
seems to be more general and applicable to essentially
nonlinear problems, while the BEM is restricted to certain
linear problems with constant coefficients. On the other
hand, the FEM requires a bounded domain, while the BEM
models an unbounded exterior body as well. Applications
occur in scattering problems, elastodynamics, electromagnetism,
and elasticity. The general setting of these problems
is that we wish to solve a given differential equation in two
adjacent domains subject to a specified interface condition.
Most often, we have a bounded region $\Omega$ surrounded by an
unbounded region, with the interface condition being specified
on the shared boundary. Typically, in this ‘marriage
à la mode’ (see Bettess, Kelly and Zienkiewicz, 1977, 1979),
an FEM formulation is used to describe the solution within
the bounded region (where the differential equation can be
nonlinear), and the BEM is used to represent the exterior
solution.

The purpose of the present note is to give an overview
of FE/BE coupling procedures for elliptic boundary value
problems. Special emphasis is given to the
symmetric coupling method, which was independently
proposed and analyzed for linear transmission problems
by Costabel and Stephan (see Costabel (1988b) for
general strongly elliptic second-order systems, Costabel
(1987) for higher-order systems, and Costabel and
Stephan (1988a) and Costabel and Stephan (1988b) for
the case of scattering of elastic waves) and by Han
(see Han, 1990); later, this method was extended to
interface problems with nonlinear differential equations in
$\Omega$ (see Costabel and Stephan, 1990; Gatica and Hsiao,
1989, 1990, 1995). This symmetric coupling method
(see Section 2) has been described in the engineering
literature for problems from elasticity and elastoplasticity in
Polizzotto (1987) and allows a variational formulation
in which the solution satisfies a saddle-point problem. In
this method, the set of boundary integral operators consists
of the single-layer potential, the double-layer potential,
and their normal derivatives. The computed Galerkin
approximations, consisting of finite elements (FE) in $\Omega$
coupled with boundary elements (BE) on the Lipschitz
continuous interface boundary $\Gamma$, converge quasioptimally
in the energy norm. This is an advantage over other
coupling methods of a different structure; see Bielak and
MacCamy (1983) and Johnson and Nedelec (1980). These
coupling methods cannot guarantee convergence of the
Galerkin solutions for nonlinear interface problems, for
example, with the Hencky–von Mises-type material considered
in Section 6.1. In contrast to these coupling methods, the
symmetric method uses the hypersingular operator; this
operator is given by the normal derivative of the double-layer
potential. Although traditionally avoided if possible,
this operator is now considered one of the classical
boundary integral operators, and it appears frequently in the
treatment of crack problems. It is known that the apparent
difficulties in the numerical treatment of this operator can
be easily overcome; see Maue (1949) and Nedelec (1982).
In Section 2.1, we present a mathematical analysis of
the symmetric coupling method based on the strong coerciveness
of the Steklov–Poincaré operator and its discrete
analogue. Adaptive versions of the FE/BE coupling method
are presented using error indicators of residual type and
those based on hierarchical two-level subspace decompositions.
Also, implicit estimators are given in which local
Neumann problems in the FEM domain are solved.
Another prime topic of this article is the iterative solution 
of the discrete systems resulting from the above symmet- 
ric FE/BE coupling. These systems have block structured 
symmetric and indefinite matrices. An important approach 
to the construction of a fast solver has been the use of 
the Schur complement (see Bramble and Pasciak, 1988) 
in connection with domain decomposition techniques (see 
Langer, 1994; Carstensen, Kuhn and Langer, 1998; and 
Steinbach and Wendland, 1997). As the symmetric cou- 
pling method leads to a symmetric and positive definite 
Schur complement, it can be treated by very fast standard 
preconditioners. In order to avoid the direct computation of 
the inverse of the discretization of the single-layer poten- 
tial, which is involved in the Schur complement, one might 
turn to nested inner—outer iterations (see Hahne, Stephan 
and Thies, 1995). The alternative we propose here is to 
use the minimum or conjugate residual method (MINRES), 
which is an ‘optimal’ solver belonging to the family of 
Krylov-subspace (conjugate-gradient-like) iterative meth- 
ods. In Section 3, we consider the uniform hp-version and 
apply MINRES with multilevel preconditioning for the 
FE block and the discretized version of the single-layer
potential. We identify the spectral properties of the components
of the system matrix and give estimates for the
iteration numbers and comment on implementation issues. 
As in Heuer, Maischak and Stephan (1999), we show 
that the eigenvalues of the preconditioned Galerkin matrix 
are appropriately bounded in case of the two-block pre- 
conditioner or depend mildly on the grid size h and the 
polynomial degree p of the trial functions. Especially, the 
use of additive Schwarz preconditioners leads to efficient 
solvers. Other good choices as preconditioners are multigrid 
(MG), BPX, or those using hierarchical bases. Alterna- 
tively, by simple scaling of BE test functions, we obtain 
a system with nonsymmetric Galerkin matrix with positive 
(semi-)definite symmetric part, which can be solved by the 
generalized minimal residual method (GMRES). 

A current research topic is the least squares coupling
of FEs and BEs, which allows one to deal with mixed FE
formulations without demanding inf–sup conditions to hold.
This recent development is considered in Section 4. An
increasing interest has evolved in applying mixed methods
instead of the usual FEM together with either boundary
integral equations or Dirichlet-to-Neumann (DtN) mappings;
see also Hsiao, Steinbach and Wendland (2000)
and Steinbach (2003). Often in applications, a mixed FEM
is more beneficial than the standard FEM; for example,
in structural mechanics via mixed methods, stresses
are computed more accurately than displacements. However,
in such mixed FE/BE coupling methods, it is often
difficult to work with finite element spaces that satisfy
appropriate discrete inf–sup conditions. On the other
hand, least squares formulations do not require inf–sup
conditions to be satisfied. Therefore, they are especially
attractive to use in combination with mixed formulations.
Recently, in Bramble, Lazarov and Pasciak (1997), a least
squares functional was introduced, involving a discrete
inner product related to the inner product in the Sobolev
space of order $-1$. This approach is extended to a least
squares coupling with boundary integral operators in Gatica,
Harbrecht and Schneider (2001) and Maischak and
Stephan. Discrete versions of the inner products in the
Sobolev spaces $H^{-1}(\Omega)$ and $H^{1/2}(\Gamma)$ are constructed by
applying multigrid (MG) or BPX to both FE and BE
discretizations.

Another research topic is the FE/BE coupling 
for Signorini-type interface problems, which leads to 
variational inequalities with boundary integral operators. 
For the BEM applied to variational inequalities, see Hsiao 
and Han (1988), Gwinner and Stephan (1993), Spann 
(1993), and Eck et al. (2003). In Section 5, we propose 
two approaches for the FE/BE coupling for Signorini- 
type interface problems, namely, a primal method and 


Coupling of Boundary Element Methods and Finite Element Methods 377 


a dual-mixed method. For the h-version of the primal- 
coupling method, existence, uniqueness, and convergence 
results were obtained by Carstensen and Gwinner (1997). 
Maischak (2001) extends their approach to the hp-version
and derives a posteriori error estimates based
on hierarchical subspace decompositions for FE and
BE, similar to Mund and Stephan (1999) for nonlinear
transmission problems. A dual-mixed coupling method 
for the h-version, which heavily uses the inverse of the 
Steklov—Poincaré operator and its discrete versions, is 
analyzed in Maischak (2001) (see also Gatica, Maischak 
and Stephan, 2003). Both primal- and dual-mixed coupling 
methods are described in this article for the scalar 
case; the extension to corresponding elasticity problems 
with Signorini-type interface conditions involves no 
major difficulties. FE/BE coupling procedures for friction 
problems are currently under investigation and are not 
considered here. As system solvers for the discretized 
variational inequalities, we propose a preconditioned 
Polyak algorithm in the case of the primal method and a 
preconditioned modified Uzawa algorithm in the case of the 
dual-mixed method. Both solvers lead to efficient numerical 
procedures, even for adaptively refined meshes. 

Section 6 deals with the applications of FE/BE cou- 
pling methods to elasticity problems. Firstly, we describe 
the symmetric FE/BE coupling method for a nonlinear 
Hencky-von Mises stress—strain relation. Here, we com- 
ment on the saddle-point structure of the symmetric cou- 
pling method. In the symmetric coupling method, the dis- 
placements in the interior domain are H!-regular and the 
equilibrium of tractions across the interface is satisfied 
weakly. In applications, however, the stresses are often 
more important to determine than displacements. Then, the 
coupling of BEs and mixed FEs, in which approximations 
to stresses can be determined directly, is more adequate; 
see Brink, Carstensen and Stein (1996) and Meddahi et al. 
(1996). Secondly, therefore, following Brink, Carstensen 
and Stein (1996), we present here a dual-mixed formulation
for the finite element part $\Omega_F$ of a model problem
in plane linear elasticity from Gatica, Heuer and Stephan
(2001). This means that the stresses $\sigma$ are required to satisfy
$\sigma \in [L^2(\Omega_F)]^{2\times 2}$ and $\operatorname{div}\sigma \in [L^2(\Omega_F)]^2$, whereas the
displacements are only sought in $[L^2(\Omega_F)]^2$. Thus, the
approximate tractions are continuous across element sides
and the interface boundary, while the displacements are
continuous only in a weak sense across element sides and
the interface boundary. We present an a posteriori error
estimator, which is based on the solution of local elliptic
problems with Dirichlet boundary conditions. We note
that the a posteriori error estimate can be derived in an
analogous way for three-dimensional elasticity problems.
For problems with nonlinearities, however, the inversion
of the elasticity tensor $C$ is not adequate. In that case, a
different so-called dual–dual formulation of the problem
can be used (see Section 7), in which further references
are also given to other topics not explicitly considered
here.


2 SYMMETRIC COUPLING OF 
STANDARD FINITE ELEMENTS AND 
BOUNDARY ELEMENTS 


For a model interface problem, we present a combined
approach with FE and BE. The given symmetric coupling
method renders all boundary conditions on the interface
manifold $\Gamma$ to be natural and also allows for a nonlinear
elliptic differential operator in the bounded domain $\Omega_1$.
Our solution procedure makes use of an integral equation
method for the exterior problem and of an energy (variational)
method for the interior problem and consists of
coupling both methods via the transmission conditions on
the interface. We give an equivalence result for the solution
satisfying a weak coupling formulation. We solve this
variational formulation with the Galerkin method using FE
in $\Omega_1$ and BE on $\Gamma$. As shown in Wendland (1988) and
Costabel and Stephan (1990), we have convergence and
quasioptimality of the Galerkin error in the energy norm.
At the end of this section, we comment on the exponential
convergence rate of the hp-version of the coupling procedure
when an appropriate geometric mesh refinement is
used.

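Let $\Omega_1 := \Omega \subset \mathbb{R}^d$, $d \ge 2$, be a bounded domain with
Lipschitz boundary $\Gamma = \partial\Omega_1$, and $\Omega_2 := \mathbb{R}^d \setminus \bar{\Omega}_1$ with normal
$n$ on $\Gamma$ pointing into $\Omega_2$. For given $f \in L^2(\Omega_1)$,
$u_0 \in H^{1/2}(\Gamma)$, $t_0 \in H^{-1/2}(\Gamma)$, we consider the following
model interface problem (IP).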

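Problem (IP): Find $u_1 \in H^1(\Omega_1)$, $u_2 \in H^1_{loc}(\Omega_2)$ such
that

$$-\operatorname{div} A(\nabla u_1) = f \quad \text{in } \Omega_1 \tag{1}$$

$$\Delta u_2 = 0 \quad \text{in } \Omega_2 \tag{2}$$

$$u_1 = u_2 + u_0 \quad \text{on } \Gamma \tag{3}$$

$$A(\nabla u_1)\cdot n = \frac{\partial u_2}{\partial n} + t_0 \quad \text{on } \Gamma \tag{4}$$

$$u_2(x) = \begin{cases} b\log|x| + o(1), & d = 2 \\ O(|x|^{2-d}), & d \ge 3 \end{cases} \qquad |x| \to \infty \tag{5}$$

where $b \in \mathbb{R}$ is a constant (depending on $u_2$). The operator
$A$ is assumed to be uniformly monotone and Lipschitz
continuous, that is, there exist positive constants $\alpha$ and $C$
such that for all $\eta, \tau \in L^2(\Omega)^d$

$$\int_\Omega (A(\eta) - A(\tau))\cdot(\eta - \tau)\,dx \ge \alpha\|\eta - \tau\|_{0,\Omega}^2 \tag{6}$$

$$\|A(\eta) - A(\tau)\|_{0,\Omega} \le C\|\eta - \tau\|_{0,\Omega} \tag{7}$$

Here $\|\cdot\|_{0,\Omega}$ denotes the norm in $L^2(\Omega)^d$. Examples of
operators of this type can be found in Stephan (1992) and
for elasticity in Zeidler (1988, Section 62).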
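As a concrete illustration of uniform monotonicity and Lipschitz continuity, consider the hypothetical nonlinearity $A(\eta) = \eta + \eta/\sqrt{1 + |\eta|^2}$, chosen here only as an example and not taken from the chapter. It is the gradient of the convex function $|\eta|^2/2 + \sqrt{1+|\eta|^2}$, whose Hessian has eigenvalues between 1 and 2, so the pointwise versions of (6) and (7) hold with $\alpha = 1$ and $C = 2$; integrating over $\Omega$ then gives (6) and (7). The sketch below checks the pointwise inequalities on random samples.

```python
import numpy as np

def A(eta):
    """Hypothetical monotone nonlinearity: eta + eta / sqrt(1 + |eta|^2)."""
    norm = np.linalg.norm(eta, axis=-1, keepdims=True)
    return eta + eta / np.sqrt(1.0 + norm ** 2)

rng = np.random.default_rng(0)
eta = 5.0 * rng.normal(size=(1000, 2))   # sample vectors in R^2
tau = 5.0 * rng.normal(size=(1000, 2))

diff = eta - tau
dA = A(eta) - A(tau)

# Pointwise monotonicity quotient, expected to be >= alpha = 1.
monotonicity = np.sum(dA * diff, axis=1) / np.sum(diff ** 2, axis=1)
# Pointwise Lipschitz quotient, expected to be <= C = 2.
lipschitz = np.linalg.norm(dA, axis=1) / np.linalg.norm(diff, axis=1)
```

Sampling cannot prove the inequalities, but it is a quick sanity check when a constitutive law is implemented for the FE part of the coupling.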

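The definition of the Sobolev spaces is as usual:

$$H^s(\Omega) = \{\phi|_\Omega ;\ \phi \in H^s(\mathbb{R}^d)\} \quad (s \in \mathbb{R})$$

$$H^s(\Gamma) = \begin{cases} \{\phi|_\Gamma ;\ \phi \in H^{s+1/2}(\mathbb{R}^d)\} & (s > 0) \\ L^2(\Gamma) & (s = 0) \\ (H^{-s}(\Gamma))' \ \text{(dual space)} & (s < 0) \end{cases}$$

In the following, we often write $\|\cdot\|_{s,B}$ for the Sobolev norm
$\|\cdot\|_{H^s(B)}$ with $B = \Omega$ or $\Gamma$.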

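We now derive the symmetric coupling method (see
Costabel, 1988b; Costabel and Stephan, 1990) as discussed
in detail in Costabel, Ervin, Stephan (1991). By using
Green’s formula together with the decaying condition (5),
one is led to the representation formula for the solution
$u_2$ of (2) in the exterior domain

$$u_2(x) = \int_\Gamma \left\{\frac{\partial}{\partial n(y)}G(x, y)\,u_2(y) - G(x, y)\,\frac{\partial u_2}{\partial n}(y)\right\} ds_y, \quad x \in \Omega_2 \tag{8}$$

with the fundamental solution of the Laplacian given by

$$G(x, y) = \begin{cases} -\dfrac{1}{\omega_2}\log|x - y|, & d = 2 \\[2mm] \dfrac{1}{\omega_d}|x - y|^{2-d}, & d \ge 3 \end{cases} \tag{9}$$

where we have $\omega_2 = 2\pi$, $\omega_3 = 4\pi$.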
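The fundamental solution can be evaluated and sanity-checked numerically. The sketch below covers only $d = 2$ and $d = 3$, the cases with $\omega_2 = 2\pi$ and $\omega_3 = 4\pi$, and verifies with a central finite-difference Laplacian that $G(\cdot, y)$ is harmonic away from $y$; the function names and the step size are illustrative assumptions.

```python
import math
import numpy as np

def fundamental_solution(x, y):
    """G(x, y) for the Laplacian in R^2 (omega_2 = 2 pi) or R^3 (omega_3 = 4 pi)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.linalg.norm(x - y)
    if x.size == 2:
        return -math.log(r) / (2.0 * math.pi)
    if x.size == 3:
        return 1.0 / (4.0 * math.pi * r)
    raise ValueError("only d = 2 and d = 3 are covered in this sketch")

def fd_laplacian(f, x, h=1e-3):
    """Second-order central finite-difference approximation of the Laplacian."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        total += f(x + e) - 2.0 * f(x) + f(x - e)
    return total / h ** 2

lap_2d = fd_laplacian(lambda x: fundamental_solution(x, [0.0, 0.0]), [1.0, 0.5])
lap_3d = fd_laplacian(lambda x: fundamental_solution(x, [0.0, 0.0, 0.0]),
                      [1.0, 0.5, -0.3])
```

Both values are zero up to discretization and rounding error, as expected for a function that is harmonic away from its singularity.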
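By using the boundary integral operators

$$V\psi(x) := 2\int_\Gamma G(x, y)\,\psi(y)\,ds_y, \quad x \in \Gamma \tag{10}$$

$$Kv(x) := 2\int_\Gamma \frac{\partial}{\partial n_y}G(x, y)\,v(y)\,ds_y, \quad x \in \Gamma \tag{11}$$

$$K'\psi(x) := 2\frac{\partial}{\partial n_x}\int_\Gamma G(x, y)\,\psi(y)\,ds_y, \quad x \in \Gamma \tag{12}$$

$$Wv(x) := -2\frac{\partial}{\partial n_x}\int_\Gamma \frac{\partial}{\partial n_y}G(x, y)\,v(y)\,ds_y, \quad x \in \Gamma \tag{13}$$

together with their well-known jump conditions, we obtain
from (8) the following integral equations:

$$2\frac{\partial u_2}{\partial n} = -Wu_2 + (I - K')\frac{\partial u_2}{\partial n} \tag{14}$$

$$0 = (I - K)u_2 + V\frac{\partial u_2}{\partial n} \tag{15}$$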


In order to give an equivalent formulation for (IP), we use
from Costabel (1988a) the following mapping properties
of the boundary integral operators, in which the duality
$\langle\cdot,\cdot\rangle$ between the spaces $H^{1/2}(\Gamma)$ and $H^{-1/2}(\Gamma)$ extends
the scalar product in $L^2(\Gamma)$.


Lemma 1. (a) Let $\Gamma = \partial\Omega$ be a Lipschitz boundary. The
operators

$$V : H^{-1/2}(\Gamma) \to H^{1/2}(\Gamma)$$
$$K : H^{1/2}(\Gamma) \to H^{1/2}(\Gamma)$$
$$K' : H^{-1/2}(\Gamma) \to H^{-1/2}(\Gamma)$$
$$W : H^{1/2}(\Gamma) \to H^{-1/2}(\Gamma)$$

are continuous. Moreover, the single-layer potential $V$ and
the hypersingular operator $W$ are symmetric; the double-layer
potential $K$ has the dual $K'$.

(b) For $d = 2$ and provided the capacity of $\Gamma$, $\operatorname{cap}(\Gamma)$,
is less than 1, or $d = 3$, then $V : H^{-1/2}(\Gamma) \to H^{1/2}(\Gamma)$ is
positive definite, that is, there is a constant $\alpha > 0$ such that

$$\langle\phi, V\phi\rangle \ge \alpha\|\phi\|_{-1/2,\Gamma}^2 \quad \forall\phi \in H^{-1/2}(\Gamma) \tag{16}$$

(c) The kernel of the operator $W$ consists of the constant
functions. $W$ is positive semidefinite, that is,

$$\langle v, Wv\rangle \ge 0 \quad \forall v \in H^{1/2}(\Gamma) \tag{17}$$


Remark 1. For the definition of $\operatorname{cap}(\Gamma)$, we refer to Gaier
(1976) and Sloan and Spence (1988), and we only mention
here that if $\Omega$ lies in a ball with radius less than 1, for
example, then $\operatorname{cap}(\Gamma) < 1$. Thus, $\operatorname{cap}(\Gamma) < 1$ can always
be achieved by scaling; see Hsiao and Wendland (1977)
and Stephan and Wendland (1984).
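The role of the capacity can be illustrated on a circle of radius $r$, whose logarithmic capacity equals $r$. The sketch below assumes the two-dimensional single-layer operator with kernel $2G(x, y) = -(1/\pi)\log|x - y|$ and applies it to the constant density 1. For a circle, the resulting potential is constant inside the disc and equals $-2r\log r$, so it can be evaluated by the trapezoidal rule at an interior point without touching the singularity; the energy $\langle 1, V1\rangle = -4\pi r^2\log r$ is positive exactly when $r < 1$. Function names and the quadrature size are illustrative assumptions.

```python
import numpy as np

def single_layer_of_one(r, x, n=256):
    """Approximate (V 1)(x) = 2 * int G(x, y) ds_y over the circle |y| = r."""
    theta = 2.0 * np.pi * (np.arange(n) + 0.5) / n
    y = r * np.column_stack((np.cos(theta), np.sin(theta)))
    dist = np.linalg.norm(y - np.asarray(x, dtype=float), axis=1)
    kernel = -np.log(dist) / np.pi        # 2 G with G = -(1 / (2 pi)) log|x - y|
    return kernel.sum() * (2.0 * np.pi * r / n)

def energy_of_one(r):
    """<1, V 1> on the circle of radius r, using the interior value of V 1."""
    x_inside = (0.3 * r, 0.1 * r)          # any point strictly inside the disc
    return 2.0 * np.pi * r * single_layer_of_one(r, x_inside)

energy_small = energy_of_one(0.5)          # cap = 0.5 < 1
energy_large = energy_of_one(2.0)          # cap = 2.0 > 1
exact_small = -4.0 * np.pi * 0.5 ** 2 * np.log(0.5)
```

The sign change at $r = 1$ is the reason a rescaling of the geometry is applied before the two-dimensional single-layer operator is used as a positive definite operator.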


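Next, we derive the symmetric coupling method for (IP).
One observes that (15), combined with (3), forms one part of the weak
formulation of problem (IP)

$$-\langle u_1, \psi\rangle - \Big\langle V\frac{\partial u_2}{\partial n}, \psi\Big\rangle + \langle Ku_1, \psi\rangle = -\langle u_0, \psi\rangle + \langle Ku_0, \psi\rangle \quad \forall\psi \in H^{-1/2}(\Gamma) \tag{18}$$

The second part of the weak formulation has to couple the
exterior problem (2) and the interior problem (1). The weak
formulation of the latter is

$$a(u_1, v) := \int_{\Omega_1} A(\nabla u_1)\cdot\nabla v\,dx = \int_\Gamma (A(\nabla u_1)\cdot n)\,v\,ds + \int_{\Omega_1} fv\,dx \quad \forall v \in H^1(\Omega_1) \tag{19}$$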


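Taking the integral equation (14) and substituting (3) and
(4) into (19), one obtains

$$2a(u_1, v) - \Big\langle\frac{\partial u_2}{\partial n}, v\Big\rangle + \Big\langle K'\frac{\partial u_2}{\partial n}, v\Big\rangle + \langle Wu_1, v\rangle = 2(f, v) + 2\langle t_0, v\rangle + \langle Wu_0, v\rangle \quad \forall v \in H^1(\Omega_1) \tag{20}$$

where $(f, v) = \int_{\Omega_1} fv\,dx$. There holds the following
equivalence.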


Theorem 1. If $u_1$ and $u_2$ solve (1)–(5), then $u_1$ and $\partial u_2/\partial n$
satisfy (18) and (20). Conversely, provided $u_1$ and $\partial u_2/\partial n$
solve (18) and (20), then $u_1$ and $u_2$, which is defined by (8),
(3), and (4), are the solutions of the interface problem (IP).
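The step from (19) to (20) can be written out explicitly. The following sketch assumes (14) in the form $2\,\partial u_2/\partial n = -Wu_2 + (I - K')\,\partial u_2/\partial n$, which is the form used later in the text when a Green's function is available. For every $v \in H^1(\Omega_1)$:

```latex
\begin{align*}
2a(u_1, v)
  &= 2\langle A(\nabla u_1)\cdot n, v\rangle + 2(f, v)
  && \text{by (19)} \\
  &= 2\Big\langle \frac{\partial u_2}{\partial n}, v\Big\rangle
     + 2\langle t_0, v\rangle + 2(f, v)
  && \text{by (4)} \\
  &= -\langle W u_2, v\rangle
     + \Big\langle (I - K')\frac{\partial u_2}{\partial n}, v\Big\rangle
     + 2\langle t_0, v\rangle + 2(f, v)
  && \text{by (14)} \\
  &= -\langle W u_1, v\rangle + \langle W u_0, v\rangle
     + \Big\langle (I - K')\frac{\partial u_2}{\partial n}, v\Big\rangle
     + 2\langle t_0, v\rangle + 2(f, v)
  && \text{by (3)}
\end{align*}
```

Moving the terms containing the unknowns $u_1$ and $\partial u_2/\partial n$ to the left-hand side gives (20). The hypersingular operator $W$ enters only through the second Calderón-type identity (14), which is what makes the resulting system symmetric.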


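Note that in this way we obtain the following variational
formulation.

Find $u := u_1 \in H^1(\Omega_1)$ and $\phi := \partial u_2/\partial n \in H^{-1/2}(\Gamma)$
such that for all $v \in H^1(\Omega)$ and $\psi \in H^{-1/2}(\Gamma)$

$$\begin{aligned} 2a(u, v) + \langle(K' - I)\phi, v\rangle + \langle Wu, v\rangle &= 2\langle t_0, v\rangle + \langle Wu_0, v\rangle + 2(f, v) \\ \langle(K - I)u, \psi\rangle - \langle V\phi, \psi\rangle &= \langle(K - I)u_0, \psi\rangle \end{aligned} \tag{21}$$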


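For the Galerkin scheme, we choose finite-dimensional
subspaces $X_M \subset H^1(\Omega)$ and $Y_N \subset H^{-1/2}(\Gamma)$ and define
the Galerkin solution $(u_M, \phi_N) \in X_M \times Y_N$ by

$$\begin{aligned} 2a(u_M, v) + \langle(K' - I)\phi_N, v\rangle + \langle Wu_M, v\rangle &= 2\langle t_0, v\rangle + \langle Wu_0, v\rangle + 2(f, v) \\ \langle(K - I)u_M, \psi\rangle - \langle V\phi_N, \psi\rangle &= \langle(K - I)u_0, \psi\rangle \end{aligned} \tag{22}$$

for all $v \in X_M$ and $\psi \in Y_N$.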
There holds the following convergence result. 


Theorem 2. Every Galerkin scheme (22) with approximating
finite-dimensional spaces $X_M \subset H^1(\Omega)$ and $Y_N \subset
H^{-1/2}(\Gamma)$ converges with optimal order, that is, with the
exact solution $(u, \phi)$ of (21) and the Galerkin solution
$(u_M, \phi_N)$ of (22), there holds the estimate

$$\|u - u_M\|_{1,\Omega} + \|\phi - \phi_N\|_{-1/2,\Gamma} \le C\left\{\inf_{\hat{u} \in X_M}\|u - \hat{u}\|_{1,\Omega} + \inf_{\hat{\phi} \in Y_N}\|\phi - \hat{\phi}\|_{-1/2,\Gamma}\right\} \tag{23}$$

where the constant $C$ is independent of $M$, $N$, $u$, and $\phi$.


For a slightly modified interface problem with

$$-\Delta u_1 + u_1 = f \quad \text{in } \Omega_1 \tag{24}$$

the convergence of the Galerkin scheme of the FE/BE coupling
follows, owing to Stephan and Wendland (1976),
directly from the strong ellipticity of the system (21).
Choosing $v = u$ and $\psi = -\phi$ shows that the inf–sup condition
is satisfied (Costabel and Stephan, 1988b), namely,

$$\|u\|_{1,\Omega}^2 + \|\phi\|_{-1/2,\Gamma}^2 \lesssim 2a(u, u) + 2(u, u) + \langle Wu, u\rangle + \langle V\phi, \phi\rangle$$

Note that (19) is just the weak formulation of the boundary
value problem in $\Omega_1$ if $A(\nabla u_1)\cdot n$ is given on $\Gamma$. Now,
a coupling can be considered as follows: solve (numerically)
one of the equations (14), (15) for $\partial u_2/\partial n$ in terms
of $u_2$ and insert the resulting expression for $(\partial u_2/\partial n)|_\Gamma =
A(\nabla u_1)\cdot n - t_0$ in terms of $u_1|_\Gamma$ into (19). Then, (19) has an
appropriate form to which FEM can be applied. This works
particularly well if a Green’s function $G$ for $\Omega_2$ is known.
Then, (14) becomes $2(\partial u_2/\partial n) = -Wu_2$, and inserting this
into (19) gives

$$2a(u_1, v) + \langle Wu_1, v\rangle = 2\langle t_0, v\rangle + \langle Wu_0, v\rangle + 2(f, v) \tag{25}$$

Here, the left-hand side satisfies a Gårding inequality in
$H^1(\Omega_1)$. Thus, by assuming uniqueness for the solution of
the original interface problem, one obtains immediately that
every conforming Galerkin method for (25) converges with
optimal order. If a Green’s function for $\Omega_2$ is not known,
then one introduces $\partial u_2/\partial n$ as an additional unknown.
Then, the common direct method (see Johnson and Nedelec,
1980) takes a weak formulation of (15) on the boundary
together with (19) and the coupling conditions (3) to form
a system that is discretized by approximating $u_1$ with
FE in $\Omega_1$ and $\partial u_2/\partial n$ with BE on $\Gamma$. Another, indirect
method (see Bielak and MacCamy (1983) and Hsiao
(1992)) consists of a single-layer potential ansatz for the
exterior problem, that is, the solution of $\Delta u_2 = 0$ in $\Omega_2$
is sought in the form $u_2 = V\psi$ with an unknown
density $\psi$ on $\Gamma$. In the last two methods, the resulting
matrix is not symmetric and one does not have a Gårding
inequality except for linear scalar equations. In the case of
elasticity systems considered in Section 6.1, the operator of
the double-layer potential and its adjoint are not compact
perturbations in (14) and (15). Thus, standard arguments
that guarantee the convergence of the Galerkin method do
not apply (cf. Steinbach and Wendland, 2001).

From the error estimate (23), we deduce an $O(h^{1/2})$
convergence of the Galerkin solution for the h-version,
in which piecewise linear FE and piecewise constant BE
are used, since the solution $(u, \phi)$ of (21) belongs to
$H^{3/2-\varepsilon}(\Omega_1) \times H^{-\varepsilon}(\Gamma)$ for any $\varepsilon > 0$, as follows from the
analysis in Costabel (1988a). If $\Gamma = \partial\Omega$ is a smooth manifold,
the solution of (21) is also smooth, since the integral
operators $V$, $K$, $K'$, and $W$ in (10) to (13) are elliptic
pseudodifferential operators. Therefore, for smooth boundaries,
the rate of convergence of the Galerkin scheme (23)
depends only on the regularity of the FE space $X_M$ and the
BE space $Y_N$. Our symmetric coupling method works for
arbitrary meshes (and p-distributions). In particular, we can
choose $h_{X_M} = h_{Y_N}$, where $h_{X_M}$ denotes the mesh size of the
FE mesh and $h_{Y_N}$ the size of the BE mesh, that is, we can
take as grid on the coupling boundary just the mesh points
of the FE grid that lie on $\Gamma$. Thus, we generalize the results
in Wendland (1986, 1988) and Brezzi and Johnson (1979).

Next, we consider the hp-version of the symmetric
FE/BE coupling on a geometric mesh. Let $\Omega$ be a polygonal
domain in $\mathbb{R}^2$ or a polyhedral domain in $\mathbb{R}^3$. Let $\Omega_\sigma^n$
be a geometric mesh on $\Omega$, refined appropriately toward
vertices of $\Omega$ in $\mathbb{R}^2$ and vertices and edges of $\Omega$ in $\mathbb{R}^3$.
Then, $\Omega_\sigma^n$ induces a geometric mesh $\Gamma_\sigma^n$ on the boundary
$\Gamma = \partial\Omega$. On the FE mesh $\Omega_\sigma^n$, we approximate $u$ by
(affine transformations of) tensor products of antiderivatives
of Legendre polynomials together with continuous piecewise
linear functions; this gives the space $S^{n,1}(\Omega_\sigma^n)$. On the
BE mesh $\Gamma_\sigma^n$, we approximate $\phi$ by elements in $S^{n-1}(\Gamma_\sigma^n)$,
which are discontinuous functions and (affine transformations
of) tensor products of Legendre polynomials. Then,
the above symmetric coupling procedure converges exponentially
(see Heuer and Stephan, 1991; Guo and Stephan,
1998).
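The idea of a geometric mesh can be sketched in one dimension. The code below assumes the common construction with a grading factor `sigma` in (0, 1) and `n` layers refined toward the point 0 (think of a vertex of the polygon); both parameter names and the construction itself are assumptions made for illustration and are not taken from the chapter.

```python
def geometric_mesh(n, sigma=0.5):
    """Nodes 0 < sigma**(n-1) < ... < sigma < 1 of a mesh graded toward 0."""
    if not 0.0 < sigma < 1.0:
        raise ValueError("sigma must lie strictly between 0 and 1")
    return [0.0] + [sigma ** (n - j) for j in range(1, n + 1)]

nodes = geometric_mesh(4, sigma=0.5)
lengths = [b - a for a, b in zip(nodes[:-1], nodes[1:])]
```

With `n = 4` and `sigma = 0.5`, the nodes are 0, 0.125, 0.25, 0.5, and 1, so apart from the first element each element is twice as long as its neighbor closer to the singular point. In the hp-version the polynomial degree is raised together with the number of layers.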


Theorem 3. Let $\Omega \subset \mathbb{R}^d$, $d = 2, 3$, be polygonal or
polyhedral respectively. Let $f$, $u_0$, $t_0$ be piecewise analytic;
then there holds the estimate


lu — uylla + Il = Oy lly-vecry 
cefet e), a=2 
So LehE geni, ging 


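between the Galerkin solution $u_M \in X_M = S^{n,1}(\Omega_\sigma^n)$, $\phi_N \in
Y_N = S^{n-1}(\Gamma_\sigma^n)$, and the exact solution $u$, $\phi := (\partial u_2/\partial n)|_\Gamma$
of (21), where the positive constants $C$, $b_1$, $b_2$ are
independent of $M = \dim X_M$ and $N = \dim Y_N$.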


Next, we comment on the structure of the Galerkin matrix 
for the symmetric FE/BE coupling procedure, which reflects 
the saddle-point character of the weak formulation. 

To be more specific, let us introduce bases

$$\operatorname{span}\{v_1, \ldots, v_M\} = X_M \quad\text{and}\quad \operatorname{span}\{\psi_1, \ldots, \psi_N\} = Y_N$$

The basis functions of $X_M$ are supposed to be ordered such that

$$\operatorname{span}\{v_1, \ldots, v_{M_\Omega}\} = X_M \cap H^1_0(\Omega)$$

The basis functions that do not vanish on $\Gamma$ are then
$v_{M_\Omega+1}, \ldots, v_{M_\Omega+M_\Gamma}$. Let us denote the
coefficients of $u_M$ and $\phi_N$ again by $u_M$ and $\phi_N$,
respectively. Further, by $u_{M_\Omega}$ and $u_{M_\Gamma}$ we
denote the coefficients belonging to the components of $u_M$ that
are interior and not interior to $\Omega$, respectively. We obtain
a linear system of the form


$$\mathcal{A}\begin{pmatrix} u_{M_\Omega} \\ u_{M_\Gamma} \\ \phi_N \end{pmatrix} := \begin{pmatrix} A & B^T & 0 \\ B & C + W & K^T - I \\ 0 & K - I & -V \end{pmatrix} \begin{pmatrix} u_{M_\Omega} \\ u_{M_\Gamma} \\ \phi_N \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$$ (26)

Here, the block

$$\begin{pmatrix} A & B^T \\ B & C \end{pmatrix}$$

represents a discretization of the FE bilinear form
$2a(\cdot,\cdot)$ and corresponds to a Neumann problem, whereas
$A$ corresponds to a Dirichlet problem. Note that all the basis
functions used to construct $A$ vanish on $\Gamma$. The block $C$
deals with the continuous basis functions that are nonzero on
$\Gamma$. The block $W$ belongs to the same basis functions as
$C$, but is a discretization of the hypersingular integral
operator $W$. The third diagonal block $V$ only deals with the
basis functions of $Y_N$ and belongs to the single-layer potential
$V$. Finally, the blocks $I$, $K$, and the transpose of $K$,
$K^T$, provide the coupling between the two ansatz spaces $X_M$
and $Y_N$. For $I$, the bilinear form of the duality between
$H^{1/2}(\Gamma)$ and $H^{-1/2}(\Gamma)$ is used, and $K$ and
$K^T$ are discretizations of the double-layer potential $K$ and of
its adjoint operator $K'$. Because of the specific form of
$\mathcal{A}$ in (26), specific iterative solvers should be chosen
for an efficient solution procedure (cf. Section 3) (see
Chapter 12, this Volume).
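The saddle-point structure of (26) and its link to the Steklov–Poincaré operator can be illustrated on a toy system. The following is only a sketch under stated assumptions: every block is replaced by a hypothetical scalar (the numbers are not taken from any discretization), and the small elimination routine `solve` is an illustrative helper. It shows that the full 3x3 matrix is symmetric and indefinite, while eliminating the last unknown gives a positive definite reduced system with the scalar analogue of $S = W + (I - K')V^{-1}(I - K)$.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


# Hypothetical 1x1 "blocks" standing in for A, B, C, W, K, V of (26).
A, B, C = 2.0, -1.0, 2.0
W, K, V = 0.5, 0.25, 1.0
b1, b2, b3 = 1.0, 2.0, 3.0

# Full symmetric, indefinite system in the layout of (26).
full = [[A, B, 0.0],
        [B, C + W, K - 1.0],
        [0.0, K - 1.0, -V]]
u_int, u_bnd, phi = solve(full, [b1, b2, b3])

# Determinant of the full matrix (cofactor expansion along the first row).
det_full = (A * ((C + W) * (-V) - (K - 1.0) ** 2)
            - B * (B * (-V)))

# Eliminate phi from the last row: phi = ((K - 1) u_bnd - b3) / V.
# The boundary block becomes C + S with S = W + (1 - K) V^{-1} (1 - K).
S = W + (1.0 - K) * (1.0 / V) * (1.0 - K)
reduced = [[A, B],
           [B, C + S]]
r_int, r_bnd = solve(reduced, [b1, b2 + (K - 1.0) * b3 / V])
phi_back = ((K - 1.0) * r_bnd - b3) / V
```

Both routes give the same unknowns. The negative determinant together with positive leading minors of the first two rows indicates exactly one negative eigenvalue in this toy case, which is the behaviour that motivates solvers for symmetric indefinite systems.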


2.1 Convergence analysis 


In this section, we prove the existence and uniqueness of 
the weak (variational) solution of the interface problem (IP) 
in Section 2 and show the convergence of the Galerkin 
solution proving Theorem 2 above. The presented analysis 
follows Carstensen and Stephan (1995a) and uses heavily 
the strong coerciveness of the Steklov-Poincaré operator 
(for the exterior problem) and of its discrete analogue. 

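Firstly, we note that the weak formulation (21) can be
rewritten as problem (P).

Problem (P): Find $(u, \phi) \in H^1(\Omega_1) \times H^{-1/2}(\Gamma)$ with

$$B\bigl((u,\phi),(v,\psi)\bigr) = L\bigl((v,\psi)\bigr) \quad \forall (v,\psi) \in H^1(\Omega_1) \times H^{-1/2}(\Gamma)$$ (27)

Here the continuous mapping
$B: (H^1(\Omega) \times H^{-1/2}(\Gamma))^2 \to \mathbb{R}$ and the
linear form $L: H^1(\Omega) \times H^{-1/2}(\Gamma) \to \mathbb{R}$
are defined by

$$B\bigl((u,\phi),(v,\psi)\bigr) := \int_{\Omega_1} A(\nabla u)\cdot\nabla v\,dx + \tfrac12\langle Wu|_\Gamma + (K' - I)\phi,\ v|_\Gamma\rangle + \tfrac12\langle \psi,\ V\phi + (I - K)u|_\Gamma\rangle$$ (28)

$$L\bigl((v,\psi)\bigr) := \int_{\Omega_1} f\,v\,dx + \tfrac12\langle \psi,\ (I - K)u_0\rangle + \langle t_0 + \tfrac12 Wu_0,\ v|_\Gamma\rangle$$ (29)

for any $(u,\phi), (v,\psi) \in H^1(\Omega) \times H^{-1/2}(\Gamma)$.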
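Note that (15) is equivalent to

$$\phi = -V^{-1}(I - K)(u_1 - u_0)$$ (30)

which may be used to eliminate $\phi = (\partial u_2/\partial n)$
in (20). This leads to the following equivalent formulation:
Find $u \in H^1(\Omega_1)$ with

$$A'(u)(\eta) := 2\int_{\Omega_1} A(\nabla u)\cdot\nabla\eta\,dx + \langle Su|_\Gamma,\ \eta|_\Gamma\rangle = L'(\eta) := 2\int_{\Omega_1} f\,\eta\,dx + \langle 2t_0 + Su_0,\ \eta|_\Gamma\rangle \quad (\eta \in H^1(\Omega_1))$$ (31)

with the Steklov–Poincaré operator for the exterior problem

$$S := W + (I - K')V^{-1}(I - K) : H^{1/2}(\Gamma) \to H^{-1/2}(\Gamma)$$ (32)

which is linear, bounded, symmetric, and positive definite.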
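In the case that $A$ in (IP) is a linear mapping, the
following result proves that the bilinear form $B$ satisfies
the Babuška–Brezzi condition.


Lemma 2. There exists a constant $\beta > 0$ such that for all
$(u,\phi), (v,\psi) \in H^1(\Omega) \times H^{-1/2}(\Gamma)$, we have

$$\beta\,\|(u - v,\ \phi - \psi)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)}\,\|(u - v,\ \eta - \delta)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)} \le B\bigl((u,\phi),(u - v,\ \eta - \delta)\bigr) - B\bigl((v,\psi),(u - v,\ \eta - \delta)\bigr)$$ (33)

with $2\eta := \phi + V^{-1}(I - K)u|_\Gamma$,
$2\delta := \psi + V^{-1}(I - K)v|_\Gamma \in H^{-1/2}(\Gamma)$.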


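Proof. Some calculations show

$$B\bigl((u,\phi),(u - v,\ \eta - \delta)\bigr) - B\bigl((v,\psi),(u - v,\ \eta - \delta)\bigr) = \int_\Omega \bigl(A(\nabla u) - A(\nabla v)\bigr)\cdot\nabla(u - v)\,dx + \tfrac14\langle W(u - v),\ u - v\rangle + \tfrac14\langle S(u - v),\ u - v\rangle + \tfrac14\langle V(\phi - \psi),\ \phi - \psi\rangle$$

Since $A$ is uniformly monotone, $W$ is positive semidefinite,
and $S$ and $V$ are positive definite, the right-hand side is
bounded below by
$\tilde c\,\|(u - v,\ \phi - \psi)\|^2_{H^1(\Omega)\times H^{-1/2}(\Gamma)}$
with a suitable constant $\tilde c$.
On the other hand, by definition of $\eta$, $\delta$, we have with a
constant $c'$

$$\|\eta - \delta\|_{H^{-1/2}(\Gamma)} \le c'\,\|(u - v,\ \phi - \psi)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)}$$
□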

Theorem 4. The interface problem (IP) and the problem
(P) have unique solutions.


Proof. The operator $A'$ on the left-hand side in (31) maps
$H^1(\Omega_1)$ into its dual; it is continuous, bounded, uniformly
monotone, and therefore bijective. This yields the existence
of $u$ satisfying (31). Letting $\phi$ be as in (30), we have that
$(u, \phi)$ solves problem (P). Uniqueness of the solution follows
from Lemma 2, yielding also the unique solvability of the
equivalent interface problem (IP).


Next, we treat the discretization of problem (P).

Let $(H_h \times H_h^{-1/2} : h \in I)$ be a family of
finite-dimensional subspaces of $H^1(\Omega) \times H^{-1/2}(\Gamma)$.
Then, the coupling of FE and BE consists in the following Galerkin
procedure.


Definition 1 (Problem $(P_h)$) For $h \in I$, find
$(u_h, \phi_h) \in H_h \times H_h^{-1/2}$ such that

$$B\bigl((u_h,\phi_h),(v_h,\psi_h)\bigr) = L\bigl((v_h,\psi_h)\bigr)$$ (34)

for all $(v_h, \psi_h) \in H_h \times H_h^{-1/2}$.


In order to prove a discrete Babuška–Brezzi condition
if $A$ is linear, we need some notations and the positive
definiteness of the discrete Steklov–Poincaré operator.


Assumption 1. For any $h \in I$, let
$H_h \times H_h^{-1/2} \subset H^1(\Omega) \times H^{-1/2}(\Gamma)$,
where $I \subset (0, 1)$ with $0 \in \bar I$. Moreover,
$1 \in H_h^{-1/2}$ for any $h \in I$, where 1 denotes the constant
function with value 1.

Let $i_h: H_h \to H^1(\Omega)$ and
$j_h: H_h^{-1/2} \to H^{-1/2}(\Gamma)$ denote the canonical
injections with their duals $i_h^*: H^1(\Omega)^* \to H_h^*$ and
$j_h^*: H^{1/2}(\Gamma) \to (H_h^{-1/2})^*$ being projections. Let
$\gamma: H^1(\Omega) \to H^{1/2}(\Gamma)$ denote the trace operator,
$\gamma u = u|_\Gamma$ for all $u \in H^1(\Omega)$, with the dual
$\gamma^*$.

Then, define

$$V_h := j_h^* V j_h, \qquad K_h := j_h^* K \gamma i_h, \qquad W_h := i_h^* \gamma^* W \gamma i_h, \qquad K_h' := i_h^* \gamma^* K' j_h$$ (35)

and, since $V_h$ is positive definite as well,

$$S_h := W_h + (I_h^* - K_h')V_h^{-1}(I_h - K_h) : H_h \to H_h^*$$ (36)

with $I_h := j_h^* \gamma i_h$ and its dual $I_h^*$.


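A key role is played by the following coerciveness of
the discrete version of the Steklov–Poincaré operator (see
Carstensen and Stephan, 1995a).


Lemma 3. There exist constants $c_0 > 0$ and $h_0 > 0$ such
that for any $h \in I$ with $h < h_0$, we have

$$\langle S_h u_h,\ u_h\rangle \ge c_0\,\|u_h|_\Gamma\|^2_{H^{1/2}(\Gamma)} \quad\text{for all } u_h \in H_h$$


Lemma 4. There exist constants $\beta_0 > 0$ and $h_0 > 0$ such
that for any $h \in I$ with $h < h_0$, we have that for any
$(u_h, \phi_h), (v_h, \psi_h) \in H_h \times H_h^{-1/2}$

$$\beta_0\,\|(u_h - v_h,\ \phi_h - \psi_h)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)}\,\|(u_h - v_h,\ \eta_h - \delta_h)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)}$$ (37)
$$\le B\bigl((u_h,\phi_h),(u_h - v_h,\ \eta_h - \delta_h)\bigr) - B\bigl((v_h,\psi_h),(u_h - v_h,\ \eta_h - \delta_h)\bigr)$$ (38)

with $2\eta_h := \phi_h + V_h^{-1}(I_h - K_h)u_h$,
$2\delta_h := \psi_h + V_h^{-1}(I_h - K_h)v_h \in H_h^{-1/2}$.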


Proof. The proof is analogous to that of Lemma 2, now
dealing with the discrete operators (35) and (36). All
calculations in the proof of Lemma 2 can be repeated with
obvious modifications. Because of Lemma 3, the constants
are independent of $h$ as well, so that $\beta_0$ does not depend
on $h < h_0$ ($h_0$ chosen in Lemma 3). Hence, we may omit the
details.


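Corollary 1. There exist constants $c_0 > 0$ and $h_0 > 0$ such
that for any $h \in I$ with $h < h_0$, the problem $(P_h)$ has a
unique solution $(u_h, \phi_h)$, and, if $(u, \phi)$ denotes the
solution of (P), there holds

$$\|(u - u_h,\ \phi - \phi_h)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)} \le c_0 \inf_{(v_h,\psi_h)\in H_h\times H_h^{-1/2}} \|(u - v_h,\ \phi - \psi_h)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)}$$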


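Proof. The existence and uniqueness of the discrete solutions
follows as in the proof of Theorem 4. Let
$(U_h, \Phi_h) \in H_h \times H_h^{-1/2}$ be the orthogonal
projection onto $H_h \times H_h^{-1/2}$ of the solution $(u, \phi)$
of problem (P) in $H^1(\Omega) \times H^{-1/2}(\Gamma)$. From
Lemma 4, we conclude with appropriate
$\eta_h, \delta_h \in H_h^{-1/2}$ that

$$\beta_0\,\|(U_h - u_h,\ \Phi_h - \phi_h)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)}\,\|(U_h - u_h,\ \eta_h - \delta_h)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)} \le B\bigl((U_h,\Phi_h),(U_h - u_h,\ \eta_h - \delta_h)\bigr) - B\bigl((u_h,\phi_h),(U_h - u_h,\ \eta_h - \delta_h)\bigr)$$

Using the Galerkin conditions and the Lipschitz continuity
of $B$, with related constant $L$ (which follows since $A$ is
Lipschitz continuous), we get that the right-hand side is
bounded by

$$L\,\|(u - U_h,\ \phi - \Phi_h)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)}\,\|(U_h - u_h,\ \eta_h - \delta_h)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)}$$

Dividing the whole estimate by
$\|(U_h - u_h,\ \eta_h - \delta_h)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)}$
proves

$$\|(U_h - u_h,\ \Phi_h - \phi_h)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)} \le \frac{L}{\beta_0}\,\|(u - U_h,\ \phi - \Phi_h)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)}$$

From this, the triangle inequality yields the assertion. □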


2.2 Adaptive FE/BE coupling 


In this section, we present a posteriori error estimates for 
the h-version of the symmetric coupling method. 

For the efficiency of FE and BE computations, it is 
of high importance to use local rather than global mesh 
refinement in order to keep the number of unknowns rea- 
sonably small. Often, in particular for nonlinear problems, 
any a priori information about the solution that would allow 
the construction of a suitably refined mesh is not available. 
Thus, one has to estimate the error a posteriori. 

For the FEM, there are some well-known error 
estimators: 


1. Inserting the computed solution into the partial differ- 
ential equation, one obtains a residual. For an explicit 
residual error indicator, a suitable norm of the resid- 
ual is computed; this also involves jump terms at the 
element boundaries. 

2. Error estimators based on hierarchical bases are often 
used in the context of multilevel methods that allow 
fast iterative solution procedures for large systems of 
equations. For an overview on the estimators, see Bank 
and Smith (1993). 

3. To obtain an approximation of the error, one may solve
local problems on each FE or on patches of FE (see
Bank and Weiser, 1985). The right-hand sides of the
local problems involve the residual, so these estimators
are called implicit residual estimators.


The ratio of the estimated error to the actual error
is called the effectivity index. The advantage of
the third approach is that one can expect a good effectivity 
index. For linear elliptic problems with a positive definite 
variational formulation, the solution of infinite-dimensional 
local Neumann problems yields an upper bound on the 
energy norm of the error in the entire domain; there are no 
multiplicative constants in this estimate, and the effectivity 
often is not much larger than 1. In practice, the local prob- 
lems are solved approximately by using a higher polynomial 
degree or a finer mesh. In contrast, the first approach merely 
yields a bound up to a multiplicative constant that is diffi- 
cult to determine. Most often, this method is just used as a 
mesh refinement criterion. Of course, the evaluation of this 
error indicator is much cheaper than the solution of a local
problem. Explicit residual error indicators can be carried
over to boundary element methods; inserting the approx- 
imate solution into the integral equations, one obtains a 
residual, which has to be evaluated in a suitable norm (see 
Subsection 2.2.1). It is also possible to establish a hierarchi- 
cal basis estimator for BE (see Subsection 2.2.2). The use 
of error estimators based on local problems for the FE/BE 
coupling is given in Subsection 2.2.3 (see Chapter 4, this 
Volume, Chapter 2, Volume 2). 
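The statement that an explicit residual indicator bounds the error only up to a multiplicative constant can be illustrated in one space dimension. The sketch below is an analogue and not the coupling estimator itself: it assumes the model problem $-u'' = 1$ on $(0,1)$ with homogeneous Dirichlet data, uses the fact that the piecewise linear Galerkin solution of this 1D problem interpolates the exact solution $u(x) = x(1-x)/2$ at the nodes, and weights the derivative jumps at interior nodes by the mesh size as a stand-in for the side terms. The function name `explicit_residual_1d` is hypothetical.

```python
import math


def explicit_residual_1d(n):
    """Explicit residual terms and exact error for -u'' = 1 on (0, 1).

    Assumes a uniform mesh with n cells and P1 elements; the Galerkin
    solution is then the nodal interpolant of u(x) = x (1 - x) / 2.
    """
    h = 1.0 / n
    nodes = [i * h for i in range(n + 1)]
    u_h = [x * (1.0 - x) / 2.0 for x in nodes]
    slopes = [(u_h[i + 1] - u_h[i]) / h for i in range(n)]

    # Element residual f + u_h'' = 1 on every cell:
    # h^2 * integral of 1^2 over the cell = h^3.
    r1_sq = sum(h ** 2 * h for _ in range(n))

    # Jumps of u_h' at interior nodes, weighted by the mesh size.
    r2_sq = sum(h * (slopes[i + 1] - slopes[i]) ** 2 for i in range(n - 1))

    # Exact error in the H^1 seminorm: u' - u_h' = (midpoint - x) on each
    # cell, whose square integrates to h^3 / 12.
    err = math.sqrt(n * h ** 3 / 12.0)
    return math.sqrt(r1_sq), math.sqrt(r2_sq), err


r1_coarse, r2_coarse, err_coarse = explicit_residual_1d(8)
r1_fine, r2_fine, err_fine = explicit_residual_1d(64)
ratio_coarse = (r1_coarse + r2_coarse) / err_coarse
ratio_fine = (r1_fine + r2_fine) / err_fine
```

In this toy setting the indicator overestimates the error by a factor that stays roughly constant under refinement, which is why such indicators are well suited as refinement criteria even when the constant is unknown.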


2.2.1 Residual based error indicators 


In this section, we present an a posteriori error estimate
from Carstensen and Stephan (1995a). For simplicity, we
restrict ourselves to linear functions on triangles as FE in
$H_h$ and to piecewise constant functions in $H_h^{-1/2}$.


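Assumption 2. Let $\Omega$ be a two-dimensional domain with
polygonal boundary $\Gamma$ on which we consider a family
$\mathcal{T} := (\mathcal{T}_h : h \in I)$ of decompositions
$\mathcal{T}_h = \{\Delta_1, \ldots, \Delta_N\}$ of $\Omega$ in
closed triangles $\Delta_1, \ldots, \Delta_N$ such that
$\bar\Omega = \bigcup_{i=1}^N \Delta_i$ and two different triangles
are disjoint or have a side or a vertex in common. Let $S_h$
denote the sides, that is,

$$S_h = \{\partial T_i \cap \partial T_j : i \ne j \text{ with } \partial T_i \cap \partial T_j \text{ as the common side}\}$$

$\partial T_j$ being the boundary of $T_j$. Let

$$G_h = \{E : E \in S_h \text{ with } E \subset \Gamma\}$$

be the set of 'boundary sides' and let

$$S_h^0 = S_h \setminus G_h$$

be the set of 'interior sides'.


We assume that all the angles of any $\Delta \in \mathcal{T}_h \in \mathcal{T}$
are $\ge \Theta$ for some fixed $\Theta > 0$, which does not depend
on $\Delta$ or $\mathcal{T}_h$.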

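Then define

$$H_h := \{\eta_h \in C(\Omega) : \eta_h|_\Delta \in P_1 \text{ for any } \Delta \in \mathcal{T}_h\}$$ (39)

$$H_h^{-1/2} := \{\eta_h \in L^\infty(\Gamma) : \eta_h|_E \in P_0 \text{ for any } E \in G_h\}$$ (40)

where $P_j$ denotes the polynomials with degree $\le j$.

For fixed $\mathcal{T}_h$, let $h$ be the piecewise constant
function defined such that the constants $h|_\Delta$ and $h|_E$
equal the element sizes $\operatorname{diam}(\Delta)$ of
$\Delta \in \mathcal{T}_h$ and $\operatorname{diam}(E)$ of $E \in S_h$.

We assume that $A(\nabla v_h) \in C^1(\Delta)$ for any
$\Delta \in \mathcal{T}_h \in \mathcal{T}$ and any trial function
$v_h \in H_h$. Finally, let $f \in L^2(\Omega)$,
$u_0 \in H^1(\Gamma)$, and $t_0 \in L^2(\Gamma)$.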


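Let $n$ be the exterior normal on $\Gamma$, and on any element
boundary $\partial\Delta$, let $n$ have a fixed orientation so that
$[A(\nabla u_h)\cdot n]|_E \in L^2(E)$ denotes the jump of the
discrete tractions $A(\nabla u_h)\cdot n$ over the side
$E \in S_h^0$. Define

$$R_1^2 := \sum_{\Delta \in \mathcal{T}_h} \operatorname{diam}(\Delta)^2 \int_\Delta |f + \operatorname{div} A(\nabla u_h)|^2\,dx$$ (41)

$$R_2^2 := \sum_{E \in S_h^0} \operatorname{diam}(E) \int_E |[A(\nabla u_h)\cdot n]|^2\,ds$$ (42)

$$R_3 := \Bigl\| \sqrt{h}\,\Bigl(t_0 - A(\nabla u_h)\cdot n + \tfrac12 W(u_0 - u_h|_\Gamma) - \tfrac12 (K' - I)\phi_h\Bigr)\Bigr\|_{L^2(\Gamma)}$$ (43)

$$R_4 := \sum_{E \in G_h} \operatorname{diam}(E)^{1/2}\,\Bigl\| \frac{\partial}{\partial s}\bigl\{(I - K)(u_0 - u_h|_\Gamma) - V\phi_h\bigr\}\Bigr\|_{L^2(E)}$$ (44)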


Under the above assumptions and notations, there holds
the following a posteriori estimate, in which $(u, \phi)$ and
$(u_h, \phi_h)$ solve problem (P) and $(P_h)$, respectively (see
Carstensen and Stephan, 1995a).


Theorem 5. There exists some constant $c > 0$ such that
for any $h \in I$ with $h < h_0$ ($h_0$ from Lemma 3), we have

$$\|(u - u_h,\ \phi - \phi_h)\|_{H^1(\Omega)\times H^{-1/2}(\Gamma)} \le c\,(R_1 + R_2 + R_3 + R_4)$$


Note that $R_1, \ldots, R_4$ can be computed (at least
numerically) as soon as the solution $(u_h, \phi_h)$ of problem
$(P_h)$ is known. The proof of Theorem 5 is based on Lemma 4. A
corresponding adaptive feedback procedure is described in
Carstensen and Stephan (1995a) and is extended to elasticity
problems in Carstensen, Funken and Stephan (1997) and
to interface problems with viscoplastic and plastic material
in Carstensen, Zarrabi and Stephan (1996). A posteriori
error estimates with lower bounds yielding efficiency are
given in Carstensen (1996b) for uniform meshes.


2.2.2 Adaptive FE/BE coupling with a Schur 
complement error indicator 


Recently, the use of adaptive hierarchical methods has
become increasingly popular. Using the discretization of
the Steklov—Poincaré operator, we present for the symmet- 
ric FE/BE coupling method an a posteriori error estimate 
with ‘local’ error indicators; for an alternative method that 
uses the full coupling formulation, see Mund and Stephan 
(1999). By using stable hierarchical basis decompositions 
for FE (see Yserentant, 1986), we derive two-level subspace 
decompositions for locally refined meshes. Assuming a sat- 
uration condition to hold, as mentioned in Krebs, Maischak 
and Stephan (2001), an adaptive algorithm is formulated to 
compute the FE solution on a sequence of refined meshes 
in the interior domain and on the interface boundary. At 
the end of this subsection, we present numerical experi- 
ments that show the efficiency and reliability of the error 
indicators. 
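Let $\varrho \in C^1(\mathbb{R}_+)$ satisfy the conditions

$$\varrho_0 \le \varrho(t) \le \varrho_1 \quad\text{and}\quad \varrho_2 \le \varrho(t) + t\,\varrho'(t) \le \varrho_3$$ (45)

for some global constants $\varrho_0, \varrho_1, \varrho_2, \varrho_3 > 0$.
We consider the following nonlinear interface problem (NP) (cf.
Gatica and Hsiao, 1989) in $\mathbb{R}^2$:

Problem (NP): Given the functions $f: \Omega_1 \to \mathbb{R}$ and
$u_0, t_0: \Gamma \to \mathbb{R}$, find
$u_i: \Omega_i \to \mathbb{R}$, $i = 1, 2$, and $b \in \mathbb{R}$
such that

$$-\operatorname{div}\bigl(\varrho(|\nabla u_1|)\nabla u_1\bigr) = f \quad\text{in } \Omega_1$$ (46a)
$$-\Delta u_2 = 0 \quad\text{in } \Omega_2$$ (46b)
$$u_1 - u_2 = u_0 \quad\text{on } \Gamma$$ (46c)
$$\varrho(|\nabla u_1|)\frac{\partial u_1}{\partial n} - \frac{\partial u_2}{\partial n} = t_0 \quad\text{on } \Gamma$$ (46d)
$$u_2(x) = b\log|x| + o(1) \quad\text{for } |x| \to \infty$$ (46e)

where $\partial v/\partial n$ is the normal derivative of $v$
pointing from $\Omega_1$ into $\Omega_2$.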

By a symmetric coupling method (Costabel, 1988b), the 
problem (46) is transformed into the following variational 
problem (cf. Carstensen and Stephan, 1995a; Stephan, 
1992). 


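Given $f \in (H^1(\Omega_1))'$, $u_0 \in H^{1/2}(\Gamma)$, and
$t_0 \in H^{-1/2}(\Gamma)$, find $u \in H^1(\Omega_1)$ and
$\phi \in H^{-1/2}(\Gamma)$ such that

$$a(u, v) + B(u, \phi; v, \psi) = L(v, \psi)$$ (47)

for all $v \in H^1(\Omega_1)$ and $\psi \in H^{-1/2}(\Gamma)$, where
the form $a(\cdot,\cdot)$ is defined as

$$a(u, v) := 2\int_{\Omega_1} \varrho(|\nabla u|)\,\nabla u\cdot\nabla v\,dx$$

the bilinear form $B(\cdot;\cdot)$ is defined as

$$B(u, \phi; v, \psi) := \langle Wu|_\Gamma + (K' - I)\phi,\ v|_\Gamma\rangle - \langle \psi,\ (K - I)u|_\Gamma - V\phi\rangle$$

and the linear form $L(\cdot)$ is defined as

$$L(v, \psi) := 2(f, v) + \langle 2t_0 + Wu_0,\ v|_\Gamma\rangle - \langle \psi,\ (K - I)u_0\rangle$$

Here, $(\cdot,\cdot)$ and $\langle\cdot,\cdot\rangle$ denote the
duality pairings between $(H^1(\Omega_1))'$ and $H^1(\Omega_1)$ and
between $H^{-1/2}(\Gamma)$ and $H^{1/2}(\Gamma)$, respectively. The
unknowns in (46) satisfy $u_1 = u$ and
$\partial u_2/\partial n = \phi$, and $u_2$ can be obtained via a
representation formula (Costabel, 1988b).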


Lemma 5. The following problem is equivalent to (47).
Find $u \in H^1(\Omega_1)$ such that

$$a(u, v) + \langle Su|_\Gamma,\ v|_\Gamma\rangle = F(v) \quad \forall v \in H^1(\Omega_1)$$ (48)

where

$$F(v) := 2\int_{\Omega_1} f\,v\,dx + \langle 2t_0 + Su_0,\ v|_\Gamma\rangle$$

$a(\cdot,\cdot)$ is as in (47), and the Steklov–Poincaré operator
for the exterior domain
$S := W + (K' - I)V^{-1}(K - I)$ is a continuous map from
$H^{1/2}(\Gamma)$ into $H^{-1/2}(\Gamma)$.


Firstly, we describe the coupling of the FEM and the
BEM to compute approximations to the solution $(u, \phi)$ of
(47). For this purpose, we consider regular triangulations
$\omega_H$ of $\Omega_1$ and partitions $\gamma_H$ of $\Gamma$. Our
test and trial spaces are defined as

$$T_H := \{v_H: \Omega_1 \to \mathbb{R};\ v_H \text{ p.w. linear on } \omega_H,\ v_H \in C^0(\Omega_1)\}$$ (49)
$$\tau_H := \{\psi_H: \Gamma \to \mathbb{R};\ \psi_H \text{ p.w. constant on } \gamma_H\}$$ (50)

We assume that the mesh for the discretization of the BE
part $\gamma_H$ is induced by that of the finite element part. This
yields the following discretization of problem (47).

Find $(u_H, \phi_H) \in T_H \times \tau_H$ such that

$$2\int_{\Omega_1} \varrho(|\nabla u_H|)\,\nabla u_H\cdot\nabla v\,dx + B(u_H, \phi_H; v, \psi) = L(v, \psi)$$ (51)

for all $(v, \psi) \in T_H \times \tau_H$.
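The application of Newton's method to (51) yields a
sequence of linear systems to be solved. Given an initial
guess $(u_H^{(0)}, \phi_H^{(0)})$, we seek

$$(u_H^{(l)}, \phi_H^{(l)}) = (u_H^{(l-1)}, \phi_H^{(l-1)}) + (d_H^{(l)}, \delta_H^{(l)}) \quad (l = 1, 2, \ldots)$$

such that

$$a_{u_H^{(l-1)}}(d_H^{(l)}, v) + B(d_H^{(l)}, \delta_H^{(l)}; v, \psi) = L(v, \psi) - a(u_H^{(l-1)}, v) - B(u_H^{(l-1)}, \phi_H^{(l-1)}; v, \psi)$$ (52)

for all $(v, \psi) \in T_H \times \tau_H$ with $a(\cdot,\cdot)$,
$B(\cdot,\cdot;\cdot,\cdot)$, and $L(\cdot,\cdot)$ as in (47). The
bilinear form $a_w(\cdot,\cdot)$ is defined by

$$a_w(u, v) := 2\int_{\Omega_1} \bigl(\tilde\varrho(\nabla w)\nabla u\bigr)\cdot\nabla v\,dx$$ (53)

and $\tilde\varrho \in \mathbb{R}^{2\times 2}$ is the Jacobian of
$x \mapsto \varrho(|x|)x$, that is,

$$\tilde\varrho(x) = \varrho(|x|)\,I_{2\times 2} + \varrho'(|x|)\,\frac{x\,x^T}{|x|} \quad (x \in \mathbb{R}^2)$$

From the assumptions on $\varrho$ in (45), it follows that there
exist constants $\nu, \mu > 0$ such that

$$a_w(u, v) \le \nu\,\|u\|_{1,\Omega_1}\,\|v\|_{1,\Omega_1} \quad\text{and}\quad \mu\,|u|^2_{1,\Omega_1} \le a_w(u, u)$$ (54)

for all $w, u, v \in H^1(\Omega_1)$.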


Since $\varrho$ is sufficiently smooth, the energy functional
of (47) is strictly convex, and hence, Newton's method
converges locally.
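The role of the Jacobian of $x \mapsto \varrho(|x|)x$ in the Newton linearization can be checked on a pointwise example. The sketch below makes several assumptions that are not part of the text: the coefficient `rho(t) = 1 + 1/(1 + t^2)` is a hypothetical choice that satisfies bounds of the type (45), and the 2x2 Newton iteration solves the algebraic equation $\varrho(|x|)x = g$ rather than the discretized interface problem. It is meant only to show that the analytic Jacobian agrees with finite differences and drives a convergent Newton iteration.

```python
import math


def rho(t):
    """Hypothetical coefficient: 1 <= rho <= 2, rho + t rho' >= 0.875."""
    return 1.0 + 1.0 / (1.0 + t * t)


def drho(t):
    return -2.0 * t / (1.0 + t * t) ** 2


def flux(x):
    """The map x -> rho(|x|) x for x in R^2."""
    t = math.hypot(x[0], x[1])
    return [rho(t) * x[0], rho(t) * x[1]]


def jacobian(x):
    """rho(|x|) I + rho'(|x|) x x^T / |x|, the Jacobian of flux."""
    t = math.hypot(x[0], x[1])
    if t == 0.0:
        return [[rho(0.0), 0.0], [0.0, rho(0.0)]]
    c = drho(t) / t
    return [[rho(t) + c * x[0] * x[0], c * x[0] * x[1]],
            [c * x[0] * x[1], rho(t) + c * x[1] * x[1]]]


def newton(g, x0, tol=1e-12, max_iter=50):
    """Solve flux(x) = g by Newton's method with the analytic Jacobian."""
    x = list(x0)
    for _ in range(max_iter):
        f = flux(x)
        r = [g[0] - f[0], g[1] - f[1]]
        if math.hypot(r[0], r[1]) < tol:
            break
        J = jacobian(x)
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        # Solve the 2x2 system J d = r by Cramer's rule.
        d = [(J[1][1] * r[0] - J[0][1] * r[1]) / det,
             (J[0][0] * r[1] - J[1][0] * r[0]) / det]
        x = [x[0] + d[0], x[1] + d[1]]
    return x


# Finite-difference check of the Jacobian at a sample point.
p = [0.3, -0.7]
eps = 1e-6
J_exact = jacobian(p)
J_fd = [[0.0, 0.0], [0.0, 0.0]]
for j in range(2):
    plus = p[:]
    minus = p[:]
    plus[j] += eps
    minus[j] -= eps
    fp, fm = flux(plus), flux(minus)
    for i in range(2):
        J_fd[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
fd_error = max(abs(J_exact[i][j] - J_fd[i][j]) for i in range(2) for j in range(2))

# Newton solve with a known solution x* = (0.6, -0.8), |x*| = 1.
target = flux([0.6, -0.8])
x_newton = newton(target, [0.0, 0.0])
```

The eigenvalues of the Jacobian are $\varrho(t)$ and $\varrho(t) + t\varrho'(t)$ with $t = |x|$, which is why two-sided bounds on exactly these two quantities give uniform boundedness and ellipticity of the linearized form.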

For the implementation of (52), we define the piecewise
linear basis functions

$$b_i(\nu_j) := \delta_{i,j} \quad (1 \le i, j \le n_H)$$

where $\nu_i \in \Omega_1 \setminus \Gamma$ $(1 \le i \le n_D)$ are
the inner nodes of $\omega_H$ and $\nu_i \in \Gamma$
$(n_D + 1 \le i \le n_H := \dim T_H)$ are the boundary nodes of
$\omega_H$ counted along the closed curve $\Gamma$.

On the boundary, the following basis of $\tau_H$ is introduced.
Let $\mu_i \in \gamma_H$ be the BE induced by the nodes
$\nu_{n_D+i}$, $\nu_{n_D+i+1}$
$(1 \le i \le n_\Gamma - 1,\ n_\Gamma := \dim \tau_H)$ and
$\mu_{n_\Gamma}$ by the nodes $\nu_{n_H}$, $\nu_{n_D+1}$. With each
$\mu_i$, we associate the basis function

$$\beta_i(x) := \begin{cases} 1 & \text{if } x \in \mu_i \\ 0 & \text{if } x \in \Gamma \setminus \mu_i \end{cases}$$

With the basis functions $b_i$ and $\beta_i$, (52) yields a linear
system, which may be solved with the hybrid modified conjugate
residual (HMCR) scheme together with efficient preconditioners
(Hale et al.; Heuer, Maischak and Stephan, 1999; Mund and
Stephan, 1997).

In Mund and Stephan (1999), an adaptive algorithm is
given on the basis of a posteriori error estimates of the
solution $(u_H, \phi_H)$ of (51). Here, we apply a Schur
complement method based on a Galerkin discretization of the
variational formulation (48), eliminating the unknown vector
$\phi$. In this way, we also obtain a discretization of the
Steklov–Poincaré operator, which will be used to develop
an a posteriori error indicator that needs only a refinement
of the mesh defining $T_H$ and does not need a finer
discretization than $\tau_H$.

Next, we introduce hierarchical two-level decompositions
for the finite element space $T_h$ on $\omega_h$ (cf. (49)), where
we get $\omega_h$ by the refinement shown in Figure 1.

These decompositions will be used to derive an a posteriori
error estimate for the Galerkin solution to (47), which
is obtained by applying a Schur complement to (51).

We take the hierarchical two-level subspace
decomposition

$$T_h := T_H \oplus L_h, \qquad L_h := T_1 \oplus T_2 \oplus \cdots \oplus T_n$$

with $T_i := \operatorname{span}\{\hat b_i\}$, where $\hat b_i$
denotes the piecewise linear basis functions in the new $n$ node
points $\nu_i$ of the fine grid (Yserentant, 1986; Mund and
Stephan, 1999). Let $P_H : T_h \to T_H$, $P_i : T_h \to T_i$ be the
Galerkin projections with respect to the bilinear form
$b(\cdot,\cdot)$, which is defined as

$$b(u, v) := \int_{\Omega_1} (\nabla u\cdot\nabla v + u\,v)\,dx$$ (55)

For all $u \in T_h$, we define $P_H$ and $P_i$ by

$$b(P_H u, v) = b(u, v) \quad \forall v \in T_H$$
$$b(P_i u, v) = b(u, v) \quad \forall v \in T_i$$

Figure 1. Refinement of $\delta \in \omega_H$. The longest edge of
$\delta$ is denoted by $s$. The new nodes are the midpoints of the
edges of $\delta$.

Now, we introduce the approximate Steklov–Poincaré
operator on fine-mesh functions


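$$\tilde S_h := W_h + (K_h' - I_h')V_H^{-1}(K_h - I_h)$$ (56)

where, for $u, v \in T_h$ and $\phi, \psi \in \tau_H$,

$$\langle W_h u|_\Gamma,\ v|_\Gamma\rangle = \langle Wu|_\Gamma,\ v|_\Gamma\rangle$$
$$\langle (K_h - I_h)u|_\Gamma,\ \psi\rangle = \langle (K - I)u|_\Gamma,\ \psi\rangle$$
$$\langle V_H\phi,\ \psi\rangle = \langle V\phi,\ \psi\rangle$$
$$\langle (K_h' - I_h')\phi,\ v|_\Gamma\rangle = \langle (K' - I)\phi,\ v|_\Gamma\rangle$$

Furthermore, we consider the discrete Steklov–Poincaré
operator

$$S_H := W_H + (K_H' - I_H')V_H^{-1}(K_H - I_H)$$ (57)

on coarse-mesh functions, in which the operators are
defined as above by substituting $T_H$ for $T_h$.

With the discrete Steklov–Poincaré operators $\tilde S_h$ and
$S_H$, we formulate discrete problems to (48):

Find $u_H \in T_H$ such that

$$a(u_H, v) + \langle S_H u_H|_\Gamma,\ v|_\Gamma\rangle = F_H(v) \quad \forall v \in T_H$$ (58)

and

find $\tilde u_h \in T_h$ such that

$$a(\tilde u_h, v) + \langle \tilde S_h \tilde u_h|_\Gamma,\ v|_\Gamma\rangle = \tilde F_h(v) \quad \forall v \in T_h$$ (59)

where $F_H(\cdot)$ and $\tilde F_h(\cdot)$ are obtained by
substituting $S_H$ and $\tilde S_h$, respectively, for $S$ in $F$
of (48).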

For our analysis, to derive an a posteriori error estimate 
(Theorem 6), we have to make the following saturation 
assumption. 


Assumption 3. Let $u$, $u_H$, $\tilde u_h$ be defined as in (48),
(58), and (59). There exists a constant $\kappa \in (0, 1)$
independent of $H$, $h$ such that

$$\|u - \tilde u_h\|_{1,\Omega_1} \le \kappa\,\|u - u_H\|_{1,\Omega_1}$$


The following a posteriori error estimate is proved in 
Krebs, Maischak and Stephan (2001). 


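Theorem 6. Assume that the above saturation assumption
holds. Let $T_0 \subset T_1 \subset T_2 \subset \cdots$ be a
sequence of hierarchical subspaces, where $T_0$ is an initial FEM
space (cf. (49)). The refinement of all triangles defining $T_k$
according to Figure 1 gives us $T_{k+1}$. Let $k$ denote the number
of the refinement level, $u_k$ the corresponding Galerkin solution
of (58), and $u$ the exact solution of (48). Then there are
constants $\zeta_1, \zeta_2 > 0$, $k_0 \in \mathbb{N}_0$ such that
for all $k \ge k_0$,

$$\zeta_1\Bigl(\sum_{i=1}^{n_k} \theta_{i,k}^2\Bigr)^{1/2} \le \|u - u_k\|_{1,\Omega_1} \le \zeta_2\Bigl(\sum_{i=1}^{n_k} \theta_{i,k}^2\Bigr)^{1/2}$$ (60)

where the local error indicators

$$\theta_{i,k} := \frac{|\vartheta_\Omega(b_{i,k}) + \vartheta_\Gamma(b_{i,k})|}{\|b_{i,k}\|_{1,\Omega_1}}$$ (61)

are obtained via the basis functions
$b_{i,k} \in T_{k+1} \setminus T_k$ by a domain part

$$\vartheta_\Omega(b_{i,k}) := 2\int_{\Omega_1} f\,b_{i,k}\,dx - 2\int_{\Omega_1} \varrho(|\nabla u_k|)\,\nabla u_k\cdot\nabla b_{i,k}\,dx$$ (62)

and a boundary part

$$\vartheta_\Gamma(b_{i,k}) := \langle 2t_0 + \tilde S_k u_0,\ b_{i,k}|_\Gamma\rangle - \langle \tilde S_k u_k|_\Gamma,\ b_{i,k}|_\Gamma\rangle$$ (63)

with $\tilde S_k$ defined as in (56) with respect to
$T_{k+1} \times \tau_k$ instead of $T_h \times \tau_H$.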

Next, we report the numerical experiment from Krebs,
Maischak and Stephan (2001) for (NP) with $\varrho \equiv 1$ and
choose $\Omega_1$ to be the L-shaped domain with corners at
(0, 0), (0, 1/4), (-1/4, 1/4), (-1/4, -1/4), (1/4, -1/4),
(1/4, 0). The exact solution of the model problem (NP) is
given by

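$$u_1(r, \alpha) = r^{2/3}\sin\Bigl(\tfrac23\bigl(\alpha - \tfrac{\pi}{2}\bigr)\Bigr)$$

$$u_2(x_1, x_2) = \log\Bigl(\bigl(x_1 + \tfrac18\bigr)^2 + \bigl(x_2 + \tfrac18\bigr)^2\Bigr)^{1/2}$$ (64)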


The functions $u_0$, $t_0$, $f$ are chosen to yield the exact
solution. The quantities in Table 1 are given as follows:
With $k$ we denote the refinement level, with $n_k$ the total
number of unknowns, and with $N_k$ the total number of
triangles defining $T_k$. The error $E_k$ is defined as

$$E_k := \|u - u_k\|_{1,\Omega_1}$$


The global error indicator $\eta_k$ is defined by

$$\eta_k := \Bigl(\sum_{i=1}^{N_k} \eta_{i,k}^2\Bigr)^{1/2}, \qquad \eta_{i,k} := \bigl(\theta_{i_1,k}^2 + \theta_{i_2,k}^2 + \theta_{i_3,k}^2\bigr)^{1/2} \quad (i = 1, \ldots, N_k)$$

Table 1. Results for adaptive algorithm based on Theorem 6 for
(NP) with $u_1$, $u_2$ from (64), $\zeta = 0.15$.

k   n_k    dim T_k   dim τ_k   E_k       η_k       η_k/E_k   κ_k     α_k
0   37     21        16        0.10608   0.13067   1.232     -       -
1   55     37        18        0.07596   0.08283   1.090     0.716   0.842
2   78     58        20        0.05511   0.06495   1.179     0.725   0.919
3   109    85        24        0.04510   0.05596   1.241     0.818   0.599
4   163    129       34        0.03626   0.04373   1.206     0.804   0.542
5   454    396       58        0.02063   0.02419   1.172     0.569   0.550
6   677    595       82        0.01654   0.01936   1.171     0.802   0.554
7   1972   1840      132       0.01008   0.01108   1.100     0.609   0.464


Here $i_1, i_2, i_3$ denote the three edges and the corresponding
new basis functions for every element of the old mesh.
The values of the quotient $\eta_k/E_k$, the efficiency index,
indicate the efficiency of the error indicator $\eta_k$ and confirm
Theorem 6. The quantity

$$\kappa_k := \frac{\|u - u_k\|_{1,\Omega_1}}{\|u - u_{k-1}\|_{1,\Omega_1}}$$

estimates the saturation constant $\kappa$. Since $\kappa_k$ is
bounded by a constant less than 1, the saturation condition is
satisfied for the sequence of meshes that is generated by our
adaptive algorithm. With the above value for the steering
parameter $\zeta$, we obtain

$$\kappa \approx 1.457^{-1/2} \approx 0.83$$

which, in view of Table 1, turns out to be an upper bound
for the $\kappa_k$. The experimental convergence rates $\alpha_k$
are given by

$$\alpha_k := \frac{\log(E_k/E_{k-1})}{\log(n_{k-1}/n_k)}$$

From Table 1, we see that $\alpha_k$ approaches 1/2, which is
the convergence rate in case of a smooth solution, thus
showing the quality of the adaptive algorithm. The above
hierarchical method is easily implemented, since for the
computation of the error indicators one can use the same
routine as for the computation of the entries of the Galerkin
matrix.
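The derived columns of Table 1 can be recomputed from the tabulated unknowns, errors, and indicator values. The short script below is a consistency check under the assumption that the columns are to be read as level $k$, unknowns $n_k$, error $E_k$, and indicator $\eta_k$, with the efficiency index, $\kappa_k$, and $\alpha_k$ defined as in the surrounding text; the list names are illustrative.

```python
import math

# Data copied from Table 1 (levels k = 0, ..., 7).
n = [37, 55, 78, 109, 163, 454, 677, 1972]
E = [0.10608, 0.07596, 0.05511, 0.04510, 0.03626, 0.02063, 0.01654, 0.01008]
eta = [0.13067, 0.08283, 0.06495, 0.05596, 0.04373, 0.02419, 0.01936, 0.01108]

# Efficiency index eta_k / E_k for every level.
effectivity = [eta[k] / E[k] for k in range(len(E))]

# Error reduction kappa_k = E_k / E_{k-1}, an estimate of the saturation constant.
kappa = [E[k] / E[k - 1] for k in range(1, len(E))]

# Experimental convergence rate with respect to the number of unknowns.
alpha = [math.log(E[k] / E[k - 1]) / math.log(n[k - 1] / n[k])
         for k in range(1, len(E))]

for k in range(1, len(E)):
    print(k, round(effectivity[k], 3), round(kappa[k - 1], 3), round(alpha[k - 1], 3))
```

The recomputed values agree with the printed columns up to rounding of the tabulated errors, and every $\kappa_k$ stays below the bound 0.83 quoted in the text.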


2.2.3 The implicit residual error estimator 


Error estimators based on local problems cannot directly 
be carried over to BE because of the nonlocal nature 
of the integral operators. In Brink and Stephan (1999), 
we combine the technique of solving local problems in 
the FEM domain with an explicit residual error estimator 
for one equation of the BE formulation. For each FE, a 


Neumann problem is solved. As in the pure FEM case, 
the boundary data of each local problem has to satisfy an 
equilibration condition that ensures solvability. 

For simplicity, let us consider the interface problem (IP)
with $d = 3$ and $f = u_0 = 0$ and assume that the domain
$\Omega$ is partitioned as
$\bar\Omega = \bigcup\{\bar T : T \in \mathcal{T}_h\}$. The elements
$T$ typically are open tetrahedra. For $T \ne T'$,
$\bar T \cap \bar T'$ is either empty or a common vertex or edge or
face.

Let $(u_h, \phi_h) \in H^1(\Omega) \times H^{-1/2}(\Gamma)$ denote a
computed approximation of the solution of (21). $(u_h, \phi_h)$
may contain errors because of an approximate solution of an
appropriate discrete system. In view of (14),

$$2\tilde\phi_h := (I - K')\phi_h - Wu_h$$

is an approximation of the normal derivative on $\Gamma$.

We need a continuous symmetric bilinear form $\hat a(\cdot,\cdot)$
on $H^1(\Omega)$. For the time being, $\hat a$ need not be
specified further. The restriction to an element $T$ is denoted by
$\hat a_T$ and is such that

$$\hat a(w, v) = \sum_{T \in \mathcal{T}_h} \hat a_T(w|_T, v|_T) \quad \forall w, v \in H^1(\Omega)$$ (65)

For every element, we set

$$W_T := \{v|_T : v \in H^1(\Omega)\}$$ (66)

The kernel of $\hat a_T$ is

$$Z_T := \{v \in W_T : \hat a_T(w, v) = 0 \ \ \forall w \in W_T\}$$ (67)

The bilinear form $\hat a$ is required to be
$H^1(\Omega)$-elliptic, that is, there exists a positive constant
$\alpha$ such that

$$\hat a(v, v) \ge \alpha\,\|v\|^2_{1,\Omega} \quad \forall v \in H^1(\Omega)$$ (68)

Similarly, for all elements, $\hat a_T$ is required to be
$W_T/Z_T$-elliptic.

For the error estimator, the following local problem with
Neumann boundary conditions is solved for each element.
Find $w_T \in W_T$ such that

$$\hat a_T(w_T, v) = \ell_T(v) \quad \forall v \in W_T$$ (69)

with the linear functional

$$\ell_T(v) := 2\int_T f\,v\,dx - 2\int_T A(\nabla u_h)\cdot\nabla v\,dx + 2\int_{\partial T} q_T\,v\,ds$$ (70)
The functions $q_T \in L^2(\partial T)$ are prescribed normal
derivatives. In Brink and Stephan (1999), we comment on how
to obtain suitable $q_T$. Here, we only assume that the
following conditions are satisfied:
1. The boundary conditions on Fy are
Ir =8 on OT NT y (71) 
2. The boundary conditions on the coupling interface are

$$q_T = \tilde\phi_h \quad\text{on } \partial T \cap \Gamma$$ (72)

3. On all element boundaries in the interior of $\Omega$, there
holds

$$q_T = -q_{T'} \quad\text{on } \bar T \cap \bar T'$$ (73)

4. For all elements, there holds the equilibration condition

$$\ell_T(v) = 0 \quad \forall v \in Z_T$$ (74)

This condition is obviously necessary for the solvability
of (69). For completeness, we prove the following
a posteriori error estimate from Brink and Stephan
(1999).

Theorem 7. Let the bilinear forms $\hat a$ and $\hat a_T$ be
$H^1(\Omega)$-elliptic and $W_T/Z_T$-elliptic, respectively.
Assume that the conditions (71) to (74) hold. Then

$$\|u - u_h\|_{1,\Omega} + \|\phi - \phi_h\|_{-1/2,\Gamma} \le C\,\|V\phi_h + (I - K)u_h\|_{1/2,\Gamma} + C\Bigl(\sum_{T \in \mathcal{T}_h} \hat a_T(w_T, w_T)\Bigr)^{1/2}$$

where $w_T$ are the solutions of the local problems (69).
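Proof. Owing to the ellipticity of $\hat a_T$, the local problems
(69) are solvable. We define $z \in H^1(\Omega)$ to be the unique
solution of

$$\hat a(z, v) = 2\int_\Omega \{A(\nabla u) - A(\nabla u_h)\}\cdot\nabla v\,dx - 2\langle v,\ \phi - \tilde\phi_h\rangle \quad \forall v \in H^1(\Omega)$$ (75)

Note that

$$\hat a(z, v) = 2\int_\Omega \{A(\nabla u) - A(\nabla u_h)\}\cdot\nabla v\,dx + \langle v,\ W(u - u_h) - (I - K')(\phi - \phi_h)\rangle = -2\int_\Omega A(\nabla u_h)\cdot\nabla v\,dx + 2\langle v,\ \tilde\phi_h\rangle + 2\int_\Omega f\,v\,dx$$ (76)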

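by (21). From the uniform monotonicity of $A$, see (6),
and the positivity of the integral operators $V$ and $W$, we
conclude

$$\alpha\bigl\{\|u - u_h\|^2_{1,\Omega} + \|\phi - \phi_h\|^2_{-1/2,\Gamma}\bigr\} \le 2\int_\Omega \{A(\nabla u) - A(\nabla u_h)\}\cdot\nabla(u - u_h)\,dx + \langle u - u_h,\ W(u - u_h)\rangle + \langle \phi - \phi_h,\ V(\phi - \phi_h)\rangle$$

$$= 2\int_\Omega \{A(\nabla u) - A(\nabla u_h)\}\cdot\nabla(u - u_h)\,dx + \langle u - u_h,\ W(u - u_h)\rangle - \langle u - u_h,\ (I - K')(\phi - \phi_h)\rangle + \langle \phi - \phi_h,\ V(\phi - \phi_h)\rangle + \langle \phi - \phi_h,\ (I - K)(u - u_h)\rangle$$

$$= \hat a(z,\ u - u_h) + \langle \phi - \phi_h,\ -V\phi_h - (I - K)u_h\rangle$$

where we used integral equation (15) for the exact solution.
Exploiting the continuity of $\hat a(\cdot,\cdot)$ and
$\langle\cdot,\cdot\rangle$, we obtain

$$\|u - u_h\|_{1,\Omega} + \|\phi - \phi_h\|_{-1/2,\Gamma} \le C\bigl\{\|z\|_{1,\Omega} + \|V\phi_h + (I - K)u_h\|_{1/2,\Gamma}\bigr\}$$ (77)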


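Because of (76), solving (75) corresponds to minimizing

$$\Phi(v) := \tfrac12\hat a(v, v) + 2\int_\Omega A(\nabla u_h)\cdot\nabla v\,dx - 2\langle v,\ \tilde\phi_h\rangle - 2\int_\Omega f\,v\,dx$$ (78)

over $H^1(\Omega)$, and

$$\min_{v \in H^1(\Omega)} \Phi(v) = \Phi(z) = -\tfrac12\hat a(z, z)$$ (79)

Let us define

$$\Phi_T(v) := \tfrac12\hat a_T(v, v) - \ell_T(v)$$ (80)

on $W_T$. Then, by the trace theorem and (73),

$$\Phi(v) = \sum_{T \in \mathcal{T}_h} \Phi_T(v|_T) \quad \forall v \in H^1(\Omega)$$

and

$$\min_{v \in H^1(\Omega)} \Phi(v) \ge \inf_{(v_T) \in \prod_{T} W_T}\ \sum_{T \in \mathcal{T}_h} \Phi_T(v_T) = \sum_{T \in \mathcal{T}_h}\ \inf_{v_T \in W_T} \Phi_T(v_T)$$

The elementwise minimizers on the right-hand side are the
solutions of the local problems (69), and thus, with (79),

$$-\tfrac12\hat a(z, z) \ge -\tfrac12\sum_{T \in \mathcal{T}_h} \hat a_T(w_T, w_T)$$ (81)

Therefore, by the ellipticity (68),

$$\alpha\,\|z\|^2_{1,\Omega} \le \hat a(z, z) \le \sum_{T \in \mathcal{T}_h} \hat a_T(w_T, w_T)$$

Combining this with (77) yields the assertion. □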


3 FAST SOLVERS FOR THE hp-VERSION 
OF FE/BE COUPLING 


In this section, we consider preconditioning techniques for
the symmetric FE and BE coupling and apply MINRES
as the iterative solver; see Heuer, Maischak and Stephan
(1999), Wathen and Stephan (1998), and Mund and
Stephan (1997), where its stable formulation, the HMCR,
is considered. For interface problems with second-order
differential operators, this symmetric coupling procedure
leads, as described above, to symmetric, indefinite linear
systems (26), in which the condition numbers grow at
least like $O(h^{-2}p^4)$ (see Maitre and Pourquier, 1996). In
Heuer and Stephan (1998) and Mund and Stephan (1998),
the GMRES is analyzed, which applies to nonsymmetric
problems but needs more storage and is therefore less
efficient. A rather general preconditioner for BEM equations
is formulated in Steinbach and Wendland (1997) (see also
Steinbach and Wendland, 1998).

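For brevity, we consider the interface problem in
$\mathbb{R}^2$ for a polygonal domain $\Omega_1$ and its complement
$\Omega_2 = \mathbb{R}^2 \setminus \bar\Omega_1$:

$$-\Delta u_1 = f \quad\text{in } \Omega_1$$
$$-\Delta u_2 = 0 \quad\text{in } \Omega_2$$
$$u_1 = u_2 + u_0 \quad\text{on } \Gamma = \partial\Omega_1$$
$$\frac{\partial u_1}{\partial n} = \frac{\partial u_2}{\partial n} + t_0 \quad\text{on } \Gamma$$ (82)

subject to the decay condition

$$u_2(x) = b\log|x| + o(1) \quad\text{as } |x| \to \infty$$ (83)

for a constant $b$. Here, $f \in H^{-1}(\Omega_1)$,
$u_0 \in H^{1/2}(\Gamma)$, and $t_0 \in H^{-1/2}(\Gamma)$ are given
functions. We assume that the interface $\Gamma$ has conformal
radius different from 1 such that the single-layer potential is
injective. This can be achieved by an appropriate scaling of
$\Omega_1$.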


The proposed iterative method also applies to three- 
dimensional problems. Even an extension to nonlinear 
problems (see Carstensen and Stephan, 1995b) is straight- 
forward because the use of the Newton—Raphson iteration 
reduces the task to solving a system of linear equations at 
each Newton step, which can be done by the same strategy, 
as discussed in this section. Moreover, the MINRES algo- 
rithm works for problems of linear and nonlinear elasticity. 

In Heuer and Stephan (1998), we consider, instead of the
Laplace equation in $\Omega_1$, the equation

$$-\Delta u_1 + k^2 u_1 = f \quad\text{in } \Omega_1$$

with $k > 0$, which yields a positive definite discretization
of the corresponding Neumann problem. This was used
in Heuer and Stephan (1998) to analyze the convergence
of the GMRES method. In contrast, here we present the
results from Heuer, Maischak and Stephan (1999) and show
the convergence of the minimum residual method for an
interface problem with positive semidefinite operator in
$\Omega_1$ (of course, the results of this section also hold for
the operator $-\Delta + k^2$).

We approximate the solution of the interface problem 
by the hp-version with quasiuniform meshes of the coupled 
FEM and BEM. This is based on the variational formulation 
(21), which is equivalent to (82) and (83) and is uniquely 
solvable because of Theorem 1 and Theorem 4. 

Let us classify the basis functions that will be used in the
Galerkin scheme (22). To this end, we introduce a uniform
mesh $\Omega_h$ of rectangles on $\Omega_1$ and use on $\Gamma$ the
mesh $\Gamma_h$ that is given by the restriction of $\Omega_h$ onto
$\Gamma$. However, we note that the two meshes $\Omega_h$ and
$\Gamma_h$ need not be related, that is, $\Omega_h$ and $\Gamma_h$
can be chosen independently.

For our model problem, we consider polygonal domains 
that can be discretized by rectangular meshes. However, we 
note that triangular meshes can also be handled similarly 
by applying the decompositions proposed in Babuška et al. 
(1991). We note that the restriction to rectangular FE 
meshes does not influence the BE mesh, which is in either 
case the union of straight line pieces. 

Let $\{S_j;\ j = 1, \dots, J_{edges}\}$ denote the set of edges of the
mesh $\Omega_h$. We assume that the basis functions can be divided
into the following four sets:

The set $X_1$ of the nodal functions. For each node of the
mesh $\Omega_h$, there is a function that has the value 1 at the
node and is zero at the remaining nodes.

The sets $X_{S_j}$ of the edge functions. For each edge $S_j$ of the
mesh $\Omega_h$, there are functions that vanish at all other edges
and are nonzero only on the elements adjacent to $S_j$
(and on $S_j$).

The sets $X_{\Omega_j}$ of the interior functions. For each element
$\Omega_j$ of the mesh $\Omega_h$, there are functions that are nonzero only
in the interior of $\Omega_j$.

The sets of BE functions. For each element $\Gamma_j$ of the BE
mesh $\Gamma_h$, there are functions whose supports are contained
in $\bar\Gamma_j$. Note that the BE functions need not be continuous
since $Y_N \subset H^{-1/2}(\Gamma)$.


3.1 Preconditioned minimum residual method 


In this section, we are concerned with the iterative solution 
of the linear system (26), in short form Ax =b. The 
coefficient matrix A is symmetric and indefinite. It is 
a discrete representation of a saddle-point problem. As 
an iterative solver, we use the preconditioned minimum 
residual method (see Ashby, Manteuffel and Saylor, 1990; 
Wathen, Fischer and Silvester, 1995). 

The minimum residual method belongs to the family
of Krylov subspace methods. Stable formulations of this
method for symmetric and indefinite systems are given
in Paige and Saunders (1975) and Chandra, Eisenstat and
Schultz (1977). If we denote the iterates by $x_k$, $k = 0, 1, \dots$,
there holds

$$ \|b - A x_k\|_{l^2} = \min_{x \in x_0 + K_k(A, r_0)} \|b - A x\|_{l^2} $$

where $K_k(A, r_0) = \operatorname{span}\{r_0, A r_0, \dots, A^{k-1} r_0\}$ denotes the
Krylov subspace and $r_0$ is the initial residual $r_0 = b - A x_0$.
For a symmetric, positive definite preconditioner $M$, this
relation becomes

$$ \|b - A x_k\|_{M^{-1}} = \min_{x \in x_0 + K_k(M^{-1}A,\, M^{-1}r_0)} \|b - A x\|_{M^{-1}} $$

where $\|z\|^2_{M^{-1}} = z^T M^{-1} z$ (see Wathen and Silvester, 1993;
Wathen, Fischer and Silvester, 1995).
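Owing to Lebedev (1969), for the residuals $r_k$, there holds

$$ \left(\frac{\|r_k\|_{l^2}}{\|r_0\|_{l^2}}\right)^{1/k} \le 2^{1/k} \left(\frac{1 - \sqrt{bc/ad}}{1 + \sqrt{bc/ad}}\right)^{1/2} \tag{84} $$

when the set of eigenvalues $E$ of $M^{-1}A$ is of the
form $E \subset [-a, -b] \cup [c, d]$ with $-a < -b < 0 < c < d$
and $a - b = d - c$, that is, both intervals have the same length.

Therefore, the numbers of iterations of the preconditioned
minimum residual method, which are required to solve (26)
up to a given accuracy, are bounded by $O\big(\sqrt{bc/ad}\big)^{-1}$.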
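The bound (84) can be turned into a rough iteration estimate. The following is a minimal sketch, assuming the equal-length inclusion set $[-a,-b] \cup [c,d]$ and reading (84) as $\|r_k\| / \|r_0\| \le 2\rho^k$ with $\rho = \big((1-\sqrt{bc/ad})/(1+\sqrt{bc/ad})\big)^{1/2}$. The function names `minres_rate` and `iterations_for` are hypothetical and chosen only for illustration.

```python
import math


def minres_rate(a, b, c, d):
    """Per-step reduction factor rho taken from the bound (84).

    Assumes the eigenvalues lie in [-a, -b] U [c, d] with
    0 < b <= a and 0 < c <= d.
    """
    if not (0 < b <= a and 0 < c <= d):
        raise ValueError("need 0 < b <= a and 0 < c <= d")
    kappa = math.sqrt((b * c) / (a * d))
    return math.sqrt((1.0 - kappa) / (1.0 + kappa))


def iterations_for(tol, a, b, c, d):
    """Smallest k with 2 * rho**k <= tol, according to the bound only."""
    if tol >= 2.0:
        return 0
    rate = minres_rate(a, b, c, d)
    if rate == 0.0:
        return 1
    return math.ceil(math.log(tol / 2.0) / math.log(rate))


# Example: spectrum in [-4, -1] U [1, 4], residual reduction by 1e-3.
estimate = iterations_for(1e-3, 4.0, 1.0, 1.0, 4.0)
```

The estimate is an upper bound derived from (84), not a prediction of the actual iteration count; shrinking $b$ or $c$ increases it roughly like $\sqrt{ad/bc}$.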

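In Wathen, Fischer and Silvester (1995), it is shown that
the above result also holds for nonsymmetric inclusion sets
of the form

$$ [-a, -bN^{-\alpha}] \cup [cN^{-2\alpha}, d] $$

yielding

$$ \lim_{k \to \infty} \left(\frac{\|r_k\|_{l^2}}{\|r_0\|_{l^2}}\right)^{1/k} \le 1 - N^{-3\alpha/2} \sqrt{\frac{bc}{ad}} $$


3.2 Preconditioners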


In the following section, we present preconditioners for (26)
and give inclusion sets $E$ for the eigenvalues of the preconditioned
system matrices; see Heuer, Maischak and Stephan
(1999) for the proofs of Theorem 8, Theorem 9, Theorem 10,
and Theorem 11. The various preconditioners are
3-block and 2-block preconditioners or additive Schwarz
preconditioners based on exact inversion of subblocks or
on partially diagonal scaling. Either the preconditioned
system matrices have bounded eigenvalues with bounds that
do not depend on $h$ or $p$ (for the 2-block method),
or $E \subset [-a, -b] \cup [c, d]$ (for the additive Schwarz
method and the partially diagonal scaling), where $a$, $d$
are independent of $h$ and $p$, whereas $b$, $c$ are independent
of $h$ but behave like $O(p^{-\alpha}(1 + \log p)^{-\beta})$. By (84),
the numbers of iterations of the minimum residual method
remain bounded in the first situation, whereas the iteration
numbers increase like $O(p^{\alpha}(1 + \log p)^{\beta})$ in the latter
cases.

The Galerkin matrix (26) belonging to the symmetric 
coupling method consists of a FE block, which is the 
discretization of an interior Neumann problem and of a 
BE block having the discretized hypersingular and weakly 
singular operators as terms on the diagonal. 

For the analysis of preconditioners, one requires spectral
estimates for the individual submatrices in (26). Denoting
by $\Lambda(Q)$ the eigenvalue spectrum of a square matrix $Q$,
there holds with suitable constants $c_1, \dots, c_8$,


ACA) C [e,h?p~*, ca] 
A(C) C [e3,€4], ¢3 29 
A(W) c [0, cel 

ACV) C [ch p, cgh] 


Considering separately the FE functions on the interface
boundary and in the interior domain and taking the BE
functions that discretize the weakly singular operator, we
have a splitting of the ansatz space into subspaces. They
induce a 3-block decomposition of the Galerkin matrix,
which will be the 3-block preconditioner. By this decomposition,
the strong coupling of edge and interior functions
is neglected. Therefore, this 3-block splitting allows only
for suboptimal preconditioners. Even when the blocks are
inverted exactly, one gets $O(h^{-3/4} p^{3/2})$ iteration numbers.

In detail, we employ a preconditioner of the form

$$ M_3 = \begin{pmatrix} \tilde A & 0 & 0\\ 0 & \tilde C & 0\\ 0 & 0 & \tilde V \end{pmatrix} \tag{85} $$

where $\tilde A$, $\tilde C$, and $\tilde V$ are spectrally equivalent matrices to
$A$, $C$, and $V$ respectively, that is, with suitable constants
$c_9, \dots, c_{14}$, there holds

$$
\begin{aligned}
\Lambda(\tilde A^{-1/2} A \tilde A^{-1/2}) &\subset [c_9, c_{10}]\\
\Lambda(\tilde C^{-1/2} C \tilde C^{-1/2}) &\subset [c_{11}, c_{12}]\\
\Lambda(\tilde V^{-1/2} V \tilde V^{-1/2}) &\subset [c_{13}, c_{14}]
\end{aligned}
$$

For the symmetrically preconditioned matrix, denoted by
$A_3 = M_3^{-1/2} A M_3^{-1/2}$, we have the following result.


Theorem 8. There exist positive constants $a$, $b$, $c$, and $d$,
which are independent of $h$ and $p$ such that there holds

$$ \Lambda(A_3) \subset [-a, -b] \cup [c\,h\,p^{-2}, d] $$

Furthermore, the iteration numbers for the 3-block preconditioned
minimum residual method grow like $O(h^{-3/4} p^{3/2})$.


Considering the Neumann block as a whole, that is, by
taking together finite element functions on the interface and
in the interior, we obtain a 2-block Jacobi method, which
has bounded iteration numbers for exact inversion of the
two blocks and therefore allows for almost optimal 2-block
preconditioners.

In order to introduce the 2-block preconditioner, we use
the following 2 x 2-block representation for $A$:

$$ A = \begin{pmatrix} A_N + W & K^T\\ K & -V \end{pmatrix} \quad \text{where } A_N = \begin{pmatrix} A & B^T\\ B & C \end{pmatrix} $$

$A_N$ is a FE discretization of the homogeneous Neumann
problem for the Laplacian (the subscript $N$ in $A_N$ refers to
Neumann).


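Our preconditioning matrix is

$$ M_2 = \begin{pmatrix} \tilde A_N & 0\\ 0 & \tilde V \end{pmatrix} \tag{86} $$

where $\tilde A_N$ is spectrally equivalent to $A_N + W + M$ and $\tilde V$
is spectrally equivalent to $V$. Here, $M$ is an additional mass
matrix, which is added to make $A_N + W$ positive definite.
Then, the preconditioned matrix in 2 x 2-block form is

$$ A_2 = M_2^{-1/2} A M_2^{-1/2} = \begin{pmatrix} \hat A & \hat K\\ \hat K^T & -\hat V \end{pmatrix} \tag{87} $$

with

$$ \hat A = \tilde A_N^{-1/2} (A_N + W) \tilde A_N^{-1/2}, \quad \hat V = \tilde V^{-1/2} V \tilde V^{-1/2}, \quad \hat K = \tilde A_N^{-1/2} K^T \tilde V^{-1/2} $$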

Theorem 9. There exist positive constants $a$, $b$, $c$, and $d$,
which are independent of $h$ and $p$ such that there holds

$$ \Lambda(A_2) \subset [-a, -b] \cup [c, d] $$

Furthermore, the number of iterations of the 2-block preconditioned
minimum residual method is bounded independently
of $h$ and $p$.


Remark 2. For $\tilde V = V$, all eigenvalues of $\tilde V^{-1/2} V \tilde V^{-1/2}$ are 1, yielding
$b = 1$ in Theorem 9.
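To see how a block diagonal scaling confines the spectrum of a saddle-point matrix to two intervals, the following minimal sketch treats a scalar caricature of the 2-block setting: every block is replaced by a single number. This is an illustration only and not the matrices of (26); the names `sym2x2_eigs` and `preconditioned_saddle_eigs` and all numerical values are hypothetical.

```python
import math


def sym2x2_eigs(p, q, r):
    """Eigenvalues (ascending) of the symmetric matrix [[p, q], [q, r]]."""
    mean = 0.5 * (p + r)
    radius = math.hypot(0.5 * (p - r), q)
    return mean - radius, mean + radius


def preconditioned_saddle_eigs(alpha, kappa, nu, alpha_t, nu_t):
    """Spectrum of M^{-1/2} A M^{-1/2} for the scalar model

        A = [[alpha, kappa], [kappa, -nu]],  M = diag(alpha_t, nu_t),

    with alpha, nu, alpha_t, nu_t > 0 (assumed).
    """
    a_hat = alpha / alpha_t                       # scaled "FE" block
    v_hat = nu / nu_t                             # scaled "BE" block
    k_hat = kappa / math.sqrt(alpha_t * nu_t)     # scaled coupling
    return sym2x2_eigs(a_hat, k_hat, -v_hat)


# Exact block inversion (alpha_t = alpha, nu_t = nu): eigenvalues are
# -sqrt(1 + k_hat**2) and +sqrt(1 + k_hat**2), one on each side of zero.
negative, positive = preconditioned_saddle_eigs(4.0, 6.0, 9.0, 4.0, 9.0)
```

In this model the negative eigenvalue is never larger than $-1$ when the second block is inverted exactly, which mirrors the statement of Remark 2 in the simplest possible case.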


The additive Schwarz preconditioner extends the 2-block 
method by replacing the main blocks by block diagonal 
matrices. Here we proceed as follows. First, we construct 
discrete harmonic functions by applying the Schur com- 
plement method for the FE block of the Galerkin matrix. 
Then, for the FE part, we decompose the test and trial func- 
tions in nodal, edge, and interior functions. This amounts 
for the finite element block to a block Jacobi (Additive 
Schwarz) preconditioner. We split the BE block belong- 
ing to the weakly singular integral operator with respect 
to unknowns belonging to a coarse grid space consisting 
of piecewise constant functions, and belonging to indi- 
vidual subspaces according to the BE, consisting of all 
polynomials up to degree p without the constants. Our 
second preconditioner is obtained by further splitting the 
subspaces of edge functions (for both the FE and the BE) 
into one-dimensional subspaces according to the edge-basis 
functions. For the 2-block method, we obtain in this way
two different preconditioned linear systems, which need
respectively $O(\log^2 p)$ and $O(p \log^2 p)$ minimum residual
iterations to be solved up to a given accuracy. We note that
we are dealing with direct sum decompositions only. Having
ensured the edge functions to be discrete harmonic, all
the arising local problems are independent of each other.
Therefore, the procedures can be parallelized
without special modifications. Also, the Schur complement
procedure can be parallelized on the element level (cf.
Babuška et al., 1991).

In the notation of (86), we take matrices $\tilde A_N$ and $\tilde V$,
which are spectrally equivalent to $A_N + W + M$ and $V$
respectively. However, the respective equivalence constants
will depend on $p$, but not on $h$. Since we define the blocks
by decomposing the subspaces $X_N$ and $Y_N$, this method
is referred to as an additive Schwarz method. The decomposition
of the FE space $X_N$ is given in Babuška et al.
(1991) and the decomposition of the BE space has been
proposed in Tran and Stephan (2000). This decomposition
of the ansatz spaces into subspaces of nodal functions, edge
functions for each edge, interior functions for each FE, and
into subspaces for each BE is as follows:

$$ X_N = X_1 \cup X_{S_1} \cup \dots \cup X_{S_{J_{edges}}} \cup X_{\Omega_1} \cup \dots \cup X_{\Omega_{J_\Omega}} \tag{88} $$

and

$$ Y_N = Y_0 \cup Y_{\Gamma_1} \cup \dots \cup Y_{\Gamma_{J_\Gamma}} \tag{89} $$

Here, the space $X_1$ is the space of the nodal functions, $X_{S_j}$
is the space of the edge functions related to the edge $S_j$
and $X_{\Omega_j}$ is spanned by the interior functions on the element
$\Omega_j$. For the BE functions, we assume that $Y_0$ consists of the
piecewise constant functions on the mesh $\Gamma_h$, and $Y_{\Gamma_j}$ is
spanned by the Legendre polynomials of degrees $i = 1, \dots, p - 1$,
mapped onto the element $\Gamma_j$.

The preconditioner that belongs to the decompositions
(88) and (89) is denoted by $M_{ASM}$. It is the block diagonal
matrix of $A$, in which the blocks belonging to the subspaces
in the decompositions are taken. In accordance with the
2-block method, we denote the finite element and BE parts
of the additive Schwarz preconditioner $M_{ASM}$ by $\tilde A_N = \tilde A_{ASM}$
and $\tilde V = \tilde V_{ASM}$.

The following theorem shows the impact of the additive
Schwarz preconditioner on the spectrum of the coupling
matrix and together with (84) leads to a growth for the
iteration numbers like $O(\log^2 p)$.


Theorem 10. (i) There holds

$$ \Lambda(\tilde A_{ASM}^{-1}(A_N + M)) \cup \Lambda(\tilde V_{ASM}^{-1} V) \subset [c(1 + \log p)^{-2}, C] $$

for constants $c, C > 0$, which are independent of $h$ and $p$.
(ii) Let the set of nodal functions of $X_N$ be spanned by
the standard piecewise bilinear functions and assume the
edge functions to be discrete harmonic, that is, for $i = 1, \dots, J_{edges}$
and $j = 1, \dots, J_\Omega$

$$ a(u, v) = 0 \quad \text{for all } u \in X_{S_i},\ v \in X_{\Omega_j} $$

where $a(u, v) = \int_{\Omega_1} \nabla u \cdot \nabla v$. Then, there exist positive constants
$a$, $b$, $c$, and $d$, which are independent of $h$ and $p$ such
that there holds

$$ \Lambda(M_{ASM}^{-1} A) \subset [-a, -b(1 + \log p)^{-2}] \cup [c(1 + \log p)^{-2}, d] $$

A preconditioner based on partially diagonal scaling is
obtained by further refining the subspace decompositions.
We take the block diagonal preconditioner, which consists
of the blocks belonging separately to the piecewise bilinear
functions, the interior functions for individual elements,
and the piecewise constant functions. For the remaining
functions, we simply take the diagonal of the stiffness
matrix $A$. This method combines the decompositions of
$X_N$ proposed in Babuška et al. (1991) and of $Y_N$ proposed
in Heuer, Stephan and Tran (1998). More precisely, we take
the decompositions

$$ X_N = X_1 \cup \bigcup_{j=1}^{J_{edges}} \bigcup_{q=2}^{p} X_{S_j,q} \cup X_{\Omega_1} \cup \dots \cup X_{\Omega_{J_\Omega}} \tag{90} $$

and

$$ Y_N = Y_0 \cup \bigcup_{j=1}^{J_\Gamma} \bigcup_{q=1}^{p-1} Y_{\Gamma_j,q} \tag{91} $$

The subspaces $X_1$, $X_{\Omega_j}$ ($j = 1, \dots, J_\Omega$), and $Y_0$ are as
before, whereas $X_{S_j,q}$ ($j = 1, \dots, J_{edges}$) and $Y_{\Gamma_j,q}$
($j = 1, \dots, J_\Gamma$) consist of individual edge functions (degree $\ge 2$)
and BE functions (degree $\ge 1$) respectively.

In accordance with $M_{ASM}$, we define the preconditioner
$M_{diag}$, which is the block diagonal matrix consisting of the
blocks of $A$ belonging to the subspaces in (90) and (91). In
the notation of the 2-block method, we have $\tilde A_N = \tilde A_{diag}$
and $\tilde V = \tilde V_{diag}$.


Theorem 11. (i) There holds

$$ \Lambda(\tilde A_{diag}^{-1}(A_N + M)) \subset [c\,p^{-1}(1 + \log p)^{-2}, C] $$

$$ \Lambda(\tilde V_{diag}^{-1} V) \subset [c\,p^{-1}(1 + \log p)^{-2}, C] $$

for constants $c, C > 0$, which are independent of $h$ and $p$.
(ii) Let the set of nodal functions of $X_N$ be spanned by the
standard piecewise bilinear functions and assume the edge
functions to be discrete harmonic.

Then, there exist positive constants $a$, $b$, $c$, and $d$, which
are independent of $h$ and $p$ such that there holds

$$ \Lambda(M_{diag}^{-1} A) \subset [-a, -b\,p^{-1}(1 + \log p)^{-2}] \cup [c\,p^{-1}(1 + \log p)^{-2}, d] $$

With this preconditioner, we obtain for the iteration numbers
of the minimum residual method a growth like $O(p \log^2 p)$.


3.3 Implementation issues and numerical results 


For the spaces $Y_N$, we use discontinuous piecewise Legendre
polynomials on the decomposition of $\Gamma$, and for the
spaces $X_N$, we use tensor products of antiderivatives of
the Legendre polynomials on each of the rectangles. The
antiderivatives of the Legendre polynomials with a degree
of at least 1 are automatically globally continuous as they
vanish at the endpoints of their supports. The piecewise linear
FE functions have to be assembled such that they are
continuous on $\Omega_1$. To assemble a finite-dimensional subspace
$X_N \times Y_N$ for the coupling method, we take a degree
$p$ and construct $X_N$ by assembling for all rectangles the
tensor products $f_i \times f_j$ of polynomials of degrees $i$ and $j$
respectively, up to $\max\{i, j\} = p$. For $Y_N$, we use on all
BE all the Legendre polynomials up to degree $p - 1$. This
discretization yields a linear system of the form (26). Here,
the unknowns are grouped into those belonging to the functions
of $X_N$ interior to $\Omega_1$, those belonging to the functions
of $X_N$ on the boundary, and those belonging to $Y_N$.
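The size of the resulting system can be counted directly from the description of the spaces. The sketch below assumes an L-shaped domain made of three squares, each divided into n x n rectangles, full tensor-product degree $p$ on every rectangle, and Legendre polynomials up to degree $p - 1$ on every boundary element. The function name `coupled_dofs` and the parameter `n` are hypothetical; the identification $1/h = 2n$ is an assumption inferred from the $M+N$ column of Table 2, which this count reproduces.

```python
def coupled_dofs(n, p):
    """Number of unknowns dim(X_N) + dim(Y_N) on an L-shaped mesh.

    n : rectangles per side of each of the three squares (assumed layout)
    p : polynomial degree, p >= 1
    """
    if n < 1 or p < 1:
        raise ValueError("need n >= 1 and p >= 1")
    elements = 3 * n * n
    nodes = 3 * n * n + 4 * n + 1
    # Euler's formula for a simply connected planar mesh: V - E + F = 1.
    edges = nodes + elements - 1
    boundary_elements = 8 * n

    nodal = nodes                          # one bilinear hat per node
    edge = edges * (p - 1)                 # degrees 2..p on every edge
    interior = elements * (p - 1) ** 2     # bubbles inside each rectangle
    fe_dofs = nodal + edge + interior
    be_dofs = boundary_elements * p        # degrees 0..p-1 per boundary element
    return fe_dofs + be_dofs


# One rectangle per square (1/h = 2 in the assumed convention), degree 2.
size = coupled_dofs(1, 2)
```

The split into nodal, edge and interior counts is the same split that defines the blocks of the additive Schwarz and diagonal-scaling preconditioners, so the three partial sums also give the block sizes.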

The implementation of the $A$, $B$, $C$, $I$-blocks is done analytically.
The bilinear forms involving integral operators,
that is, represented by the blocks $W$, $V$, $K$, $K^T$, are rewritten
in such a way that the inner integration can be carried
out again analytically. The outer integration is performed
numerically by a Gaussian quadrature. For more details, see
Ervin, Heuer and Stephan (1993) and Maischak, Stephan
and Tran (2000).

We solve the linear system (26) via the minimum residual
method, in which we consider the un-preconditioned version
and the preconditioners $M_3$, $M_2$, $M_{ASM}$, and $M_{diag}$.
The theoretical results for the preconditioners $M_{ASM}$ and
$M_{diag}$ require discrete harmonic edge functions, that is, they
have to be orthogonal with respect to the $H^1$-inner product
to the basis functions, which are interior to the elements.
This is fulfilled by performing a Schur complement step
with respect to the interior basis functions, resulting in a
basis transformation of the edge functions. For performing
the Schur complement step, the inversions of the blocks of
the interior functions are done directly, as it is done for all
the blocks of the preconditioners. For practical applications,
these direct inversions may be replaced by indirect solvers.
The action of performing the Schur complements is a local
operation, which can be parallelized on the element level
and is therefore not a very time-consuming task. On the
other hand, it is also possible to choose edge-basis functions,
which are a priori discrete harmonic (see Heuer and
Stephan, 2001).

In Heuer, Maischak and Stephan (1999), numerical
experiments are presented for the interface problem (82)
and (83) with the L-shaped domain $\Omega_1$ with vertices
$(0, 0)$, $(0, 1/2)$, $(-1/2, 1/2)$, $(-1/2, -1/2)$, $(1/2, -1/2)$,
and $(1/2, 0)$ and given data $u_0$ and $t_0$ chosen such that

$$ u_1(x, y) = \Im(z^{2/3}) \quad \text{for } z = x + iy $$

and

$$ u_2(x, y) = \log|(x, y) + (0.3, 0.3)| $$


The numerical experiments listed in the paper by Heuer,
Maischak and Stephan (1999) underline the above results on
the spectral behavior of the Galerkin matrix and the various
preconditioned versions. Here, we just list in Table 2 (from
Heuer, Maischak and Stephan, 1999) the numbers of iterations
of the minimum residual method that are required to
reduce the initial residual by a factor of $10^{-3}$. One observes
that for the un-preconditioned linear system, the numbers
of iterations increase rather fast, making the system almost
intractable for large dimensions of the ansatz spaces $X_N$
and $Y_N$. On the other hand, the 2-block preconditioner
keeps the numbers of iterations bounded at a rather moderate
value. The 3-block preconditioner, which is almost
as expensive as the 2-block method (since in both cases at
least the block belonging to the interior FE functions needs
to be inverted), results in increasing iteration numbers that
are comparable with the numbers necessary for the additive
Schwarz preconditioner. Here, in all cases, the various subblocks
are inverted exactly. As expected, the numbers of
iterations that are necessary for the partially diagonal scaling
are larger than for all of the other preconditioners. On the
other hand, this method is the cheapest one and, in comparison
with the un-preconditioned method, it reduces the
numbers of iterations substantially.
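The predicted $O(\log^2 p)$ growth for the additive Schwarz preconditioner can be checked against the reported iteration counts. The sketch below takes the $M_{ASM}^{-1}A$ column of Table 2 for $1/h = 2$ and divides each count by $(1 + \log p)^2$; the use of the natural logarithm and the variable names are choices made here for illustration.

```python
import math

# Degrees p and MINRES iteration counts for the additive Schwarz
# preconditioner, copied from the p-version rows of Table 2 (1/h = 2).
p_values = [2, 4, 6, 8, 10, 12, 14, 16]
asm_iterations = [30, 47, 54, 62, 66, 74, 82, 102]


def scaled_counts(degrees, iterations):
    """Iteration counts divided by the predicted growth (1 + log p)^2."""
    return [its / (1.0 + math.log(p)) ** 2
            for p, its in zip(degrees, iterations)]


ratios = scaled_counts(p_values, asm_iterations)
spread = max(ratios) / min(ratios)
```

If the counts grew faster than $(1 + \log p)^2$, the ratios would increase with $p$; here they stay within a factor of two of each other over the whole range, which is consistent with the bound without proving it.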


Table 2. Numbers of MINRES iterations required to reduce the
initial residual by a factor of $10^{-3}$.

1/h   p    M+N     A       M_3^{-1}A     M_2^{-1}A   M_ASM^{-1}A   M_diag^{-1}A

2     2    37      36      21            11          30            30
2     4    97      138     34            12          47            52
2     6    181     285     40            13          54            67
2     8    289     536     45            13          62            [illegible]
2     10   421     892     50            13          66            94
2     12   577     1374    55            13          74            111
2     14   757     2328    60            13          82            135
2     16   961     >9999   79            17          102           197

4     1    37      [illegible]  [illegible]  10
8     1    97      38      23            13
16    1    289     67      32            13
32    1    961     130     45            14
64    1    3457    243     60            15
128   1    13057   476     75            15


4 LEAST SQUARES FE/BE COUPLING METHOD


We introduce a least squares FE/BE coupling formulation
for the numerical solution of second-order linear transmission
problems in two and three dimensions, which allow
jumps on the interface (see Maischak and Stephan). In a
bounded domain, the second-order partial differential equation
is rewritten as a first-order system; the part of the
transmission problem that corresponds to the unbounded
exterior domain is reformulated by means of boundary integral
equations on the interface. The least squares functional
is given in terms of Sobolev norms of the orders $-1$ and
$1/2$. In case of the h-version of the FE/BE coupling, these
norms are computed by using multilevel preconditioners for
a second-order elliptic problem in a bounded domain $\Omega$,
and for the weakly singular integral operator of the single-layer
potential on its boundary $\partial\Omega$. We use both MG and
BPX algorithms as preconditioners, and the preconditioned
system has bounded or mildly growing condition number.
These preconditioners can be used to accelerate the computation
of the solution of the full discrete least squares system
by a preconditioned conjugate gradient method. Thus, this
least squares coupling approach gives a fast and robust solution
procedure. The given approach should be applicable to
more general interface problems from elasticity and electromagnetics.
Numerical experiments confirm our theoretical
results (cf. Table 3 and Figure 2).

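Here we consider again the interface problem (IP), that
is, (1) to (5), for $A(\nabla u_1) = a \nabla u_1$, where $a_{ij} \in L^\infty(\Omega)$ such
that there exists an $\alpha > 0$ with

$$ \alpha |z|^2 \le z^T a(x) z \quad \forall z \in \mathbb{R}^2 \text{ and for almost all } x \in \Omega $$

Introducing the flux variable $\theta := a \nabla u_1$ and the new
unknown $\sigma := (a \nabla u_1) \cdot n$, we note that the unknown $\theta$
belongs to $H(\mathrm{div}; \Omega)$, where

$$ H(\mathrm{div}; \Omega) = \{\theta \in [L^2(\Omega)]^2 : \|\theta\|^2_{[L^2(\Omega)]^2} + \|\mathrm{div}\,\theta\|^2_{L^2(\Omega)} < \infty\} $$

With the inner product

$$ (\theta, \zeta)_{H(\mathrm{div};\Omega)} = (\theta, \zeta)_{L^2(\Omega)} + (\mathrm{div}\,\theta, \mathrm{div}\,\zeta)_{L^2(\Omega)} $$

$H(\mathrm{div}; \Omega)$ is a Hilbert space. Moreover, for all $\zeta \in H(\mathrm{div}; \Omega)$,
there holds $\zeta \cdot n \in H^{-1/2}(\Gamma)$ and
$\|\zeta \cdot n\|_{H^{-1/2}(\Gamma)} \le \|\zeta\|_{H(\mathrm{div};\Omega)}$ (see Girault and Raviart, 1986).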

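Incorporating the interface conditions, we can rewrite the
transmission problem (IP), (1) to (5), into the following
formulation with first-order system on $\Omega$.

Find $(\theta, u, \sigma) \in H(\mathrm{div}; \Omega) \times H^1(\Omega) \times H^{-1/2}(\Gamma)$ such
that

$$ \theta = a \nabla u \quad \text{in } \Omega \tag{92} $$

$$ -\mathrm{div}\,\theta = f \quad \text{in } \Omega \tag{93} $$

$$ \sigma = \theta \cdot n \quad \text{on } \Gamma \tag{94} $$

$$ 2(\sigma - t_0) = -W(u - u_0) + (I - K')(\sigma - t_0) \quad \text{on } \Gamma \tag{95} $$

$$ 0 = (I - K)(u - u_0) + V(\sigma - t_0) \quad \text{on } \Gamma \tag{96} $$

In the following, let $\tilde H^{-1}(\Omega)$ denote the dual space of
$H^1(\Omega)$, equipped with the norm
$\|w\|_{\tilde H^{-1}(\Omega)} = \sup_{v \in H^1(\Omega)} (w, v)_{L^2(\Omega)} / \|v\|_{H^1(\Omega)}$.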
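We observe that the solution of (92) to (96) is a solution
of the following quadratic minimization problem.

Find $(\theta, u, \sigma) \in X := [L^2(\Omega)]^2 \times H^1(\Omega) \times H^{-1/2}(\Gamma)$
such that

$$ J(\theta, u, \sigma) = \min_{(\zeta, v, \tau) \in X} J(\zeta, v, \tau) \tag{97} $$

where $J$ is the quadratic functional defined by

$$
\begin{aligned}
J(\zeta, v, \tau) ={}& \|a\nabla v - \zeta\|^2_{[L^2(\Omega)]^2} + \|(I - K)(v - u_0) + V(\tau - t_0)\|^2_{H^{1/2}(\Gamma)}\\
& + \|\mathrm{div}\,\zeta + f - \tfrac12 \delta_\Gamma \otimes (W(v - u_0) + 2\zeta \cdot n - 2t_0 - (I - K')(\tau - t_0))\|^2_{\tilde H^{-1}(\Omega)}\\
={}& \|a\nabla v - \zeta\|^2_{[L^2(\Omega)]^2} + \|(I - K)v + V\tau - (I - K)u_0 - Vt_0\|^2_{H^{1/2}(\Gamma)}\\
& + \|\mathrm{div}\,\zeta - \tfrac12 \delta_\Gamma \otimes (Wv + 2\zeta \cdot n - (I - K')\tau)\\
& \qquad + f + \tfrac12 \delta_\Gamma \otimes (Wu_0 + 2t_0 - (I - K')t_0)\|^2_{\tilde H^{-1}(\Omega)}
\end{aligned} \tag{98}
$$

Here, $\delta_\Gamma \otimes \tau$ denotes a distribution in $\tilde H^{-1}(\Omega)$ for
$\tau \in H^{-1/2}(\Gamma)$. By proving coercivity and continuity of the
corresponding variational problem, we obtain uniqueness
of (97), and therefore we have the equivalence of (92) to
(96) and (97).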

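Defining $g(\zeta, v, \tau) := \mathrm{div}\,\zeta - (1/2)\,\delta_\Gamma \otimes (Wv + 2\zeta \cdot n - (I - K')\tau)$,
we can write for the bilinear form corresponding
to $J(\zeta, v, \tau)$

$$
\begin{aligned}
B((\theta, u, \sigma), (\zeta, v, \tau)) ={}& (a\nabla u - \theta, a\nabla v - \zeta)_{L^2(\Omega)}\\
& + ((I - K)u + V\sigma, (I - K)v + V\tau)_{H^{1/2}(\Gamma)}\\
& + (g(\theta, u, \sigma), g(\zeta, v, \tau))_{\tilde H^{-1}(\Omega)}
\end{aligned} \tag{99}
$$

and the linear functional

$$
\begin{aligned}
G(\zeta, v, \tau) ={}& ((I - K)v + V\tau, (I - K)u_0 + Vt_0)_{H^{1/2}(\Gamma)}\\
& - (g(\zeta, v, \tau), f + \tfrac12 \delta_\Gamma \otimes (Wu_0 + 2t_0 - (I - K')t_0))_{\tilde H^{-1}(\Omega)}
\end{aligned} \tag{100}
$$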


The variational formulation now reads as follows:
Find $(\theta, u, \sigma) \in X = [L^2(\Omega)]^2 \times H^1(\Omega) \times H^{-1/2}(\Gamma)$
such that

$$ B((\theta, u, \sigma), (\zeta, v, \tau)) = G(\zeta, v, \tau) \quad \forall (\zeta, v, \tau) \in X \tag{101} $$

In Maischak and Stephan, we prove the following result.


Theorem 12. The bilinear form $B(\cdot, \cdot)$ is continuous and
strongly coercive in $X \times X$ and the linear form $G(\cdot)$ is
continuous on $X$. There exists a unique solution of the
variational least squares formulation (101), which is also
a solution of (92) to (96).


Choosing subspaces $V_h \subset H^1(\Omega)$, $S_h \subset H^{-1/2}(\Gamma)$, $H_h \subset [L^2(\Omega)]^2$,
one can pose a discrete formulation corresponding
to (101) with a discrete bilinear form $B^{(h)}(\cdot, \cdot)$, which
is again uniquely solvable.

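For its implementation, we have to choose basis functions
$\{\varphi_i\}$ of $V_h$, basis functions $\{\psi_i\}$ of $S_h$, and basis
functions $\{\theta_i\}$ of $H_h$. In case of an h-version, later on we
choose hat-functions for the discretization of $V_h$, piecewise
constant functions, that is, brick-functions, for $S_h$, and
investigate the use of hat-functions, brick-functions, and
Raviart-Thomas (RT) elements for the discretization of the
flux space $H_h$.

The introduction of basis functions leads to the definition
of the following matrices and vectors.

$$
\begin{aligned}
(A_h)_{ij} &= (a\nabla\varphi_i, a\nabla\varphi_j)_{L^2(\Omega)}, & (F_h)_{ij} &= (a\nabla\varphi_i, \theta_j)_{L^2(\Omega)},\\
(G_h)_{ij} &= (\theta_i, \theta_j)_{L^2(\Omega)}, & (f_h)_j &= (f, \varphi_j)_{L^2(\Omega)}
\end{aligned}
$$

For the BE part, we need the well-known dense matrices

$$
\begin{aligned}
(V_h)_{ij} &= \langle \psi_i, V\psi_j \rangle, & (K_h)_{ij} &= \langle \psi_i, K\varphi_j \rangle,\\
(W_h)_{ij} &= \langle \varphi_i, W\varphi_j \rangle, & (I_h)_{ij} &= \langle \psi_i, \varphi_j \rangle
\end{aligned}
$$

The matrix representation of the discrete bilinear form
$B^{(h)}(\cdot, \cdot)$ and the discrete linear form then becomes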


G, -FI 0 F7 
-F] A, Of+ iW, 
0 0 0 4(K, — h) 


x By [Fas Wn, EKn - D] 


0 6, 
+) G,- K,)* C,10, I, ~ Ky» Val Uy, 
Vy Sp 


Fr 
5 | 5M, | By (f, + EK’ + Dto + Wiola) 
(Ky — th) 


0 
= [u - K| CK — Dug — Viola (102) 
Va 


where $[\cdot]_h$ denotes testing with the basis functions. Here,
$B_h$ and $C_h$ are preconditioners for the matrices $A_h + M_h$
and $V_h$, where $M_h$ is the mass matrix.

We prove in Maischak and Stephan the following theorem
dealing with the preconditioning for the discrete system
(102). In applications, the matrix $E_h$ should be an easily
computable approximation of the inverse of the mass
matrix, for example, a scaled identity matrix.


Theorem 13. Let $E_h$ be such that $(E_h^{-1}\theta_h, \theta_h)_{l^2} \sim (\theta_h, \theta_h)_{L^2(\Omega)}$.
Then, with the preconditioners $B_h$ and $C_h$,
there holds the equivalence

$$ (E_h^{-1}\theta_h, \theta_h)_{l^2} + (B_h^{-1} v_h, v_h)_{l^2} + (C_h^{-1}\tau_h, \tau_h)_{l^2} \sim B^{(h)}((\theta_h, v_h, \tau_h), (\theta_h, v_h, \tau_h)) $$

Therefore, $\mathrm{diag}(E_h^{-1}, B_h^{-1}, C_h^{-1})$ is spectrally equivalent
to the system matrix $B^{(h)}(\cdot, \cdot)$, and if the block diagonal
matrix $\mathrm{diag}(E_h, B_h, C_h)$ is applied to the system (102), the
resulting system has a bounded condition number.


$B_h$ denotes the preconditioner for the FE matrix $A_h$
stabilized by the mass matrix $M_h$, $(M_h)_{ij} = (\varphi_i, \varphi_j)_{L^2(\Omega)}$
(see Funken and Stephan, 2001; Heuer, Maischak and
Stephan, 1999), and $C_h$ is the preconditioner for the matrix
with the single-layer potential $V_h$. For $B_h$ and $C_h$, we
use MG, BPX, and the inverse matrices $(A_h + M_h)^{-1}$ and
$V_h^{-1}$ (INV) (performed by several MG steps). The MG
algorithm for $B_h$ gives a preconditioner, which is spectrally
equivalent to the inverse of the above stabilized FE matrix
$A_h$, whereas BPX for $B_h$ leads to a linear system with a
mildly growing condition number. In case of $C_h$, we use,
as in Bramble, Leyk and Pasciak (1992), the multilevel
algorithm (MG and BPX), which incorporates the second-order
difference operator. Another preconditioner $C_h$, which
is spectrally equivalent to the inverse of the single-layer
potential matrix $V_h$ up to terms depending logarithmically
on $h$, is given by using the additive Schwarz method based on
the Haar basis (Tran and Stephan, 1996; Maischak, Stephan
and Tran, 1996). As the linear system solver, we take
the preconditioned conjugate gradient algorithm until the
relative change of the iterated solution is less than $\delta = 10^{-8}$.

The implementation of the least squares coupling method
uses only components that are also necessary in the implementation
of the standard symmetric coupling method and
offers the advantage that there is no need for a generalized
Krylov method like MINRES, as in Section 3, or
GMRES, as in the case of the symmetric coupling method;
see Funken and Stephan (2001) and Mund and Stephan
(1998). The well-known preconditioned conjugate gradient
algorithm is sufficient. The computation of the flux is
done with negligible implementational effort because no
preconditioner is needed (cf. Theorem 13), and the Galerkin
matrices involved are sparse and very easy to implement.
The numerical experiments presented below are done with
the software package maiprogs (see Maischak, 2003). For
further numerical experiments with piecewise constant and
piecewise linear fluxes, see Maischak and Stephan.
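Because a least squares formulation leads to a symmetric positive definite system, a conjugate gradient iteration is enough. The sketch below is a generic preconditioned conjugate gradient loop applied to the normal equations of a tiny dense least squares problem. It is not the maiprogs implementation; the stopping test uses the relative residual rather than the relative change of the iterate, and the names `pcg`, `B`, `f` and the optional `precond` callable are hypothetical.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(matvec, rhs, precond=None, tol=1e-10, max_iter=200):
    """Preconditioned CG for an SPD operator given as a matvec callable.

    precond(r) should apply an SPD approximation of the inverse operator;
    None means no preconditioning. Returns (solution, iterations).
    """
    if precond is None:
        precond = lambda r: list(r)
    x = [0.0] * len(rhs)
    r = list(rhs)
    rhs_norm = math.sqrt(dot(rhs, rhs))
    if rhs_norm == 0.0:
        return x, 0
    z = precond(r)
    d = list(z)
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        q = matvec(d)
        alpha = rz / dot(d, q)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        if math.sqrt(dot(r, r)) <= tol * rhs_norm:
            return x, k
        z = precond(r)
        rz_new = dot(r, z)
        d = [zi + (rz_new / rz) * di for zi, di in zip(z, d)]
        rz = rz_new
    return x, max_iter


# Least squares model problem: minimize ||B x - f||^2 via B^T B x = B^T f.
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
f = [1.0, 2.0, 3.0]


def apply_B(x):
    return [dot(row, x) for row in B]


def apply_Bt(y):
    return [sum(B[i][j] * y[i] for i in range(len(B))) for j in range(len(B[0]))]


solution, iterations = pcg(lambda x: apply_Bt(apply_B(x)), apply_Bt(f))
```

In a coupled FE/BE setting the `precond` argument would be the place where a block diagonal operator such as $\mathrm{diag}(E_h, B_h, C_h)$ enters; here it is left at its default to keep the example short.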

To underline the above approach, we present least squares
FE/BE coupling computations from Maischak and Stephan,
in which, on a uniform triangulation with rectangles, $\theta_h$ is
computed with $H(\mathrm{div}; \Omega)$-conforming RT elements of the
lowest order, $u_h$ is piecewise bilinear and $\sigma_h$ is piecewise
constant.

In Table 3, the corresponding $L^2$-errors $\delta_\theta$, $\delta_u$, $\delta_\sigma$ are
given for the following example.

Let $\Omega$ be the L-shaped domain with vertices $(0, 0)$, $(0, 1/4)$,
$(-1/4, 1/4)$, $(-1/4, -1/4)$, $(1/4, -1/4)$, and
$(1/4, 0)$. Now, we prescribe jumps with singularities on the
interface $\Gamma = \partial\Omega$ and take $f = 0$ in $\Omega$ and $a = 1$. Setting


ulr, p) = r?’ sin [Fen = o] 


| 1\? 1\2 a 
~topy (x~5) +(v-3), flrs) = 58 


the solution of the interface problem (IP), that is, (1) to (5), 
is given by 


u(r, 9) =r? sin [Fe= - J inQ, 


Figure 2. Condition numbers, $\theta_h$ with RT elements. [Plot of condition number versus degrees of freedom for RT (MG), RT (BPX), and RT (INV).]


1\? 1\? 
u(r, 9) = log ( -3) +(»-3) in Q, 


The experimental convergence rates given in Table 3 confirm
the theoretical convergence rates. In Figure 2, the
condition numbers for the preconditioned system (102)
are plotted, showing excellent behavior for the block-inverse
and the MG preconditioner, whereas BPX slowly
degenerates.
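Experimental convergence rates of this kind are obtained from the errors on two consecutive meshes. The following minimal sketch recomputes the rates for the flux error from the values listed in Table 3, assuming that $h$ is halved from one mesh to the next; the function name `convergence_rates` is hypothetical.

```python
import math


def convergence_rates(errors, refinement=2.0):
    """Observed rates log(e_coarse / e_fine) / log(refinement)."""
    return [math.log(coarse / fine) / math.log(refinement)
            for coarse, fine in zip(errors, errors[1:])]


# L2-errors of the flux approximation, copied from Table 3.
flux_errors = [0.05517, 0.03535, 0.02249, 0.01425,
               0.00901, 0.00569, 0.00359, 0.00226]
flux_rates = convergence_rates(flux_errors)
```

Small differences from the tabulated rates in the third decimal place are to be expected, since the errors in the table are themselves rounded.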


Table 3. $L^2$-errors $\delta_\theta$, $\delta_u$, $\delta_\sigma$ for $\theta_h$, $u_h$, $\sigma_h$ and convergence rates ($\theta_h$ with RT, MG preconditioner).

#total     h         δ_θ       rate    δ_u          rate    δ_σ       rate

209        0.06250   0.05517           0.0008359            0.45692
705        0.03125   0.03535   0.642   0.0002838    1.558   0.40826   0.162
2561       0.01562   0.02249   0.653   0.9398E-04   1.594   0.36480   0.162
9729       0.00781   0.01425   0.658   0.3147E-04   1.578   0.32598   0.162
37889      0.00390   0.00901   0.661   0.1203E-04   1.387   0.29128   0.162
149505     0.00195   0.00569   0.663   0.6128E-05   0.973   0.26026   0.162
593921     0.00097   0.00359   0.665   0.3801E-05   0.689   0.23248   0.163
2367489    0.00048   0.00226   0.665   0.2468E-05   0.623   0.20758   0.163


5 FE/BE COUPLING FOR INTERFACE PROBLEMS WITH SIGNORINI CONTACT


5.1 Primal method


Let $\Omega \subset \mathbb{R}^d$, $d \ge 2$, be a bounded domain with Lipschitz
boundary $\Gamma$. Let $\Gamma = \bar\Gamma_t \cup \bar\Gamma_s$, where $\Gamma_t$ and $\Gamma_s$ are
nonempty, disjoint, and open in $\Gamma$. In the interior part, we
consider a nonlinear partial differential equation, whereas
in the exterior part, we consider the Laplace equation

$$ -\mathrm{div}(\varrho(|\nabla u|) \cdot \nabla u) = f \quad \text{in } \Omega \tag{103} $$

$$ -\Delta u = 0 \quad \text{in } \Omega_c := \mathbb{R}^d \setminus \bar\Omega \tag{104} $$

with the radiation condition

$$ u(x) = O(|x|^{2-d}) \quad \text{for } d \ge 2 \quad (|x| \to \infty) \tag{105} $$


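Here, $\varrho : [0, \infty) \to [0, \infty)$ is a $C^1[0, \infty)$ function with
$t \cdot \varrho(t)$ being monotonously increasing with $t$, $\varrho(t) \le \varrho_0$,
$(t \cdot \varrho(t))' \le \varrho_1$; and let $\varrho(t) + t \cdot \min\{0, \varrho'(t)\} \ge \alpha > 0$.

Writing $u_1 := u|_\Omega$ and $u_2 := u|_{\Omega_c}$, the tractions on $\Gamma$ are
given by $\varrho(|\nabla u_1|)\,\partial u_1/\partial n$ and $-(\partial u_2/\partial n)$ with normal $n$
pointing into $\Omega_c$.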

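We consider transmission conditions on $\Gamma_t$:

$$
\begin{aligned}
u_1|_{\Gamma_t} - u_2|_{\Gamma_t} &= u_0|_{\Gamma_t}\\
\varrho(|\nabla u_1|) \frac{\partial u_1}{\partial n}\Big|_{\Gamma_t} - \frac{\partial u_2}{\partial n}\Big|_{\Gamma_t} &= t_0|_{\Gamma_t}
\end{aligned} \tag{106}
$$

and Signorini conditions on $\Gamma_s$

$$
\begin{aligned}
u_1|_{\Gamma_s} - u_2|_{\Gamma_s} &\le u_0|_{\Gamma_s}\\
\varrho(|\nabla u_1|) \frac{\partial u_1}{\partial n}\Big|_{\Gamma_s} &= \frac{\partial u_2}{\partial n}\Big|_{\Gamma_s} + t_0|_{\Gamma_s} \le 0\\
0 &= \varrho(|\nabla u_1|) \frac{\partial u_1}{\partial n}\Big|_{\Gamma_s} \cdot (u_2 + u_0 - u_1)|_{\Gamma_s}
\end{aligned} \tag{107}
$$

Given data $f \in L^2(\Omega)$, $u_0 \in H^{1/2}(\Gamma)$, and $t_0 \in H^{-1/2}(\Gamma)$
(with $(f, 1)_{L^2(\Omega)} + \langle t_0, 1 \rangle = 0$ if $d = 2$), we look for
$u_1 \in H^1(\Omega)$ and $u_2 \in H^1_{loc}(\Omega_c)$, satisfying (103) to (107) in a
weak form.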

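Setting

$$ g(t) = \int_0^t s \cdot \varrho(s)\, ds $$

the assumptions on $\varrho$ yield that

$$ G(u) = 2 \int_\Omega g(|\nabla u|)\, dx $$

is finite for any $u \in H^1(\Omega)$ and its Fréchet derivative

$$ DG(u; v) = 2 \int_\Omega \varrho(|\nabla u|) (\nabla u)^T \cdot \nabla v\, dx \quad \forall u, v \in H^1(\Omega) \tag{108} $$

is uniformly monotone, that is, there exists a constant $\gamma > 0$
such that

$$ \gamma\, |u - v|^2_{H^1(\Omega)} \le DG(u; u - v) - DG(v; u - v) \quad \forall u, v \in H^1(\Omega) \tag{109} $$

(see Carstensen and Gwinner, 1997, Proposition 2.1).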
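As a quick sanity check of (108) and (109), consider the linear case $\varrho \equiv 1$. This special case is chosen here purely for illustration; it satisfies the assumptions on $\varrho$ with $\alpha = 1$. With $g(t) = \int_0^t s\,\varrho(s)\,ds$ one obtains:

```latex
\begin{align*}
g(t) &= \int_0^t s\,ds = \tfrac12 t^2, \\
G(u) &= 2\int_\Omega \tfrac12 |\nabla u|^2\,dx = \int_\Omega |\nabla u|^2\,dx, \\
DG(u;v) &= 2\int_\Omega \nabla u\cdot\nabla v\,dx, \\
DG(u;u-v) - DG(v;u-v) &= 2\int_\Omega |\nabla(u-v)|^2\,dx
  = 2\,|u-v|^2_{H^1(\Omega)} .
\end{align*}
```

In this case the monotonicity estimate holds with equality and $\gamma = 2$, measured in the $H^1(\Omega)$-seminorm.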


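Following Carstensen and Gwinner (1997), Maischak
(2001) analyses an FE/BE coupling procedure for (103)
to (107), which uses the Steklov-Poincaré operator $S$ (32)
for the exterior problem.

Let $E := H^1(\Omega) \times \tilde H^{1/2}(\Gamma_s)$, where
$\tilde H^{1/2}(\Gamma_s) := \{w \in H^{1/2}(\Gamma) : \operatorname{supp} w \subset \bar\Gamma_s\}$ and set

$$ D := \{(u, v) \in E : v \ge 0 \text{ a.e. on } \Gamma_s \text{ and } \langle S1, u|_\Gamma + v - u_0 \rangle = 0 \text{ if } d = 2\} $$

Then, the primal formulation of (103) to (107), called
problem (SP), consists in finding $(\hat u, \hat v)$ in $D$ such that

$$ \Psi(\hat u, \hat v) = \inf_{(u, v) \in D} \Psi(u, v) $$

where

$$ \Psi(u, v) := 2 \int_\Omega g(|\nabla u|)\, dx + \tfrac12 \langle S(u|_\Gamma + v), u|_\Gamma + v \rangle - \lambda(u, v) $$

and $\lambda \in E^*$, the dual of $E$, is given by

$$ \lambda(u, v) := L(u, u|_\Gamma + v) + \langle S u_0, u|_\Gamma + v \rangle $$

with

$$ L(u, v) := 2 \int_\Omega f \cdot u\, dx + 2 \int_\Gamma t_0 \cdot v\, ds $$

for any $(u, v) \in E$.

Owing to Carstensen and Gwinner (1997), there exists
exactly one solution $(\hat u, \hat v) \in D$ of problem (SP), which is
the variational solution of the transmission problem (103)
to (107). Moreover, $(\hat u, \hat v) \in D$ is the unique solution of the
variational inequality

$$ A(\hat u, \hat v)(u - \hat u, v - \hat v) \ge \lambda(u - \hat u, v - \hat v) \tag{110} $$

for all $(u, v) \in D$, with

$$ A(u, v)(r, s) := DG(u, r) + \langle S(u|_\Gamma + v), r|_\Gamma + s \rangle \tag{111} $$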

For the discretization, we take nested regular quasiuniform meshes $(\mathcal T_h)_h$ consisting of triangles or quadrilaterals. Then, let $H^1_h$ denote the related continuous and piecewise affine trial functions on the triangulation $\mathcal T_h$. The mesh on $\Omega$ induces a mesh on the boundary, so that we may consider $H^{-1/2}_h$ as the piecewise constant trial functions. Assuming that the partition of the boundary also leads to a partition of $\Gamma_s$, $\tilde H^{1/2}_h$ is then the subspace of continuous and piecewise linear functions on the partition of $\Gamma_s$, which vanish at intersection points in $\bar\Gamma_s \cap \bar\Gamma_t$. Then, we have $H^1_h \times \tilde H^{1/2}_h \times H^{-1/2}_h \subset H^1(\Omega) \times \tilde H^{1/2}(\Gamma_s) \times H^{-1/2}(\Gamma)$. Now, $D_h$ is given by

$$D_h := \{(u_h, v_h) \in H^1_h \times \tilde H^{1/2}_h : v_h(x_i) \ge 0\ \ \forall x_i \text{ node of the partition of } \Gamma_s, \text{ and } \langle S1, u_h|_\Gamma + v_h - u_0\rangle = 0 \text{ if } d = 2\} \qquad (112)$$

Note that $v_h \ge 0$ once the nodal values of $v_h$ are $\ge 0$. Therefore, we have $D_h \subset D$. With the approximation $S_h$ as in (36) of $S$, the primal FE/BE coupling method (SP$_h$) reads as follows:

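Problem (SP$_h$): Find $(\hat u_h, \hat v_h) \in D_h$ such that

$$A_h(\hat u_h, \hat v_h)(u_h - \hat u_h, v_h - \hat v_h) \ge \lambda_h(u_h - \hat u_h, v_h - \hat v_h) \qquad (113)$$

for all $(u_h, v_h) \in D_h$, where

$$A_h(u_h, v_h)(r_h, s_h) := DG(u_h, r_h) + \langle S_h(u_h|_\Gamma + v_h), r_h|_\Gamma + s_h\rangle \qquad (114)$$

and

$$\lambda_h(u_h, v_h) := L(u_h, u_h|_\Gamma + v_h) + \langle S_h u_0, u_h|_\Gamma + v_h\rangle \qquad (115)$$

with the discrete Steklov–Poincaré operator $S_h$ (57).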

As shown in Carstensen and Gwinner (1997), the solution $(\hat u_h, \hat v_h) \in D_h$ of (SP$_h$) converges for $h \to 0$ toward the solution $(\hat u, \hat v) \in D$ of (SP). Maischak (2001) presents an a posteriori error estimate on the basis of hierarchical subspace decompositions, extending the approach described in Section 2.2.2 to unilateral problems, and investigates an hp-version for the coupling method.

Next, we comment on the solvers for (SP$_h$) in the linear case ($\varrho = 1$), where the matrix on the left-hand side of (113) becomes

$$A_h := \begin{pmatrix} A & B & 0 \\ B^T & C + S & S_{s\Gamma} \\ 0 & S_{\Gamma s} & S_{ss} \end{pmatrix} \qquad (116)$$

Here, $A$, $B$, $C$ denote the different parts of the FEM-matrix ($A$ belonging to the interior nodes, $C$ belonging to all boundary nodes, and $B$ belonging to the coupling of interior and boundary nodes), and $S$ with its different subscripts is the Steklov–Poincaré operator acting either on the whole of $\Gamma$, or on $\Gamma_s$, or on both.

In Maischak (2001), 2-block preconditioners of the form

$$B := \begin{pmatrix} B_{ABC} & 0 \\ 0 & B_S \end{pmatrix}$$

are applied, where $B_{ABC}$ is the symmetric V-cycle MG preconditioner $B_{MG,ABC}$ or the BPX preconditioner $B_{BPX,ABC}$ belonging to $\begin{pmatrix} A & B \\ B^T & C \end{pmatrix}$ + mass matrix, and $B_S$ is the MG V-cycle preconditioner $B_{MG,S}$ or the BPX preconditioner $B_{BPX,S}$ belonging to $S_{ss}$, respectively. Then, the preconditioned systems have bounded or mildly growing condition numbers, that is,

$$\kappa(B_{MG} A_h) \le C, \qquad \kappa(B_{BPX} A_h) \le C\left(1 + \left(\log\frac{1}{h}\right)^2\right)$$

with some constant $C$.

From Maischak (2001), we present numerical experiments for a two-dimensional interface problem with Signorini conditions on both large sides of the L-shaped domain with vertices $(0, 0)$, $(0, 1/4)$, $(-1/4, 1/4)$, $(-1/4, -1/4)$, $(1/4, -1/4)$, and $(1/4, 0)$.

We set $\varrho = 1$, $f = 0$, and $u_0 = r^{2/3}\sin\left(\tfrac23(\varphi - \tfrac{\pi}{2})\right)$, $t_0 = (\partial/\partial n)u_0$, and $\Gamma_s = \overline{(-1/4, -1/4), (1/4, -1/4)} \cup \overline{(-1/4, -1/4), (-1/4, 1/4)}$ in (103) to (107).

All computations are done using rectangular mesh elements with linear test and trial functions for the FEM part. We have tested the preconditioned Polyak algorithm. Note that the Polyak algorithm is a modification of the CG algorithm (see O'Leary, 1980). The preconditioners were the MG algorithm (V-cycle, one pre- and post-smoothing step using damped Jacobi with damping factor 0.5) and the BPX algorithm. Tables 4, 5, and 6 give the extreme eigenvalues $\lambda_{\min}$, $\lambda_{\max}$ and the condition numbers $\kappa$ for the original system, the system with multigrid preconditioner, and the system with BPX preconditioner. We note the linear growth of the condition number of the original system, the logarithmic growth of the system with BPX preconditioner, and that the condition numbers for the system with MG preconditioner are bounded (see Chapter 6, Volume 2).


5.2 Dual-mixed method 


Now, we consider again problem (103) to (107), but restrict ourselves to the linear case $\varrho = 1$.

In this section, we give a dual-mixed variational formulation of this linear Signorini contact problem in terms of a convex minimization problem and an associated variational inequality.


Table 4. Extreme eigenvalues and condition numbers κ of A_h.

N            λ_min        λ_max        κ
20 + 8       0.1191171    6.8666176    57.645964
64 + 16      0.0440775    7.5506673    171.30417
232 + 24     0.0140609    7.8599693    558.99601
864 + 32     0.0040314    7.9613671    1974.8471
3288 + 40    0.0010835    7.9899020    7373.9834


Table 5. Extreme eigenvalues and condition numbers κ of B_MG A_h with multigrid.

N            λ_min        λ_max        κ
20 + 8       0.2638324    51.945662    196.88885
64 + 16      0.2544984    52.189897    205.06963
232 + 24     0.2484367    52.274680    210.41444
864 + 32     0.2442467    52.303797    214.14336
3288 + 40    0.2412018    52.313155    216.88543


Table 6. Extreme eigenvalues and condition numbers κ of B_BPX A_h with BPX.

N            λ_min        λ_max        κ
20 + 8       0.3812592    10.995214    28.839212
64 + 16      0.4267841    15.119538    35.426666
232 + 24     0.4458868    17.891084    40.124721
864 + 32     0.4544801    19.893254    43.771457
3288 + 40    0.4585964    21.465963    46.807965
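The growth behavior reported for Tables 4 to 6 can be checked directly from the tabulated eigenvalues. The following is a minimal sketch, not part of the cited experiments: it recomputes κ = λ_max/λ_min for every row and estimates the growth exponent of κ with respect to the total number of unknowns between the two finest meshes. Reading each entry of the column N as a sum of two unknown counts is an assumption, and the function names are illustrative.

```python
import math

# Rows: (n_a, n_b, lambda_min, lambda_max, kappa_reported), copied from Tables 4-6.
# The column N is given as "n_a + n_b"; treating n_a + n_b as the total number
# of unknowns is an assumption of this sketch.
TABLE4 = [  # original system A_h
    (20, 8, 0.1191171, 6.8666176, 57.645964),
    (64, 16, 0.0440775, 7.5506673, 171.30417),
    (232, 24, 0.0140609, 7.8599693, 558.99601),
    (864, 32, 0.0040314, 7.9613671, 1974.8471),
    (3288, 40, 0.0010835, 7.9899020, 7373.9834),
]
TABLE5 = [  # multigrid-preconditioned system
    (20, 8, 0.2638324, 51.945662, 196.88885),
    (64, 16, 0.2544984, 52.189897, 205.06963),
    (232, 24, 0.2484367, 52.274680, 210.41444),
    (864, 32, 0.2442467, 52.303797, 214.14336),
    (3288, 40, 0.2412018, 52.313155, 216.88543),
]
TABLE6 = [  # BPX-preconditioned system
    (20, 8, 0.3812592, 10.995214, 28.839212),
    (64, 16, 0.4267841, 15.119538, 35.426666),
    (232, 24, 0.4458868, 17.891084, 40.124721),
    (864, 32, 0.4544801, 19.893254, 43.771457),
    (3288, 40, 0.4585964, 21.465963, 46.807965),
]


def kappas(rows):
    """Recompute kappa = lambda_max / lambda_min for every row."""
    return [lmax / lmin for (_, _, lmin, lmax, _) in rows]


def max_relative_mismatch(rows):
    """Largest relative gap between recomputed and tabulated kappa."""
    return max(abs(k - row[4]) / row[4] for k, row in zip(kappas(rows), rows))


def last_growth_exponent(rows):
    """Slope of log(kappa) against log(N) between the two finest meshes."""
    (n1, m1, *_), (n2, m2, *_) = rows[-2], rows[-1]
    k1, k2 = kappas(rows)[-2:]
    return math.log(k2 / k1) / math.log((n2 + m2) / (n1 + m1))


for name, rows in (("A_h", TABLE4), ("B_MG A_h", TABLE5), ("B_BPX A_h", TABLE6)):
    print(name, max_relative_mismatch(rows), last_growth_exponent(rows))
```

An exponent close to one for the original system corresponds to the linear growth noted in the text, while exponents close to zero for the two preconditioned systems are consistent with bounded or at most logarithmic growth.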

In Maischak (2001) and Gatica, Maischak and Stephan (2003), a coupling method is proposed and analyzed for dual-mixed FE and BE for (103) to (107) using the inverse Steklov–Poincaré operator $R$ given by

$$R := S^{-1} = \tfrac12\left[V + (I + K)\,W^{-1}(I + K')\right] : H^{-1/2}(\Gamma) \to H^{1/2}(\Gamma) \qquad (117)$$

Note that the operator $W : H^{1/2}(\Gamma)/\mathbb{R} \to H^{-1/2}(\Gamma)$ is positive definite and therefore continuously invertible.

In order to fix the constant in $H^{1/2}(\Gamma)/\mathbb{R}$, one usually chooses the subspace of functions with integral mean zero. However, since this is not optimal from the implementational point of view, we add a least squares term to the hypersingular integral operator $W$. In other words, we define the functional $P : H^{1/2}(\Gamma) \to \mathbb{R}$, where $P(\varphi) = \int_\Gamma \varphi\,ds$ for all $\varphi \in H^{1/2}(\Gamma)$ with adjoint $P'$, and set the positive definite operator

$$\tilde W := W + P'P : H^{1/2}(\Gamma) \to H^{-1/2}(\Gamma) \qquad (118)$$

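In this way, we can evaluate $u := R(t)$ for $t \in H^{-1/2}(\Gamma)$ by computing $u = \tfrac12(Vt + (I + K)\varphi)$, where $\varphi$ is the solution of

$$\tilde W\varphi = (I + K')t \qquad (119)$$

Representing the solution $\varphi$ of (119) as $\varphi = \varphi_0 + c_\varphi$ such that $P(\varphi_0) = 0$ and $c_\varphi \in \mathbb{R}$, and using that $\langle \tilde W\varphi, 1\rangle = \langle (I + K')t, 1\rangle$, $W1 = 0$, and $K1 = -1$, we deduce that $\langle P1, P\varphi\rangle = 0$, and, consequently, $c_\varphi = 0$ and $\varphi = \varphi_0 \in H^{1/2}(\Gamma)/\mathbb{R}$ is the unique solution of $W\varphi = (I + K')t$.

Therefore, we can replace $W$ and $H^{1/2}(\Gamma)/\mathbb{R}$ by $\tilde W$ and $H^{1/2}(\Gamma)$ for the discretization without mentioning it explicitly.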
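The effect of the least squares term can be illustrated on a small matrix analogue. The sketch below is only an illustration under stated assumptions: the matrix `W` is a periodic second-difference matrix used as a hypothetical stand-in for a discretized hypersingular operator (symmetric, positive semidefinite, kernel spanned by constants), the vector `p` plays the role of the functional P, and the right-hand side is chosen orthogonal to constants, mimicking the compatibility of the right-hand side in (119). None of this is a boundary element discretization.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= factor * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def stabilized_solve(W, p, rhs):
    """Solve (W + p p^T) phi = rhs, the matrix analogue of (118) and (119)."""
    n = len(rhs)
    W_tilde = [[W[i][j] + p[i] * p[j] for j in range(n)] for i in range(n)]
    return solve(W_tilde, rhs)


n = 6
h = 1.0 / n
# Hypothetical stand-in for W: singular, with W applied to constants equal to 0.
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    W[i][i] += 2.0 / h
    W[i][(i + 1) % n] -= 1.0 / h
    W[i][(i - 1) % n] -= 1.0 / h
p = [h] * n                                   # discrete analogue of P(phi) = integral of phi
rhs = [math.sin(2.0 * math.pi * i / n) for i in range(n)]  # sums to zero

phi = stabilized_solve(W, p, rhs)
mean = sum(w * x for w, x in zip(p, phi))     # should vanish up to rounding
residual = max(abs(sum(W[i][j] * phi[j] for j in range(n)) - rhs[i]) for i in range(n))
print(mean, residual)
```

In this analogue, the solution of the stabilized system has zero mean and also solves the original singular system, which mirrors the argument given for (119).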

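Next, we introduce the dual formulation (SP) using the inverse Steklov–Poincaré operator $R$. To this end, we define $\tilde\Psi : H(\operatorname{div}; \Omega) \to \mathbb{R} \cup \{\infty\}$ by

$$\tilde\Psi(q) := \tfrac12\|q\|^2_{[L^2(\Omega)]^d} + \tfrac14\langle q\cdot n, R(q\cdot n)\rangle - \tfrac12\langle q\cdot n, R(t_0) + 2u_0\rangle \qquad (120)$$

and the subset of admissible functions by

$$D := \{q \in H(\operatorname{div}; \Omega) : q\cdot n \le 0 \text{ on } \Gamma_s,\ -\operatorname{div} q = f \text{ in } \Omega\}$$

Then, the uniquely solvable problem (SP) consists in finding $q^D \in D$ such that

$$\tilde\Psi(q^D) = \min_{q\in D}\tilde\Psi(q) \qquad (121)$$

As shown in Maischak (2001) and Gatica, Maischak and Stephan (2003), problem (SP) is equivalent to the original Signorini contact problem (103) to (107) with $\varrho = 1$.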

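Next, we want to introduce a saddle-point formulation of (SP) and define $H : H(\operatorname{div}; \Omega) \times L^2(\Omega) \times \tilde H^{1/2}(\Gamma_s) \to \mathbb{R} \cup \{\infty\}$ as

$$H(p, v, \mu) := 2\tilde\Psi(p) + \int_\Omega v\,\operatorname{div} p\,dx + \int_\Omega f\,v\,dx + \langle p\cdot n, \mu\rangle_{\Gamma_s} \qquad (122)$$

for all $(p, v, \mu) \in H(\operatorname{div}; \Omega) \times L^2(\Omega) \times \tilde H^{1/2}(\Gamma_s)$, with $\tilde\Psi$ the functional from (120), and consider the subset of admissible functions

$$\tilde H^{1/2}_+(\Gamma_s) := \{\mu \in \tilde H^{1/2}(\Gamma_s) : \mu \ge 0\} \qquad (123)$$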


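Then the desired saddle-point problem (M) reads as follows:

Problem (M): Find $(\hat q, \hat u, \hat\lambda) \in H(\operatorname{div}; \Omega) \times L^2(\Omega) \times \tilde H^{1/2}_+(\Gamma_s)$ such that

$$H(\hat q, u, \lambda) \le H(\hat q, \hat u, \hat\lambda) \le H(q, \hat u, \hat\lambda) \quad \forall (q, u, \lambda) \in H(\operatorname{div}; \Omega) \times L^2(\Omega) \times \tilde H^{1/2}_+(\Gamma_s) \qquad (124)$$

which is equivalent (see Ekeland and Temam, 1974) to finding a solution $(\hat q, \hat u, \hat\lambda) \in H(\operatorname{div}; \Omega) \times L^2(\Omega) \times \tilde H^{1/2}_+(\Gamma_s)$ of the following variational inequality:

$$a(\hat q, q) + b(q, \hat u) + d(q, \hat\lambda) = \langle q\cdot n, r\rangle \quad \forall q \in H(\operatorname{div}; \Omega) \qquad (125)$$

$$b(\hat q, u) = -\int_\Omega f\,u\,dx \quad \forall u \in L^2(\Omega) \qquad (126)$$

$$d(\hat q, \lambda - \hat\lambda) \le 0 \quad \forall \lambda \in \tilde H^{1/2}_+(\Gamma_s) \qquad (127)$$

where

$$a(p, q) := 2\int_\Omega p\cdot q\,dx + \langle q\cdot n, R(p\cdot n)\rangle \quad \forall p, q \in H(\operatorname{div}; \Omega) \qquad (128)$$

$$b(q, u) := \int_\Omega u\,\operatorname{div} q\,dx \quad \forall (q, u) \in H(\operatorname{div}; \Omega) \times L^2(\Omega) \qquad (129)$$

$$d(q, \lambda) := \langle q\cdot n, \lambda\rangle_{\Gamma_s} \quad \forall (q, \lambda) \in H(\operatorname{div}; \Omega) \times \tilde H^{1/2}(\Gamma_s) \qquad (130)$$

and $r := R(t_0) + 2u_0$.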
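Now, we define the bilinear form

$$B(q, (u, \lambda)) := b(q, u) + d(q, \lambda) \quad \forall (q, u, \lambda) \in H(\operatorname{div}; \Omega) \times L^2(\Omega) \times \tilde H^{1/2}(\Gamma_s) \qquad (131)$$

and observe that the above variational inequality can be written as

$$a(\hat q, q) + B(q, (\hat u, \hat\lambda)) = \langle q\cdot n, r\rangle \quad \forall q \in H(\operatorname{div}; \Omega) \qquad (132)$$

$$B(\hat q, (u - \hat u, \lambda - \hat\lambda)) \le -\int_\Omega f\,(u - \hat u)\,dx \quad \forall (u, \lambda) \in L^2(\Omega) \times \tilde H^{1/2}_+(\Gamma_s) \qquad (133)$$

The dual problem (SP) and the saddle-point problem (M) are related to each other as follows.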


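Theorem 14. The dual problem (SP) is equivalent to the mixed dual variational inequality (M). More precisely,

(i) If $(\hat q, \hat u, \hat\lambda) \in H(\operatorname{div}; \Omega) \times L^2(\Omega) \times \tilde H^{1/2}_+(\Gamma_s)$ is a saddle point of $H$ in $H(\operatorname{div}; \Omega) \times L^2(\Omega) \times \tilde H^{1/2}_+(\Gamma_s)$, then $\hat q = \nabla\hat u$, $\hat u = \tfrac12 R(t_0 - \hat q\cdot n) + u_0$ on $\Gamma_t$, $\hat\lambda = -\tfrac12 R(\hat q\cdot n - t_0) + u_0 - \hat u$ on $\Gamma_s$, and $\hat q \in D$ is the solution of problem (SP).

(ii) Let $q^D \in D$ be the solution of (SP), and define $\hat\lambda := -\tfrac12 R(q^D\cdot n - t_0) + u_0 - \hat u$ on $\Gamma_s$, where $\hat u \in H^1(\Omega)$ is the unique solution of the Neumann problem: $-\Delta\hat u = f$ in $\Omega$, $\partial\hat u/\partial n = q^D\cdot n$ on $\Gamma$, such that $\langle \mu, \hat u + \tfrac12 R(q^D\cdot n - t_0) - u_0\rangle \ge 0$ for all $\mu \in H^{-1/2}(\Gamma)$ with $\mu \le -q^D\cdot n$ on $\Gamma_s$. Then, $(q^D, \hat u, \hat\lambda)$ is a saddle point of $H$ in $H(\operatorname{div}; \Omega) \times L^2(\Omega) \times \tilde H^{1/2}_+(\Gamma_s)$.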


Next, we deal with the numerical approximation for problem (SP) by using mixed FE in $\Omega$ and BE on $\Gamma$, as given in Maischak (2001) and Gatica, Maischak and Stephan (2003). For simplicity, we assume that $\Gamma_t$ and $\Gamma_s$ are polygonal (i.e. piecewise straight lines) for $d = 2$ or piecewise hyperplanes for $d \ge 3$.

Let $(\mathcal T_h)_{h\in I}$ be a family of regular triangulations of the domain $\bar\Omega$ by triangles/tetrahedra $T$ of diameter $h_T$ such that $h := \max\{h_T : T \in \mathcal T_h\}$. We denote by $\rho_T$ the diameter of the inscribed circle/sphere in $T$, and assume that there exists a constant $\kappa > 0$ such that for any $h$ and for any $T$ in $\mathcal T_h$, the inequality $h_T/\rho_T \le \kappa$ holds. Moreover, we assume that there exists a constant $C > 0$ such that for any $h$ and for any triangle/tetrahedron $T$ in $\mathcal T_h$ for which $T \cap \partial\Omega$ is a whole edge/face of $T$, there holds $|T \cap \partial\Omega| \ge C h^{d-1}$, where $|T \cap \partial\Omega|$ denotes the length/area of $T \cap \partial\Omega$. This means that the family of triangulations is uniformly regular near the boundary.
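For a planar triangle, the shape-regularity quotient $h_T/\rho_T$ is easy to evaluate. The following minimal sketch takes $h_T$ as the longest edge and $\rho_T$ as the diameter of the inscribed circle, as in the text; the function names and the bound passed to `is_shape_regular` are illustrative assumptions.

```python
import math


def shape_ratio(a, b, c):
    """Return h_T / rho_T for the planar triangle with vertices a, b, c."""
    sides = [math.dist(a, b), math.dist(b, c), math.dist(c, a)]
    h_T = max(sides)                         # diameter of T = longest edge
    s = 0.5 * sum(sides)                     # semi-perimeter
    area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
    rho_T = 2.0 * area / s                   # incircle diameter = 2 * (area / s)
    return h_T / rho_T


def is_shape_regular(triangles, kappa):
    """Check h_T / rho_T <= kappa for every triangle of a mesh."""
    return all(shape_ratio(*t) <= kappa for t in triangles)


mesh = [((0, 0), (3, 0), (0, 4)), ((0, 0), (1, 0), (0.5, math.sqrt(3) / 2))]
print([shape_ratio(*t) for t in mesh])
```

A family of meshes satisfies the assumption if a single bound works for all triangles on all refinement levels; strongly flattened triangles drive the quotient up.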

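We also assume that all the points/curves in $\bar\Gamma_t \cap \bar\Gamma_s$ become vertices/edges of $\mathcal T_h$ for all $h > 0$. Then, we denote by $E_h$ the set of all edges/faces $e$ of $\mathcal T_h$ and consider the subset $\{e \in E_h : e \subset \Gamma\}$. Further, let $(\tau_{\hat h})_{\hat h\in I}$ be a family of independent regular triangulations of the boundary part $\Gamma_s$ by line segments/triangles $\Delta$ of diameter $h_\Delta$ such that $\hat h := \max\{h_\Delta : \Delta \in \tau_{\hat h}\}$.

We take $I \subset (0, \infty)$ with $0 \in \bar I$, and choose a family of finite-dimensional subspaces $(X_{h,\hat h})_{h\in I,\hat h\in I} = (L_h \times H_h \times H_h^{-1/2} \times H_h^{1/2} \times \tilde H^{1/2}_{\hat h})$ of $X = L^2(\Omega) \times H(\operatorname{div}; \Omega) \times H^{-1/2}(\Gamma) \times H^{1/2}(\Gamma)/\mathbb{R} \times \tilde H^{1/2}(\Gamma_s)$, subordinated to the corresponding triangulations, with $H_h^{-1/2}$ being the restriction of $H_h$ on $\Gamma$, and we assume that the following approximation property holds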

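$$\lim_{h\to 0,\ \hat h\to 0}\ \inf_{(u_h, q_h, \psi_h, \varphi_h, \lambda_{\hat h}) \in X_{h,\hat h}} \|(u, q, \psi, \varphi, \lambda) - (u_h, q_h, \psi_h, \varphi_h, \lambda_{\hat h})\|_X = 0 \qquad (134)$$

for all $(u, q, \psi, \varphi, \lambda) \in X$. In addition, we assume that the divergence of the functions in $H_h$ belongs to $L_h$, that is,

$$\{\operatorname{div} q_h : q_h \in H_h\} \subseteq L_h \qquad (135)$$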


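Also, the subspaces $(L_h, \tilde H^{1/2}_{\hat h})$ and $H_h$ are supposed to verify the usual discrete Babuška–Brezzi condition, which means that there exists $\beta^* > 0$ such that

$$\inf_{\substack{(u_h, \lambda_{\hat h}) \in L_h \times \tilde H^{1/2}_{\hat h} \\ (u_h, \lambda_{\hat h}) \ne 0}}\ \sup_{\substack{q_h \in H_h \\ q_h \ne 0}} \frac{B(q_h, (u_h, \lambda_{\hat h}))}{\|q_h\|_{H(\operatorname{div};\Omega)}\,\|(u_h, \lambda_{\hat h})\|_{L^2(\Omega)\times\tilde H^{1/2}(\Gamma_s)}} \ge \beta^* \qquad (136)$$

Now, for $h, \hat h \in I$, let $j_h : H_h \to H(\operatorname{div}; \Omega)$, $k_h : H_h^{-1/2} \to H^{-1/2}(\Gamma)$ and $l_h : H_h^{1/2} \to H^{1/2}(\Gamma)/\mathbb{R}$ denote the canonical imbeddings with their corresponding duals $j_h^*$, $k_h^*$ and $l_h^*$.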

In order to approximate $R$, we define the discrete operators

$$R_h := j_h^*\gamma^* R\,\gamma j_h \quad\text{and}\quad \tilde R_h := \tfrac12\, j_h^*\gamma^*\left[V + (I + K)\,l_h\,(l_h^* W l_h)^{-1}\,l_h^*\,(I + K')\right]\gamma j_h,$$

where $\gamma : H(\operatorname{div}; \Omega) \to H^{-1/2}(\Gamma)$ is the trace operator yielding the normal component of functions in $H(\operatorname{div}; \Omega)$.

We remark that the computation of $\tilde R_h$ requires the numerical solution of a linear system with a symmetric positive definite matrix $W_h := l_h^* W l_h$. In general, there holds $\tilde R_h \ne R_h$ because $\tilde R_h$ is a Schur complement of discretized matrices, while $R_h$ is a discretized Schur complement of operators.

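Then, in order to approximate the solution of problem (M), we consider the following nonconforming Galerkin scheme (M$_h$):

Problem (M$_h$): Find $(\hat q_h, \hat u_h, \hat\lambda_{\hat h}) \in H_h \times L_h \times \tilde H^{1/2}_{\hat h,+}$ such that

$$a_h(\hat q_h, q_h) + b(q_h, \hat u_h) + d(q_h, \hat\lambda_{\hat h}) = \langle q_h\cdot n, r_h\rangle \quad \forall q_h \in H_h \qquad (137)$$

$$b(\hat q_h, u_h) = -\int_\Omega f\,u_h\,dx \quad \forall u_h \in L_h \qquad (138)$$

$$d(\hat q_h, \lambda_{\hat h} - \hat\lambda_{\hat h}) \le 0 \quad \forall \lambda_{\hat h} \in \tilde H^{1/2}_{\hat h,+} \qquad (139)$$

where

$$\tilde H^{1/2}_{\hat h,+} := \{\mu \in \tilde H^{1/2}_{\hat h} : \mu \ge 0\} \qquad (140)$$

$$a_h(p, q) := 2\int_\Omega p\cdot q\,dx + \langle q\cdot n, \tilde R_h(p\cdot n)\rangle \quad \forall p, q \in H_h \qquad (141)$$

$$b(q, u) := \int_\Omega u\,\operatorname{div} q\,dx \quad \forall (q, u) \in H_h \times L_h \qquad (142)$$

$$d(q, \lambda) := \langle q\cdot n, \lambda\rangle_{\Gamma_s} \quad \forall (q, \lambda) \in H_h \times \tilde H^{1/2}_{\hat h} \qquad (143)$$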


and 
ry = RLV + (O + KL, UW) + Kt + 2uo] 


Note that the nonconformity of problem (M$_h$) arises from the bilinear form $a_h(\cdot, \cdot)$ approximating $a(\cdot, \cdot)$.

There holds the following a priori error estimate (see Maischak (2001) and Gatica, Maischak and Stephan (2003)) yielding convergence for the solution of the nonconforming Galerkin scheme (M$_h$) to the weak solution of (M) and therefore to the weak solution of the original Signorini contact problem owing to the equivalence result of Theorem 14.


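Theorem 15. Let $(\hat q, \hat u, \hat\lambda)$ and $(\hat q_h, \hat u_h, \hat\lambda_{\hat h})$ be the solutions of problems (M) and (M$_h$) respectively. Define $\hat\varphi := W^{-1}(I + K')(\hat q\cdot n)$ and $\varphi_0 := W^{-1}(I + K')t_0$. Then, there exists $c > 0$, independent of $h$ and $\hat h$, such that the following Cea type estimate holds:

$$\|\hat q - \hat q_h\|_{H(\operatorname{div};\Omega)} + \|\hat u - \hat u_h\|_{L^2(\Omega)} + \|\hat\lambda - \hat\lambda_{\hat h}\|_{\tilde H^{1/2}(\Gamma_s)} \le c\,\Big\{ \inf_{q_h\in H_h}\|\hat q - q_h\|_{H(\operatorname{div};\Omega)} + \inf_{u_h\in L_h}\|\hat u - u_h\|_{L^2(\Omega)} + \inf_{\lambda_{\hat h}\in\tilde H^{1/2}_{\hat h,+}}\|\hat\lambda - \lambda_{\hat h}\|^{1/2}_{\tilde H^{1/2}(\Gamma_s)} + \inf_{\varphi_h\in H_h^{1/2}}\|\hat\varphi - \varphi_h\|_{H^{1/2}(\Gamma)/\mathbb{R}} + \inf_{\varphi_h\in H_h^{1/2}}\|\varphi_0 - \varphi_h\|_{H^{1/2}(\Gamma)/\mathbb{R}} \Big\} \qquad (144)$$

The proof of Theorem 15 uses, besides (134) and (135), the discrete Babuška–Brezzi condition (136). A suitable choice for the FE and BE spaces is $L_h$, the set of piecewise constant functions, $H_h$, the space of $H(\operatorname{div}; \Omega)$-conforming RT elements of order zero, and $\tilde H^{1/2}_{\hat h,+}$, the set of continuous, piecewise linear, nonnegative functions on the partition $\tau_{\hat h}$ of $\Gamma_s$.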

Next, we present an a posteriori error estimate with resid- 
ual type estimator, which is given in Maischak (2001). 


Theorem 16. Let $d = 2$. There exists $C > 0$, independent of $h$, $\hat h$, such that


IÊ — Ên lza: + MÊ — nlr + A Dall rey 


1/2 
<C (z a) (145) 


TET 
where, for any triangle $T \in \mathcal T_h$, we define


ne =I(f + div yl Z2cry 
+ htl curi) zer) + AT llGallizery 


$ 2 
+ D heln — Ho + Ai — Saltzea 
eeElT)NE T) 
a P2 
+ J Mâ Alo 
ecElT)NE (Q) 
d 2 
+ 5, nda -7+ euo 
ds Le) 
ecE(TINE(T) 
ad. Ê 
2 E 
+ E anbot DASH, 
ECET NET) eeE(T)NV 5 Life) 


with sp being the L?-projection of Ep — Ug + iy onto the 
space of piecewise constant functions on T, OT, and E(T) 
being the edges of T, using 


ba = V Ân n — t0) = E+ KD, 
bn = Wo, + U + KG, n= to) 
by = GW) RC + KYA Ga n) 
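In order to solve the discretized saddle-point problem (M$_h$), Maischak (2001) and Gatica, Maischak and Stephan (2003) propose a modified Uzawa algorithm that utilizes the equation for the Lagrange multiplier $\lambda$. For this purpose, we first introduce the operators $P_{\hat h} : \tilde H^{1/2}(\Gamma_s) \to \tilde H^{1/2}_{\hat h,+}$ and $\Phi : H(\operatorname{div}; \Omega) \to \tilde H^{1/2}_{\hat h} \subset \tilde H^{1/2}(\Gamma_s)$ that are defined as

$$\langle P_{\hat h}\lambda - \lambda, \lambda_{\hat h} - P_{\hat h}\lambda\rangle_{\tilde H^{1/2}(\Gamma_s)} \ge 0 \quad \forall \lambda_{\hat h} \in \tilde H^{1/2}_{\hat h,+}$$

and

$$\langle \Phi(q), \lambda_{\hat h}\rangle_{\tilde H^{1/2}(\Gamma_s)} = d(q, \lambda_{\hat h}) \quad \forall \lambda_{\hat h} \in \tilde H^{1/2}_{\hat h} \qquad (146)$$

with given $\lambda \in \tilde H^{1/2}(\Gamma_s)$ and $q \in H(\operatorname{div}; \Omega)$.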

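Now, the modified Uzawa algorithm is formulated as follows.

1. Choose an initial $\lambda^{(0)} \in \tilde H^{1/2}_{\hat h,+}$.
2. Given $\lambda^{(n)} \in \tilde H^{1/2}_{\hat h,+}$, find $(q^{(n)}, u^{(n)}) \in H_h \times L_h$ such that

   $$a_h(q^{(n)}, q_h) + b(q_h, u^{(n)}) = \langle r_h, q_h\cdot n\rangle - d(q_h, \lambda^{(n)}) \quad \forall q_h \in H_h \qquad (147)$$

   $$b(q^{(n)}, u_h) = -(f, u_h)_{L^2(\Omega)} \quad \forall u_h \in L_h \qquad (148)$$

3. Compute $\lambda^{(n+1)}$ by

   $$\lambda^{(n+1)} = P_{\hat h}\big(\lambda^{(n)} + \delta\,\Phi(q^{(n)})\big) \qquad (149)$$

4. Check for some stopping criterion and if it is not fulfilled, then go to step 2.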
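The structure of the four steps can be sketched on a small finite-dimensional model problem. The code below is a simplified illustration, not the FE/BE scheme of the cited works: it minimizes $\tfrac12 q^T A q - r^T q$ subject to $Dq \le 0$ with a diagonal matrix $A$, it drops the divergence constraint (148), and it uses the Euclidean inner product, so that the projection onto the nonnegative multipliers is a componentwise maximum. The step size is scaled by the trace of $D A^{-1} D^T$, which is an assumption made here so that a parameter `delta` in $]0, 2[$ is admissible for this model.

```python
def projected_uzawa(a_diag, D, r, delta=1.3, tol=1e-12, max_iter=500):
    """Projected Uzawa iteration for min 1/2 q^T A q - r^T q subject to D q <= 0.

    A = diag(a_diag) with positive entries; D is a list of rows.
    Returns (q, lam, iterations). Illustrative model problem only.
    """
    n, m = len(a_diag), len(D)

    def solve_primal(lam):
        # Step 2: solve A q = r - D^T lam (trivial because A is diagonal here).
        return [(r[j] - sum(D[i][j] * lam[i] for i in range(m))) / a_diag[j]
                for j in range(n)]

    # trace(D A^{-1} D^T) bounds the largest eigenvalue of the Schur complement,
    # so tau = delta / trace stays below 2 / lambda_max for delta in (0, 2).
    trace = sum(D[i][j] ** 2 / a_diag[j] for i in range(m) for j in range(n))
    tau = delta / trace

    lam = [0.0] * m                                   # Step 1: initial multiplier
    iteration = 0
    for iteration in range(1, max_iter + 1):
        q = solve_primal(lam)
        # Step 3: gradient step in the multiplier, then projection onto lam >= 0.
        new_lam = [max(0.0, lam[i] + tau * sum(D[i][j] * q[j] for j in range(n)))
                   for i in range(m)]
        change = max(abs(x - y) for x, y in zip(new_lam, lam))
        lam = new_lam
        if change < tol:                              # Step 4: stopping criterion
            break
    return solve_primal(lam), lam, iteration


# Model data: the unconstrained minimizer (0.5, -0.5) violates q_0 <= 0.
q, lam, iterations = projected_uzawa([2.0, 2.0], [[1.0, 0.0]], [1.0, -1.0])
print(q, lam, iterations)
```

In this model, the constraint on the first component is active at the solution, so the multiplier is positive and the first component of `q` is pushed to zero while the second component is unaffected.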


The convergence of this algorithm is established in the following theorem (Maischak, 2001 and Gatica, Maischak and Stephan, 2003).


Theorem 17. Let $\delta \in\ ]0, 2[$ and consider any initial value $\lambda^{(0)} \in \tilde H^{1/2}_{\hat h,+}$. Then, the modified Uzawa algorithm converges toward the solution of the discrete problem (M$_h$).


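We remark that the operators $P_{\hat h}$ and $\Phi$ are defined with respect to the scalar product of the Sobolev space $\tilde H^{1/2}(\Gamma_s)$, which is not practical from the computational point of view. Fortunately, in the convergence proof we only need that the norm induced by the scalar product is equivalent to the $\tilde H^{1/2}(\Gamma_s)$-norm. Therefore, we can use the bilinear form $\langle W\cdot, \cdot\rangle$ instead of the scalar product $\langle\cdot, \cdot\rangle_{\tilde H^{1/2}(\Gamma_s)}$. Then, the computation of the projection $P_{\hat h}\lambda$ now leads to the following variational inequality: find $P_{\hat h}\lambda \in \tilde H^{1/2}_{\hat h,+}$ such that

$$\langle W P_{\hat h}\lambda, \lambda_{\hat h} - P_{\hat h}\lambda\rangle \ge \langle W\lambda, \lambda_{\hat h} - P_{\hat h}\lambda\rangle \quad \forall \lambda_{\hat h} \in \tilde H^{1/2}_{\hat h,+} \qquad (150)$$

Similarly, the computation of the operator $\Phi$ is done by solving a linear equation: find $\Phi(q) \in \tilde H^{1/2}_{\hat h}$ such that

$$\langle W\Phi(q), \lambda_{\hat h}\rangle = d(q, \lambda_{\hat h}) \quad \forall \lambda_{\hat h} \in \tilde H^{1/2}_{\hat h} \qquad (151)$$

Both systems are small compared to the total size of the problem because they are only defined on the Signorini part $\Gamma_s$ of the interface $\Gamma$. Applying (151) to (150), we obtain for $\lambda = \lambda^{(n)} + \delta\,\Phi(q^{(n)})$,

$$\langle W P_{\hat h}\lambda, \lambda_{\hat h} - P_{\hat h}\lambda\rangle \ge \langle W\lambda^{(n)}, \lambda_{\hat h} - P_{\hat h}\lambda\rangle + \delta\,\langle W\Phi(q^{(n)}), \lambda_{\hat h} - P_{\hat h}\lambda\rangle = \langle W\lambda^{(n)}, \lambda_{\hat h} - P_{\hat h}\lambda\rangle + \delta\,d(q^{(n)}, \lambda_{\hat h} - P_{\hat h}\lambda)$$

and hence the explicit solution of (151) is avoided.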


Finally, we present iteration numbers for the Uzawa algorithm applied to the above dual-mixed FE/BE coupling method for the example in Section 5.1 with the choice of FE/BE spaces given after Theorem 15.

Table 7 gives the numbers of outer iterations for the Uzawa algorithm with $\delta = 1.3$. We notice that the numbers of outer iterations are nearly independent of the problem size. The inner linear systems are solved with the GMRES algorithm (see Chapter 9, this Volume).


Table 7. Iteration numbers for the Uzawa algorithm.

dim L_h    dim H_h    dim H̃^{1/2}_ĥ    Iterations
12         32         3                47
48         112        7                48
192        416        15               47
768        1600       31               47
3072       6272       63               48
12288      24832      127              47
49152      98816      255              46


6 APPLICATIONS 


6.1 Symmetric coupling of standard finite 
elements and boundary elements for 
Hencky-elasticity 


In the following, we consider an interface problem from nonlinear elasticity in which, in a three-dimensional bounded domain $\Omega_1$ with a hole, the material is nonlinear elastic obeying the Hencky–von Mises stress–strain relation, and it is linear elastic in a surrounding unbounded exterior region $\Omega_2$. The boundaries $\Gamma_0$ and $\Gamma$ of $\Omega_1$ are assumed to be Lipschitz continuous, where $\Gamma_0$ denotes the boundary of the hole and $\Gamma$ is the interface boundary between $\Omega_1$ and $\Omega_2$. We assume the nonlinear Hencky–von Mises stress–strain relation of the form

$$\sigma = \left(k - \tfrac23\mu(\gamma)\right) I\,\operatorname{div} u_1 + 2\mu(\gamma)\,\varepsilon$$

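where $\sigma$ and $\varepsilon = \tfrac12(\nabla u_1^T + \nabla u_1)$ denote the (Cauchy) stresses and the (linear Green) strain respectively (see Nečas and Hlaváček, 1981; Nečas, 1986; Zeidler, 1988). Then, if we define

$$P_1(u_1)_i := -\frac{\partial}{\partial x_i}\left[\left(k - \tfrac23\mu(\gamma(u_1))\right)\operatorname{div} u_1\right] - \sum_{j=1}^3 \frac{\partial}{\partial x_j}\left[2\mu(\gamma(u_1))\,\varepsilon_{ij}(u_1)\right]$$

for $i = 1, 2, 3$, the equilibrium condition $\operatorname{div}\sigma + F = 0$ gives

$$P_1(u_1) = F \quad \text{in } \Omega_1 \qquad (152)$$

Here, the bulk modulus $k$ and the function $\mu(\gamma)$ in $P_1$ satisfy (cf. e.g. Nečas, 1986)

$$0 < \tilde\mu_0 \le \mu(\gamma) \le \tfrac32 k, \qquad 0 < \tilde\mu_1 \le \mu + 2\frac{d\mu}{d\gamma}\gamma \le \tilde\mu_2$$

where $\tilde\mu_0$, $\tilde\mu_1$, $\tilde\mu_2$ are constants, and

$$\gamma(u) := \sum_{i,j=1}^3\left(\varepsilon_{ij} - \delta_{ij}\,\tfrac13\operatorname{div} u\right)^2, \qquad \varepsilon_{ij} := \frac12\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right)$$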

In a surrounding unbounded exterior region $\Omega_2$ we consider the homogeneous Lamé system describing linear isotropic elastic material, with the Lamé constants $\mu_2 > 0$, $3\lambda_2 + 2\mu_2 > 0$,

$$P_2(u_2) := -\mu_2\Delta u_2 - (\lambda_2 + \mu_2)\operatorname{grad}\operatorname{div} u_2 = 0 \quad \text{in } \Omega_2 \qquad (153)$$

In Costabel and Stephan (1990), we consider the following interface problem (see also Gatica and Hsiao, 1995): For a given vector field $F$ in $\Omega_1$, find the vector fields $u_j$ in $\Omega_j$ ($j = 1, 2$) satisfying $u_1|_{\Gamma_0} = 0$, the differential equations (152) and (153), the interface conditions

$$u_1 = u_2, \qquad T_1(u_1) = T_2(u_2) \quad \text{on } \Gamma \qquad (154)$$

and the regularity condition at infinity

$$u_2 = O\left(\frac{1}{|x|}\right) \quad \text{as } |x| \to \infty \qquad (155)$$

Here, with $\mu_1 = \mu(\gamma(u_1))$, $\lambda_1 = k - \tfrac23\mu(\gamma(u_1))$, the tractions are given by

$$T_j(u_j) = 2\mu_j\,\partial_n u_j + \lambda_j\,n\operatorname{div} u_j + \mu_j\,n \times \operatorname{curl} u_j \qquad (156)$$

and $\partial_n u_j$ is the derivative with respect to the outer normal on $\Gamma$.

We are interested in solutions $u_j$ of (152) to (155), which belong to $(H^1_{\mathrm{loc}}(\Omega_j))^3$, that is, which are of finite energy. A variational formulation is obtained as in Costabel and Stephan (1990). An application of the first Green formula to (152) yields

$$\int_{\Omega_1} P_1(u_1)\cdot w\,dx = \Phi_1(u_1, w) - \int_\Gamma T_1(u_1)\cdot w\,ds \qquad (157)$$

for all $w \in [H^1(\Omega_1)]^3$, where

$$\Phi_1(u, w) := \int_{\Omega_1}\Big\{\left(k - \tfrac23\mu(\gamma(u))\right)\operatorname{div} u\,\operatorname{div} w + \sum_{i,j=1}^3 2\mu(\gamma(u))\,\varepsilon_{ij}(u)\,\varepsilon_{ij}(w)\Big\}\,dx \qquad (158)$$

On the other hand, the solution $u_2$ of (153) is given by the Somigliana representation formula for $x \in \Omega_2$:

$$u_2(x) = \int_\Gamma\left\{T_2(x, y)\,v_2(y) - G(x, y)\,\phi_2(y)\right\}ds(y) \qquad (159)$$

where $v_2 = u_2$, $\phi_2 = T_2(u_2)$ on $\Gamma$, and the fundamental solution $G(x, y)$ of $P_2 u_2 = 0$ is the $3 \times 3$ matrix function

$$G(x, y) = \frac{\lambda_2 + 3\mu_2}{8\pi\mu_2(\lambda_2 + 2\mu_2)}\left\{\frac{1}{|x - y|}\,I + \frac{\lambda_2 + \mu_2}{\lambda_2 + 3\mu_2}\,\frac{(x - y)(x - y)^T}{|x - y|^3}\right\}$$

with the unit matrix $I$ and $T_2(x, y) = T_{2,y}(G(x, y))^T$, where superscript $T$ denotes transposition. Taking Cauchy data in (159), that is, boundary values and tractions on $\Gamma$ for $x \to \Gamma$, we obtain a system of boundary integral equations on $\Gamma$,

$$2v_2 = (I + K)v_2 - V\phi_2 \quad\text{and}\quad 2\phi_2 = -Wv_2 + (I - K')\phi_2 \qquad (160)$$

with the single-layer potential $V$, a weakly singular boundary integral operator, the double-layer potential $K$ and its dual $K'$, strongly singular operators, and the hypersingular operator $W$ defined as

$$V\phi_2(x) := 2\int_\Gamma G(x, y)\,\phi_2(y)\,ds(y)$$

$$Kv_2(x) := 2\int_\Gamma T_2(x, y)\,v_2(y)\,ds(y)$$

$$K'\phi_2(x) := 2\int_\Gamma T_2(y, x)^T\phi_2(y)\,ds(y)$$

$$Wv_2(x) := -2\,T_{2,x}\int_\Gamma T_2(x, y)\,v_2(y)\,ds(y)$$
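The fundamental solution $G(x, y)$ of the Lamé system entering these operators is straightforward to evaluate pointwise. The following is a minimal sketch of the $3 \times 3$ kernel in terms of the Lamé constants; the function name is an assumption, and no singular quadrature or boundary element assembly is attempted.

```python
import math


def kelvin_tensor(x, y, lam, mu):
    """Fundamental solution G(x, y) of the 3D Lame system as a 3x3 nested list."""
    d = [xi - yi for xi, yi in zip(x, y)]
    r = math.sqrt(sum(di * di for di in d))
    if r == 0.0:
        raise ValueError("G(x, y) is singular for x == y")
    c = (lam + 3.0 * mu) / (8.0 * math.pi * mu * (lam + 2.0 * mu))
    ratio = (lam + mu) / (lam + 3.0 * mu)
    return [[c * ((1.0 if i == j else 0.0) / r + ratio * d[i] * d[j] / r ** 3)
             for j in range(3)] for i in range(3)]


G = kelvin_tensor((1.0, 2.0, 2.0), (0.0, 0.0, 0.0), lam=1.0, mu=1.0)
print(G[0][0], G[0][1])
```

The kernel is symmetric in its matrix indices and in its two arguments, and it decays like $1/|x - y|$, which is the weak singularity of the single-layer potential mentioned above.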


As in interface problems for purely linear equations (cf. Section 2), we obtain a variational formulation for the interface problem (152) to (155) by inserting a weak form of the boundary integral equations (160) on $\Gamma$ into the weak form (157) and making use of the interface conditions (154), that is, $T_1 = T_2 =: \phi$ and $v_2 = u_1 =: u$.

This yields the following variational problem: for given $F \in L^2(\Omega_1)^3$, find $u \in H^1(\Omega_1)^3$, $\phi \in H^{-1/2}(\Gamma)^3$ such that $u|_{\Gamma_0} = 0$ and

$$b(u, \phi; w, \psi) = 2\int_{\Omega_1} F\cdot w\,dx \quad \text{for all } (w, \psi) \in H^1(\Omega_1)^3 \times H^{-1/2}(\Gamma)^3 \qquad (161)$$

Here, with the form $\Phi_1(\cdot, \cdot)$ in (158) and the brackets $\langle\cdot, \cdot\rangle$ denoting the extended $L^2$-duality between the trace space $H^{1/2}(\Gamma)^3$ and its dual $H^{-1/2}(\Gamma)^3$, we define

$$b(u, \phi; w, \psi) := 2\Phi_1(u, w) + \langle w, Wu\rangle - \langle w, (I - K')\phi\rangle - \langle (I - K)u, \psi\rangle - \langle \psi, V\phi\rangle \qquad (162)$$


Theorem 18. For $F \in L^2(\Omega_1)^3$, there exists exactly one solution $u \in H^1(\Omega_1)^3$, $\phi \in H^{-1/2}(\Gamma)^3$ of (161), yielding a solution of the interface problem (152) to (155) with $u_1 = u$ in $\Omega_1$ and $u_2$ given by (159) in $\Omega_2$.


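The proof in Costabel and Stephan (1990) is based on the fact that the $C^2$-functional

$$J(u, \phi) := 2A(u) + \tfrac12\langle u, Wu\rangle - 2\int_{\Omega_1} F\cdot u\,dx + \langle \phi, (K - I)u\rangle - \tfrac12\langle \phi, V\phi\rangle,$$

$$A(u) := \int_{\Omega_1}\Big\{\tfrac12 k\,|\operatorname{div} u|^2 + \int_0^{\gamma(u)}\mu(t)\,dt\Big\}\,dx \qquad (163)$$

has a unique saddle point $u \in H^1(\Omega_1)^3$, $\phi \in H^{-1/2}(\Gamma)^3$. The two-dimensional case, treated in Carstensen, Funken and Stephan (1997), requires minor modifications only.

A key role is played by the following properties of the functional

$$J_1(u) := A(u) - \int_{\Omega_1} F\cdot u\,dx,$$

of the single-layer potential operator $V$ and of the hypersingular operator $W$: $J_1$ is strictly convex, that is, there exist $\lambda, \Lambda > 0$ such that for all $u, w \in H^1(\Omega_1)^3$, the second Gateaux derivative satisfies

$$\lambda\|w\|^2_{1,\Omega_1} \le D^2 J_1(u)[w; w] \le \Lambda\|w\|^2_{1,\Omega_1} \qquad (164)$$

There exists $\gamma > 0$ such that for all $\phi \in H^{-1/2}(\Gamma)^3$,

$$\langle \phi, V\phi\rangle \ge \gamma\|\phi\|^2_{-1/2,\Gamma} \qquad (165)$$

and

$$\langle v, Wv\rangle \ge 0 \quad \text{for all } v \in H^{1/2}(\Gamma)^3 \qquad (166)$$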
For (164) see Nečas (1986); for (165) and (166) see Costabel and Stephan (1990). The saddle-point property of the functional $J$ in (163) is typical of the symmetric coupling of FE and BE.

Given finite-dimensional subspaces $X_M \times Y_N$ of $H^1(\Omega_1)^3 \times H^{-1/2}(\Gamma)^3$, the Galerkin solution $(u_M, \phi_N) \in X_M \times Y_N$ is the unique saddle point of the functional $J$ on $X_M \times Y_N$; the Galerkin scheme for (161) reads as follows: Given $F \in L^2(\Omega_1)^3$, find $u_M \in X_M$ and $\phi_N \in Y_N$ such that for all $w \in X_M$ and $\psi \in Y_N$,

$$b(u_M, \phi_N; w, \psi) = 2\int_{\Omega_1} F\cdot w\,dx \qquad (167)$$


The following theorem from Costabel and Stephan (1990) 
states the quasioptimal convergence in the energy norm for 
any conforming Galerkin scheme. 


Theorem 19. There exists exactly one solution $(u_M, \phi_N) \in X_M \times Y_N$ of the Galerkin equations (167). There exists a constant $C$ independent of $X_M$ and $Y_N$ such that

$$\|u - u_M\|_{H^1(\Omega_1)^3} + \|\phi - \phi_N\|_{H^{-1/2}(\Gamma)^3} \le C\Big\{\inf_{w\in X_M}\|u - w\|_{H^1(\Omega_1)^3} + \inf_{\psi\in Y_N}\|\phi - \psi\|_{H^{-1/2}(\Gamma)^3}\Big\} \qquad (168)$$

where $(u, \phi) \in H^1(\Omega_1)^3 \times H^{-1/2}(\Gamma)^3$ is the exact solution of the variational problem (161).


The Galerkin solution $(u_M, \phi_N) \in X_M \times Y_N$ of (167) is the unique saddle point of the functional $J$ on $X_M \times Y_N$, that is, $DJ(u_M, \phi_N)[w; \psi] = 0$ for all $(w, \psi) \in X_M \times Y_N$.

Note that the symmetric coupling procedure was applied above to a nonlinear strongly monotone operator $P_1$, but we mention that it can also be applied to some other nonlinear elasticity, viscoplasticity, and plasticity problems with hardening; see Carstensen (1993, 1994, 1996a) and Carstensen and Stephan (1995c). Numerical experiments with the h-version for some model problems are described in Stephan (1992).


6.2 Coupling of dual-mixed finite elements and
boundary elements for plane elasticity


Following Gatica, Heuer and Stephan (2001), we consider the coupling of dual-mixed FE and BE to solve a mixed Dirichlet–Neumann problem of plane elasticity. We derive an a posteriori error estimate that is based on the solution of local Dirichlet problems and on a residual term defined on the coupling interface.

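Let $\Omega = \Omega_F \cup \Gamma \cup \Omega_B$ be a polygonal domain in $\mathbb{R}^2$ with boundary $\partial\Omega = \Gamma_N \cup \Gamma_D$, where $\Gamma_D$ is not empty and $\bar\Omega_B \cap (\Gamma_N \cup \Gamma_D) = \emptyset$ (see Figure 3).

Figure 3. Geometry of the problem.

For a given body load $f$ on $\Omega$, vanishing on $\Omega_B$, and a given displacement $g$ on $\Gamma_D$, we consider the linear elasticity problem

$$\operatorname{div}\sigma = -f \quad \text{in } \Omega \qquad (169a)$$
$$\sigma = C\varepsilon(u) \quad \text{in } \Omega \qquad (169b)$$
$$u = g \quad \text{on } \Gamma_D \qquad (169c)$$
$$\sigma\cdot\nu = 0 \quad \text{on } \Gamma_N \qquad (169d)$$

Here, $u$ is the displacement field, $\varepsilon(u) := \tfrac12(\nabla u + (\nabla u)^T)$ is the strain tensor, and $\nu$ is the outward unit normal of $\Omega_F$. The elasticity tensor $C$ describes the stress–strain relationship. In the simplest case, we have $\sigma = \lambda\operatorname{tr}\varepsilon(u)\,I_2 + 2\mu\,\varepsilon(u)$, where $\lambda$ and $\mu$ are the Lamé coefficients, $I_2$ denotes the identity matrix in $\mathbb{R}^{2\times 2}$, and $\operatorname{tr}(\tau) := \sum_{i=1}^2 \tau_{ii}$ for $\tau = (\tau_{ij}) \in \mathbb{R}^{2\times 2}$. We assume that the Lamé coefficients are constant on $\Omega_B$.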

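In the following, we discretize the problem (169) by coupled dual-mixed FE and BE. Below, this allows for representing $u$ on $\Omega_B$ by pure boundary integral operators acting on $\Gamma$.

To derive an appropriate variational formulation, we define the tensor spaces

$$[L^2(\Omega_F)]^{2\times 2} := \left\{\tau = \begin{pmatrix}\tau_{11} & \tau_{12} \\ \tau_{21} & \tau_{22}\end{pmatrix};\ \tau_{ij} \in L^2(\Omega_F),\ i, j = 1, 2\right\}$$

and

$$H(\operatorname{div}; \Omega_F) := \left\{\tau \in [L^2(\Omega_F)]^{2\times 2};\ \operatorname{div}\tau \in [L^2(\Omega_F)]^2\right\}$$

with norms

$$\|\tau\|_{[L^2(\Omega_F)]^{2\times 2}} := \Big(\sum_{i,j=1}^2\|\tau_{ij}\|^2_{L^2(\Omega_F)}\Big)^{1/2}, \qquad \|\tau\|_{H(\operatorname{div};\Omega_F)} := \Big(\|\tau\|^2_{[L^2(\Omega_F)]^{2\times 2}} + \|\operatorname{div}\tau\|^2_{[L^2(\Omega_F)]^2}\Big)^{1/2}$$

and inner products

$$\int_{\Omega_F} s : t\,dx \quad\text{and}\quad \int_{\Omega_F} s : t\,dx + \int_{\Omega_F}\operatorname{div} s\cdot\operatorname{div} t\,dx$$

Here, the symbol ':' denotes the product

$$s : t := \sum_{i,j=1}^2 s_{ij}t_{ij} \quad \text{for } s, t \in \mathbb{R}^{2\times 2}$$

and

$$\operatorname{div}\tau := \begin{pmatrix}\partial\tau_{11}/\partial x_1 + \partial\tau_{12}/\partial x_2 \\ \partial\tau_{21}/\partial x_1 + \partial\tau_{22}/\partial x_2\end{pmatrix} \quad \text{for } \tau = \begin{pmatrix}\tau_{11} & \tau_{12} \\ \tau_{21} & \tau_{22}\end{pmatrix}$$

We also define

$$H_0(\operatorname{div}; \Omega_F) := \{\tau \in H(\operatorname{div}; \Omega_F);\ (\tau\cdot\nu)|_{\Gamma_N} = 0\}$$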

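Now, we follow Brink, Carstensen and Stein (1996) in deriving a weak formulation of the problem (169). On $\Omega_F$, we consider $u$ and $\sigma$ as independent unknowns. Testing (169b) with $\tau \in H_0(\operatorname{div}; \Omega_F)$, this gives

$$\int_{\Omega_F}\tau : C^{-1}\sigma\,dx = \int_{\Omega_F}\tau : \varepsilon(u)\,dx$$

Integrating by parts and using $u = g$ on $\Gamma_D$, we obtain

$$\int_{\Omega_F}\tau : C^{-1}\sigma\,dx + \int_{\Omega_F} u\cdot\operatorname{div}\tau\,dx + \int_{\Omega_F}\tau : \gamma\,dx - \langle\phi, \tau\cdot\nu\rangle_\Gamma = \langle g, \tau\cdot\nu\rangle_{\Gamma_D} \quad \forall\tau \in H_0(\operatorname{div}; \Omega_F) \qquad (170)$$

where $\gamma := \tfrac12(\nabla u - (\nabla u)^T)$ and

$$\phi := u|_\Gamma \in [H^{1/2}(\Gamma)]^2 \qquad (171)$$

are introduced as further unknowns. Here, $\langle\cdot, \cdot\rangle_\Gamma$ stands for the duality pairing between $[H^{1/2}(\Gamma)]^2$ and $[H^{-1/2}(\Gamma)]^2$, whereas $\langle\cdot, \cdot\rangle_{\Gamma_D}$ denotes the duality between $[H^{1/2}(\Gamma_D)]^2$ and its dual $[H^{-1/2}(\Gamma_D)]^2$. We note that $\gamma$ represents rotations and lies in the space

$$H_0 := \{\tau \in [L^2(\Omega_F)]^{2\times 2};\ \tau + \tau^T = 0\} = \left\{\begin{pmatrix}0 & \delta \\ -\delta & 0\end{pmatrix};\ \delta \in L^2(\Omega_F)\right\}$$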

Further, testing (169a), we find that there holds

$$\int_{\Omega_F} v\cdot\operatorname{div}\sigma\,dx = -\int_{\Omega_F} f\cdot v\,dx \quad \forall v \in [L^2(\Omega_F)]^2 \qquad (172)$$

The problem of linear elasticity within $\Omega_B$ is dealt with by boundary integral operators on $\Gamma$. Denoting by $V$, $K$, $K'$, and $W$ the integral operators of the single-layer, double-layer, adjoint of the double-layer, and hypersingular potentials respectively, and using the well-known jump conditions (cf. e.g. Hsiao, Stephan and Wendland, 1991), we obtain

$$2\phi = (I + K)\phi - V(\sigma\cdot\nu) \quad \text{on } \Gamma \qquad (173)$$

and

$$W\phi + (I + K')(\sigma\cdot\nu) = 0 \quad \text{on } \Gamma \qquad (174)$$

Here, $\phi = u|_\Gamma$ (cf. (171)) and $I$ is the identity operator on the corresponding spaces.

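Eventually, we substitute (173) into (170) and we test (174) with the functions $\psi \in [H^{1/2}(\Gamma)]^2$. Then, collecting (170), (172), and (174) and requiring the symmetry of $\sigma$ weakly by

$$\int_{\Omega_F}\sigma : \delta\,dx = 0 \quad \forall\delta \in H_0$$

we arrive at the following variational formulation of the boundary value problem (169): find $(\sigma, \phi, u, \gamma) \in H_0(\operatorname{div}; \Omega_F) \times [H^{1/2}(\Gamma)]^2 \times [L^2(\Omega_F)]^2 \times H_0$ such that

$$a(\sigma, \phi; \tau, \psi) + b(\tau; u, \gamma) = 2\langle g, \tau\cdot\nu\rangle_{\Gamma_D}$$

$$b(\sigma; v, \delta) = -2\int_{\Omega_F} f\cdot v\,dx \qquad (175)$$

for all $(\tau, \psi, v, \delta) \in H_0(\operatorname{div}; \Omega_F) \times [H^{1/2}(\Gamma)]^2 \times [L^2(\Omega_F)]^2 \times H_0$. Here,

$$a(\sigma, \phi; \tau, \psi) := 2a_F(\sigma, \tau) + a_B(\sigma, \phi; \tau) - \langle\psi, W\phi\rangle_\Gamma - \langle\psi, (I + K')(\sigma\cdot\nu)\rangle_\Gamma$$

with

$$a_F(\sigma, \tau) := \int_{\Omega_F}\tau : C^{-1}\sigma\,dx$$

$$a_B(\sigma, \phi; \tau) := \langle\tau\cdot\nu, V(\sigma\cdot\nu)\rangle_\Gamma - \langle\tau\cdot\nu, (I + K)\phi\rangle_\Gamma$$

and

$$b(\sigma; v, \delta) := 2\int_{\Omega_F} v\cdot\operatorname{div}\sigma\,dx + 2\int_{\Omega_F}\sigma : \delta\,dx$$

Brink, Carstensen and Stein (1996) proved unique solvability of this weak formulation.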
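Theorem 20. For every $f \in [L^2(\Omega_F)]^2$ and $g \in [H^{1/2}(\Gamma_D)]^2$, the saddle-point problem (175) has a unique solution satisfying

$$\|\sigma\|_{H(\operatorname{div};\Omega_F)} + \|\phi\|_{[H^{1/2}(\Gamma)]^2} + \|u\|_{[L^2(\Omega_F)]^2} + \|\gamma\|_{[L^2(\Omega_F)]^{2\times 2}} \le C\left\{\|f\|_{[L^2(\Omega_F)]^2} + \|g\|_{[H^{1/2}(\Gamma_D)]^2}\right\}$$

with a positive constant $C$, which is independent of $f$ and $g$.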


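Let $\bar\Omega_F = \cup\{T;\ T \in \mathcal T_h\}$ be a partitioning of $\Omega_F$. The elements $T$ are triangles or quadrilaterals. For $T \ne T'$, $T \cap T'$ is either empty or a common vertex or edge.

To define an implicit residual-error estimator, we follow the strategy from Section 2.2.3 and use an elliptic, continuous, symmetric bilinear form $\hat a(\cdot, \cdot)$ on $H_0(\operatorname{div}; \Omega_F) \times [H_0^{1/2}(\Gamma)]^2$. Here, $[H_0^{1/2}(\Gamma)]^2$ is the quotient space $[H^{1/2}(\Gamma)]^2/\ker(\varepsilon|_\Gamma)$ that eliminates the rigid body motions $\ker(\varepsilon|_\Gamma)$. For simplicity, we take the bilinear forms

$$\hat a_F(\sigma, \tau) := \int_{\Omega_F}\sigma : \tau\,dx + \int_{\Omega_F}\operatorname{div}\sigma\cdot\operatorname{div}\tau\,dx \quad \text{on } H_0(\operatorname{div}; \Omega_F) \times H_0(\operatorname{div}; \Omega_F)$$

and

$$\hat a_B(\phi, \psi) := \langle\psi, W\phi\rangle_\Gamma \quad \text{on } [H_0^{1/2}(\Gamma)]^2 \times [H_0^{1/2}(\Gamma)]^2$$

and define

$$\hat a(\sigma, \phi; \tau, \psi) := \hat a_F(\sigma, \tau) + \hat a_B(\phi, \psi) \qquad (176)$$

Note that $\hat a$ is an inner product on $H_0(\operatorname{div}; \Omega_F) \times [H_0^{1/2}(\Gamma)]^2$. The restriction of $\hat a_F$ to an element $T \in \mathcal T_h$ is denoted by $\hat a_{F,T}$, that is,

$$\hat a_F(\sigma, \tau) = \sum_{T\in\mathcal T_h}\hat a_{F,T}(\sigma|_T, \tau|_T) \quad \text{for all } \sigma, \tau \in H_0(\operatorname{div}; \Omega_F)$$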


Now, let us assume that we have an approximate solution $(\sigma_h, \phi_h, u_h, \gamma_h) \in H_0(\operatorname{div}; \Omega_F) \times [H^{1/2}(\Gamma)]^2 \times [L^2(\Omega_F)]^2 \times H_0$ to (175). In practice, this will be a coupled FE/BE solution obtained by restricting (175) to some discrete subspaces.

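For the error estimator, the following local problems are defined: for $T \in \mathcal T_h$, find

$$\sigma_T \in H_0(\operatorname{div}; T) := \{\tau \in [L^2(T)]^{2\times 2};\ \operatorname{div}\tau \in [L^2(T)]^2,\ (\tau\cdot\nu)|_{\partial T\cap\Gamma_N} = 0\}$$

such that

$$\hat a_{F,T}(\sigma_T, \tau) = -F_T(\tau) \quad \text{for all } \tau \in H_0(\operatorname{div}; T) \qquad (177)$$

where

$$F_T(\tau) := 2a_{F,T}(\sigma_h, \tau) + a_{B,T}(\sigma_h, \phi_h; \tau) + b_T(\tau; u_h, \gamma_h) - 2\int_{\partial T\setminus\Gamma}\lambda\cdot(\tau\cdot n)\,ds$$

($n$ is the outward unit normal of $T$) with

$$a_{F,T}(\sigma_h, \tau) := \int_T\tau : C^{-1}\sigma_h\,dx$$

$$a_{B,T}(\sigma_h, \phi_h; \tau) := \langle\tau\cdot\nu, V(\sigma_h\cdot\nu)\rangle_{\partial T\cap\Gamma} - \langle\tau\cdot\nu, (I + K)\phi_h\rangle_{\partial T\cap\Gamma}$$

$$b_T(\tau; u_h, \gamma_h) := 2\int_T u_h\cdot\operatorname{div}\tau\,dx + 2\int_T\tau : \gamma_h\,dx$$

Here, $\lambda \in [H^{1/2}(\cup_{T\in\mathcal T_h}\partial T\setminus\Gamma)]^2$ is arbitrary on the element boundaries interior to $\Omega_F$ and on $\Gamma_N$. We require that $\lambda = g$ on $\Gamma_D$. Note that if $\lambda|_{\partial T\setminus(\Gamma\cup\Gamma_N)} = u|_{\partial T\setminus(\Gamma\cup\Gamma_N)}$ for $T \in \mathcal T_h$ (which in particular implies that $\lambda|_{\Gamma_D} = g$), then the solution $\sigma_T$ of (177) converges to 0 in $H_0(\operatorname{div}; T)$ if $(\sigma_h, \phi_h, u_h, \gamma_h)$ converges to $(\sigma, \phi, u, \gamma)$. The local solution $\sigma_T$ can be considered as a projection of the error $(\sigma - \sigma_h, \phi - \phi_h, u - u_h, \gamma - \gamma_h)$ onto the local space $H_0(\operatorname{div}; T)$. We expect that a good approximation of $\lambda$ to $u$ on interior element edges improves the efficiency of our error estimator.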

In Gatica, Heuer and Stephan (2001), we prove the following a posteriori error estimate based on the local problems (177), yielding reliability. The proof is a modification of the proof of Theorem 7.


Theorem 21. Let $\lambda \in [H^{1/2}(\cup_{T\in\mathcal T_h}\partial T\setminus\Gamma)]^2$ with $\lambda|_{\Gamma_D} = g$. Further, define for any $T \in \mathcal T_h$ the function $\sigma_T \in H_0(\operatorname{div}; T)$ by the local problem (177). Then, there holds the a posteriori error estimate

$$\|\sigma - \sigma_h\|_{H(\operatorname{div};\Omega_F)} + \|\phi - \phi_h\|_{[H^{1/2}(\Gamma)]^2} + \|u - u_h\|_{[L^2(\Omega_F)]^2} + \|\gamma - \gamma_h\|_{[L^2(\Omega_F)]^{2\times 2}} \le C\Big\{\Big(\sum_{T\in\mathcal T_h}\hat a_{F,T}(\sigma_T, \sigma_T)\Big)^{1/2} + R_\Gamma + \|f + \operatorname{div}\sigma_h\|_{[L^2(\Omega_F)]^2} + \|\sigma_h - \sigma_h^T\|_{[L^2(\Omega_F)]^{2\times 2}}\Big\}$$

where $C$ depends on the norm of

$$W^{-1} : [H^{-1/2}(\Gamma)]^2 \to [H^{1/2}(\Gamma)]^2/\ker W$$

and on the inf–sup constant $\beta$ of the bilinear form of the saddle-point problem (175), but is independent of $h$. Here, $R_\Gamma$ is the norm of the residual for the boundary integral equation (174), that is,

$$R_\Gamma := \|W\phi_h + (I + K')(\sigma_h\cdot\nu)\|_{[H^{-1/2}(\Gamma)]^2}$$

and the kernel of $W$ consists of the rigid body motions, that is, $[H^{1/2}(\Gamma)]^2/\ker W = [H_0^{1/2}(\Gamma)]^2$.


Note that the above error estimate does not make use 
of any special FE or BE spaces. Here, the residual term 
is given in a negative-order Sobolev norm. In practical 
applications, in which a certain BE subspace is used, this 
norm can be estimated by weighted local $L^2$-norms.


7 CONCLUDING REMARKS 


This chapter gives an overview of boundary element and 
finite element coupling procedures for elliptic interface 
problems. Special emphasis is given to the derivation of 
a posteriori error estimates. Owing to the limited length of
the chapter, we have omitted a description of the corresponding
adaptive algorithms, which can be found in the cited literature.
FE/BE coupling is still a rapidly developing
field, and this chapter can only describe some fundamental
concepts. Further issues not discussed above include
the following.

In BEM implementations for engineering applications, 
the integral equation is often discretized by collocation (see 
Brebbia, Telles and Wrobel, 1984). The coupling of FEM 
and collocation BEM was analyzed by Wendland (1988),
and that of FEM and Galerkin BEM by Wendland (1990). In Wendland
(1988), the mesh refinement requires the condition $k = o(h)$,
where $k$ and $h$ denote the mesh sizes of the BE and FE meshes,
respectively. In Wendland (1990) and in Brink and Stephan (1996), this
condition is weakened to $k < \beta \cdot h$, $\beta \in \mathbb{R}$, and convergence
is shown on the basis of the ideas from Brezzi and Johnson 
(1979). To establish convergence, the essential observation 
is that, given an FEM mesh, it suffices to solve the boundary 
integral equations accurately enough. In Brink and Stephan 
(1996), we show that in the energy norm the coupled 
method converges with optimal order. 

A hybrid coupled FE/BE method for linear elasticity 
problems is presented in Hsiao (1990) and Hsiao, Schnack 
and Wendland (1999, 2000). In this hybrid method, in 
addition to traditional FE, Trefftz elements are considered. 
These are modeled with boundary potentials supported by 
the individual element boundaries, the so-called macroele- 
ments. Collocation as well as Galerkin methods are used 
for the coupling between the FE and macroelements. The 
coupling is realized via mortar elements on the coupling 
boundary. Here, different discretizations on $\Omega_1$ and $\Omega_2$ are
coupled via weak formulations and the use of mortar or
Lagrange spaces on the coupling boundaries (cf. Steinbach,
2003).

For a parabolic—elliptic interface problem with the heat 
equation in the interior domain and the Laplace equation in 
the exterior domain, modeling two-dimensional eddy currents
in electrodynamics, Costabel, Ervin and Stephan (1990)
introduce a full discretization of the problem by symmetric 
coupling of FE and BE. For the discretization in time, the 
Crank—Nicolson method is proposed there. In Mund and 
Stephan (1997), we use the discontinuous Galerkin method 
(with piecewise linear test and trial functions), which allows 
space and time steps to be variable in time. On the basis of 
an a posteriori error estimate of residual type, we present a 
reliable adaptive algorithm for choosing local mesh sizes in 
space and time. The linear systems of equations obtained 
by the above-mentioned discretization in space and time
are symmetric and indefinite; they are solved by the HMCR 
method (see Chandra, Eisenstat and Schultz, 1977), a stable 
version of MINRES. 

Symmetric and nonsymmetric coupling FE/BE proce- 
dures for solving the heterogeneous Maxwell equations in 
IR°\Q, with a Leontovich boundary condition on I’, are 
given in Ammari and Nedelec (2000). In this paper, the 
authors consider the time-harmonic electromagnetic scat- 
tering by a bounded dielectric material & surrounding a 
lossy highly conductive body Q,. 

In Teltscher, Maischak and Stephan (2003), we present 
a symmetric FE/BE coupling method for the time-harmonic 
eddy-current problem (in electromagnetics) for low
frequencies.

The symmetric coupling method (see also Kuhn and 
Steinbach (2002) and Hiptmair (2002)) is based on a weak 
formulation of the vector Helmholtz equation for the electric
field $u$ in the polyhedral domain $\Omega$ and uses a weak
formulation of the integral equations for $u|_\Gamma$, the trace of $u$
on $\Gamma$, and for $\lambda$, the twisted tangential trace of the magnetic
field on $\Gamma$, the boundary of the conductor. The resulting system
is strongly elliptic and symmetric and yields quasi-optimal
convergence for any conforming Galerkin coupling
method. We present a posteriori error estimates of residual
and hierarchical type. Numerical results with lowest-order
Nédélec elements on hexahedra as FE and lowest-order
RT elements (with vanishing surface divergence) as BE 
underline our theory. 

In Brink and Stephan (2001), the FE/BE coupling is 
analyzed, in which, in the FEM domain, we assume an 
incompressible elastic material and use a Stokes-type mixed 
FEM; linear elasticity is considered in the BEM domain. 
For rubberlike materials, the incompressibility has to be 
taken into account, which requires mixed FE. The stress 
(which is often the physical quantity of maximum interest) 
cannot be determined merely from the displacement, if the 
material is incompressible. Additional unknowns have to
be introduced. In Brink and Stephan (2001), we employ a
primal-mixed finite element method with the pressure as 
the secondary unknown. 

Alternatively, one may use dual-mixed methods, in 
which, in elasticity, the stress tensor is the primary 
unknown. A coupling of BEM and dual-mixed FEM was
proposed by Brink, Carstensen and Stein (1996), in which
linear elasticity was assumed in the FEM domain.


For problems with nonlinearities, a so-called dual–dual
formulation can be applied, since it avoids inverting the
elasticity tensor $C$ directly (see Gatica and Heuer, 2000).
An a posteriori error estimate for the pure FEM (no cou- 
pling with BEM), based on the dual—dual formulation, is 
given in Barrientos, Gatica and Stephan (2002). Suitable 
preconditioned solvers for the dual—dual coupling can be 
found in Gatica and Heuer (2002). 

An alternative procedure to the FE/BE coupling for 
exterior nonlinear—linear transmission problems consists 
of employing DtN mappings instead of BEM (cf. Givoli, 
1992). This means that one first introduces a sufficiently 
large circle $\Gamma$ (in $\mathbb{R}^2$) or sphere (in $\mathbb{R}^3$), such that the
linear domain is divided into a bounded annular region
and an unbounded one. Then, one derives an explicit
formula for the Neumann data on $\Gamma$ in terms of the
Dirichlet data on the same curve, which is known as 
the DtN mapping. This has been done for several ellip- 
tic operators, including the Lamé system for elasticity, 
and acoustic problems, using Fourier-type series devel- 
opments (see MacCamy and Marin, 1980; Gatica, 1997; 
Barrenechea, Gatica and Hsiao, 1998; Barrientos, Gat- 
ica and Maischak, 2002; and Gatica, Gatica and Stephan, 
2003). 
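
As an illustration of such a Fourier-type DtN mapping, the following
minimal sketch treats the simplest case only: the Laplace equation
outside a circle of radius R in two dimensions, with a solution that
stays bounded at infinity. Under these assumptions each Fourier mode
exp(i n theta) of the Dirichlet data extends as (R/r)^|n| exp(i n theta),
so the radial derivative on the circle is obtained by multiplying the
n-th coefficient by -|n|/R. The function name, the sampling on
equispaced angles, and the plain discrete Fourier sum are choices made
for this example; they are not taken from the references above, and the
sign flips if the normal is taken to point toward the origin.

```python
import cmath
import math


def dtn_exterior_laplace(g_samples, radius):
    """Return du/dr on the circle r = radius for the bounded exterior
    harmonic extension of Dirichlet data sampled at equispaced angles."""
    n_pts = len(g_samples)
    # Discrete Fourier coefficients of the Dirichlet data.
    coeffs = []
    for k in range(n_pts):
        s = sum(g_samples[j] * cmath.exp(-2j * math.pi * k * j / n_pts)
                for j in range(n_pts))
        coeffs.append(s / n_pts)
    # Multiply mode n by the DtN symbol -|n| / radius and sum the series.
    result = []
    for j in range(n_pts):
        theta = 2.0 * math.pi * j / n_pts
        total = 0.0
        for k in range(n_pts):
            n = k if k <= n_pts // 2 else k - n_pts  # signed wavenumber
            total += (-abs(n) / radius) * coeffs[k] * cmath.exp(1j * n * theta)
        result.append(total.real)
    return result
```

For the data g = cos(2 theta) on a circle of radius 2, the exterior
solution is (2/r)^2 cos(2 theta), and the sketch returns samples of
-cos(2 theta), as expected from differentiating in r.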
1 INTRODUCTION


The numerical simulation of multidimensional problems 
in fluid dynamics and nonlinear solid mechanics often 
requires coping with strong distortions of the continuum 
under consideration while allowing for a clear delineation
of free surfaces and fluid–fluid, solid–solid, or
fluid–structure interfaces. A fundamentally important consideration
when developing a computer code for simulating
problems in this class is the choice of an appropriate 
kinematical description of the continuum. In fact, such a 
choice determines the relationship between the deform- 
ing continuum and the finite grid or mesh of computing 
zones, and thus conditions the ability of the numerical 
method to deal with large distortions and provide an accu- 
rate resolution of material interfaces and mobile bound- 
aries. 


Encyclopedia of Computational Mechanics, Edited by Erwin 
Stein, René de Borst and Thomas J.R. Hughes. Volume 1: Funda- 
mentals. © 2004 John Wiley & Sons, Ltd. ISBN: 0-470-84699-2. 


The algorithms of continuum mechanics usually make 
use of two classical descriptions of motion: the Lagrangian 
description and the Eulerian description; see, for instance, 
Malvern (1969). The arbitrary Lagrangian–Eulerian
(ALE, in short) description, which is the subject of the 
present chapter, was developed in an attempt to combine 
the advantages of the above classical kinematical descrip- 
tions, while minimizing their respective drawbacks as far 
as possible. 

Lagrangian algorithms, in which each individual node of 
the computational mesh follows the associated material par- 
ticle during motion (see Figure 1), are mainly used in struc- 
tural mechanics. The Lagrangian description allows an easy 
tracking of free surfaces and interfaces between different 
materials. It also facilitates the treatment of materials with 
history-dependent constitutive relations. Its weakness is its 
inability to follow large distortions of the computational 
domain without recourse to frequent remeshing operations. 

Eulerian algorithms are widely used in fluid dynamics. 
Here, as shown in Figure 1, the computational mesh is 
fixed and the continuum moves with respect to the grid. In 
the Eulerian description, large distortions in the continuum 
motion can be handled with relative ease, but generally at 
the expense of precise interface definition and the resolution 
of flow details. 

Because of the shortcomings of purely Lagrangian and
purely Eulerian descriptions, a technique has been developed
that succeeds, to a certain extent, in combining
the best features of both the Lagrangian and the Eulerian
approaches. Such a technique is known as the arbitrary
Lagrangian–Eulerian (ALE) description. In the ALE
description, the nodes of the computational mesh may be
moved with the continuum in normal Lagrangian fashion,
or be held fixed in Eulerian manner, or, as suggested in
Figure 1, be moved in some arbitrarily specified way to
give a continuous rezoning capability. Because of this freedom
in moving the computational mesh offered by the ALE
description, greater distortions of the continuum can be handled
than would be allowed by a purely Lagrangian method,
with more resolution than that afforded by a purely Eulerian
approach. The simple example in Figure 2 illustrates the
ability of the ALE description to accommodate significant
distortions of the computational mesh, while preserving the
clear delineation of interfaces typical of a purely Lagrangian
approach. A coarse finite element mesh is used to model the
detonation of an explosive charge in an extremely strong
cylindrical vessel partially filled with water. A comparison
is made of the mesh configurations at time t = 1.0 ms
obtained, respectively, with the ALE description (with automatic
continuous rezoning) and with a purely Lagrangian
mesh description. As further evidenced by the details of
the charge–water interface, the Lagrangian approach suffers
from a severe degradation of the computational mesh,
in contrast with the ability of the ALE approach to maintain
quite a regular mesh configuration of the charge–water
interface.

Figure 1. One-dimensional example of Lagrangian, Eulerian and
ALE mesh and particle motion. (Legend: material point, node,
particle motion, mesh motion; vertical axis: time t.)

Figure 2. Lagrangian versus ALE descriptions: (a) initial FE
mesh; (b) ALE mesh at t = 1 ms; (c) Lagrangian mesh at
t = 1 ms; (d) details of interface in Lagrangian description.

The aim of the present chapter is to provide an in-depth
survey of ALE methods, including both conceptual
aspects and numerical implementation details in view of
the applications in large deformation material response,
fluid dynamics, nonlinear solid mechanics, and coupled
fluid–structure problems. The chapter is organized as follows.
The next section introduces the ALE kinematical
description as a generalization of the classical Lagrangian
and Eulerian descriptions of motion. Such generalization
rests upon the introduction of a so-called referential domain
and on the mapping between the referential domain and
the classical material and spatial domains. Then, the fundamental
ALE equation is introduced, which provides a
relationship between material time derivative and referential
time derivative. On this basis, the ALE form of the basic
conservation equations for mass, momentum, and energy is
established. Computational aspects of the ALE algorithms
are then addressed. This includes mesh-update procedures
in finite element analysis, the combination of ALE and
mesh-refinement procedures, as well as the use of ALE
in connection with mesh-free methods. The chapter closes
with a discussion of problems commonly encountered in
the computer implementation of ALE algorithms in fluid
dynamics, solid mechanics, and coupled problems describing
fluid–structure interaction.


2 DESCRIPTIONS OF MOTION 


Since the ALE description of motion is a generalization 
of the Lagrangian and Eulerian descriptions, we start with 
a brief reminder of these classical descriptions of motion. 
We closely follow the presentation by Donea and Huerta 
(2003). 


2.1 Lagrangian and Eulerian viewpoints 


Two domains are commonly used in continuum mechanics:
the material domain $R_{\mathbf{X}} \subset \mathbb{R}^{n_{sd}}$, with $n_{sd}$ spatial dimensions,
made up of material particles $\mathbf{X}$, and the spatial domain $R_{\mathbf{x}}$,
consisting of spatial points $\mathbf{x}$.

The Lagrangian viewpoint consists of following the
material particles of the continuum in their motion. To this
end, one introduces, as suggested in Figure 3, a computational
grid, which follows the continuum in its motion,
the grid nodes being permanently connected to the same
material points. The material coordinates, $\mathbf{X}$, allow us to
identify the reference configuration, $R_{\mathbf{X}}$. The motion of the
material points relates the material coordinates, $\mathbf{X}$, to the
spatial ones, $\mathbf{x}$. It is defined by a mapping $\varphi$ such that

\[
\varphi : R_{\mathbf{X}} \times [t_0, t_{\mathrm{final}}[ \;\longrightarrow\; R_{\mathbf{x}} \times [t_0, t_{\mathrm{final}}[, \qquad
(\mathbf{X}, t) \longmapsto \varphi(\mathbf{X}, t) = (\mathbf{x}, t) \tag{1}
\]

which allows us to link $\mathbf{X}$ and $\mathbf{x}$ in time by the law of
motion, namely

\[
\mathbf{x} = \mathbf{x}(\mathbf{X}, t), \qquad t = t \tag{2}
\]


which explicitly states the particular nature of $\varphi$: first, the
spatial coordinates $\mathbf{x}$ depend both on the material particle,
$\mathbf{X}$, and time $t$, and, second, physical time is measured by the
same variable $t$ in both material and spatial domains. For
every fixed instant $t$, the mapping $\varphi$ defines a configuration
in the spatial domain. It is convenient to employ a matrix
representation for the gradient of $\varphi$,

\[
\frac{\partial \varphi}{\partial(\mathbf{X}, t)} =
\begin{pmatrix}
\dfrac{\partial \mathbf{x}}{\partial \mathbf{X}} & \mathbf{v} \\[2mm]
\mathbf{0}^T & 1
\end{pmatrix} \tag{3}
\]

where $\mathbf{0}^T$ is a null row-vector and the material velocity $\mathbf{v}$
is

\[
\mathbf{v}(\mathbf{X}, t) = \left.\frac{\partial \mathbf{x}}{\partial t}\right|_{\mathbf{X}} \tag{4}
\]

with $|_{\mathbf{X}}$ meaning “holding the material coordinate $\mathbf{X}$ fixed”.

Figure 3. Lagrangian description of motion. (Panel labels:
reference configuration, current configuration.)

Obviously, the one-to-one mapping $\varphi$ must verify
$\det(\partial\mathbf{x}/\partial\mathbf{X}) > 0$ (nonzero to impose a one-to-one
correspondence and positive to avoid orientation change of the
reference axes) at each point $\mathbf{X}$ and instant $t > t_0$. This
allows us to keep track of the history of motion and, by
the inverse transformation $(\mathbf{X}, t) = \varphi^{-1}(\mathbf{x}, t)$, to identify,
at any instant, the initial position of the material particle
occupying position $\mathbf{x}$ at time $t$.
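
The conditions above can be made concrete with a small one-dimensional
sketch. The law of motion x = X(1 + t) (a uniform stretching) is a
hypothetical choice made only for this example, as are the function
names; the velocity of equation (4) and the Jacobian are approximated
by central differences rather than derived analytically.

```python
def motion(X, t):
    """Hypothetical 1D law of motion x = phi(X, t): uniform stretching."""
    return X * (1.0 + t)


def material_velocity(X, t, dt=1e-6):
    """v(X, t) = dx/dt with X held fixed, by a central difference."""
    return (motion(X, t + dt) - motion(X, t - dt)) / (2.0 * dt)


def jacobian(X, t, dX=1e-6):
    """dx/dX with t held fixed; in 1D this equals det(dx/dX)."""
    return (motion(X + dX, t) - motion(X - dX, t)) / (2.0 * dX)


def initial_position(x, t):
    """Inverse map X = phi^{-1}(x, t) for the stretching motion above."""
    return x / (1.0 + t)
```

For this motion the Jacobian is 1 + t, which stays positive for all
t > -1, so the mapping is invertible and `initial_position` recovers the
particle that occupies a given spatial point.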

Since the material points coincide with the same grid 
points during the whole motion, there are no convective 
effects in Lagrangian calculations: the material derivative 
reduces to a simple time derivative. The fact that each 
finite element of a Lagrangian mesh always contains the 
same material particles represents a significant advantage 
from the computational viewpoint, especially in problems 
involving materials with history-dependent behavior. This 
aspect is discussed in detail by Bonet and Wood (1997). 
However, when large material deformations do occur, for 
instance vortices in fluids, Lagrangian algorithms undergo a 
loss in accuracy, and may even be unable to conclude a cal- 
culation, due to excessive distortions of the computational 
mesh linked to the material. 

The difficulties caused by an excessive distortion of the 
finite element grid are overcome in the Eulerian formu- 
lation. The basic idea in the Eulerian formulation, which 
is very popular in fluid mechanics, consists in examining, 
as time evolves, the physical quantities associated with the 
fluid particles passing through a fixed region of space. In an
Eulerian description, the finite element mesh is thus fixed 
and the continuum moves and deforms with respect to the 
computational grid. The conservation equations are formulated
in terms of the spatial coordinates $\mathbf{x}$ and the time $t$.
Therefore, the Eulerian description of motion only involves 
variables and functions having an instantaneous significance 
in a fixed region of space. The material velocity v at a 
given mesh node corresponds to the velocity of the material 
point coincident at the considered time t with the consid- 
ered node. The velocity v is consequently expressed with 
respect to the fixed-element mesh without any reference to 
the initial configuration of the continuum and the material 
coordinates X: v = v(x, t). 
Since the Eulerian formulation dissociates the mesh
nodes from the material particles, convective effects appear 
because of the relative motion between the deforming mate- 
rial and the computational grid. Eulerian algorithms present 
numerical difficulties due to the nonsymmetric character 
of convection operators, but permit an easy treatment of 
complex material motion. By contrast with the Lagrangian 
description, serious difficulties are now found in following 
deforming material interfaces and mobile boundaries. 


2.2 ALE kinematical description


The above reminder of the classical Lagrangian and Eule- 
rian descriptions has highlighted the advantages and draw- 
backs of each individual formulation. It has also shown 
the potential interest in a generalized description capable
of combining the best aspects of the classical
mesh descriptions while minimizing their drawbacks as
far as possible. Such a generalized description is termed the
arbitrary Lagrangian–Eulerian (ALE) description. ALE
methods were first proposed in the finite difference and 
finite volume context. Original developments were made, 
among others, by Noh (1964), Franck and Lazarus (1964), 
Trulio (1966), and Hirt et al. (1974); this last contribution 
was reprinted in 1997. The method was subsequently
adopted in the finite element context and early applica- 
tions are to be found in the work of Donea et al. (1977), 
Belytschko et al. (1978), Belytschko and Kennedy (1978), 
and Hughes et al. (1981). 

In the ALE description of motion, neither the material
configuration $R_{\mathbf{X}}$ nor the spatial configuration $R_{\mathbf{x}}$ is taken
as the reference. Thus, a third domain is needed: the referential
configuration $R_{\boldsymbol{\chi}}$, where reference coordinates $\boldsymbol{\chi}$
are introduced to identify the grid points. Figure 4 shows
these domains and the one-to-one transformations relating
the configurations. The referential domain $R_{\boldsymbol{\chi}}$ is mapped
into the material and spatial domains by $\Psi$ and $\Phi$ respectively.
The particle motion $\varphi$ may then be expressed as
$\varphi = \Phi \circ \Psi^{-1}$, clearly showing that the three
mappings $\Psi$, $\Phi$, and $\varphi$ are not independent.

Figure 4. The motion of the ALE computational mesh is independent
of the material motion.

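The mapping $\Phi$ from the referential domain to the
spatial domain, which can be understood as the motion of
the grid points in the spatial domain, is represented by

\[
\Phi : R_{\boldsymbol{\chi}} \times [t_0, t_{\mathrm{final}}[ \;\longrightarrow\; R_{\mathbf{x}} \times [t_0, t_{\mathrm{final}}[, \qquad
(\boldsymbol{\chi}, t) \longmapsto \Phi(\boldsymbol{\chi}, t) = (\mathbf{x}, t) \tag{5}
\]

and its gradient is

\[
\frac{\partial \Phi}{\partial(\boldsymbol{\chi}, t)} =
\begin{pmatrix}
\dfrac{\partial \mathbf{x}}{\partial \boldsymbol{\chi}} & \hat{\mathbf{v}} \\[2mm]
\mathbf{0}^T & 1
\end{pmatrix} \tag{6}
\]

where now, the mesh velocity

\[
\hat{\mathbf{v}}(\boldsymbol{\chi}, t) = \left.\frac{\partial \mathbf{x}}{\partial t}\right|_{\boldsymbol{\chi}} \tag{7}
\]

is involved. Note that both the material and the mesh move
with respect to the laboratory. Thus, the corresponding
material and mesh velocities have been defined by differentiating
the equations of material motion and mesh motion respectively
with respect to time (see equations 4 and 7).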

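Finally, regarding $\Psi$, it is convenient to represent directly
its inverse $\Psi^{-1}$,

\[
\Psi^{-1} : R_{\mathbf{X}} \times [t_0, t_{\mathrm{final}}[ \;\longrightarrow\; R_{\boldsymbol{\chi}} \times [t_0, t_{\mathrm{final}}[, \qquad
(\mathbf{X}, t) \longmapsto \Psi^{-1}(\mathbf{X}, t) = (\boldsymbol{\chi}, t) \tag{8}
\]

and its gradient is

\[
\frac{\partial \Psi^{-1}}{\partial(\mathbf{X}, t)} =
\begin{pmatrix}
\dfrac{\partial \boldsymbol{\chi}}{\partial \mathbf{X}} & \mathbf{w} \\[2mm]
\mathbf{0}^T & 1
\end{pmatrix} \tag{9}
\]

where the velocity $\mathbf{w}$ is defined as

\[
\mathbf{w} = \left.\frac{\partial \boldsymbol{\chi}}{\partial t}\right|_{\mathbf{X}} \tag{10}
\]

and can be interpreted as the particle velocity in the referential
domain, since it measures the time variation of
the referential coordinate $\boldsymbol{\chi}$ holding the material particle
$\mathbf{X}$ fixed. The relation between velocities $\mathbf{v}$, $\hat{\mathbf{v}}$, and $\mathbf{w}$ can
be obtained by differentiating $\varphi = \Phi \circ \Psi^{-1}$,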


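\[
\frac{\partial \varphi}{\partial(\mathbf{X}, t)}(\mathbf{X}, t)
= \frac{\partial \Phi}{\partial(\boldsymbol{\chi}, t)}\bigl(\Psi^{-1}(\mathbf{X}, t)\bigr)\,
  \frac{\partial \Psi^{-1}}{\partial(\mathbf{X}, t)}(\mathbf{X}, t)
= \frac{\partial \Phi}{\partial(\boldsymbol{\chi}, t)}(\boldsymbol{\chi}, t)\,
  \frac{\partial \Psi^{-1}}{\partial(\mathbf{X}, t)}(\mathbf{X}, t) \tag{11}
\]

or, in matrix format:

\[
\begin{pmatrix}
\dfrac{\partial \mathbf{x}}{\partial \mathbf{X}} & \mathbf{v} \\[2mm]
\mathbf{0}^T & 1
\end{pmatrix}
=
\begin{pmatrix}
\dfrac{\partial \mathbf{x}}{\partial \boldsymbol{\chi}} & \hat{\mathbf{v}} \\[2mm]
\mathbf{0}^T & 1
\end{pmatrix}
\begin{pmatrix}
\dfrac{\partial \boldsymbol{\chi}}{\partial \mathbf{X}} & \mathbf{w} \\[2mm]
\mathbf{0}^T & 1
\end{pmatrix} \tag{12}
\]

which yields, after block multiplication,

\[
\mathbf{v} = \hat{\mathbf{v}} + \frac{\partial \mathbf{x}}{\partial \boldsymbol{\chi}} \cdot \mathbf{w} \tag{13}
\]

This equation may be rewritten as

\[
\mathbf{c} := \mathbf{v} - \hat{\mathbf{v}} = \frac{\partial \mathbf{x}}{\partial \boldsymbol{\chi}} \cdot \mathbf{w} \tag{14}
\]

thus defining the convective velocity $\mathbf{c}$, that is, the relative
velocity between the material and the mesh.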
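
Relation (13) can be checked numerically in one dimension. The sketch
below assumes a hypothetical material motion x = X exp(t) and a
hypothetical mesh motion x = chi (1 + A t); neither comes from the
text, and A is an arbitrary constant. All time derivatives are taken by
central differences with the appropriate coordinate held fixed.

```python
import math

A = 0.3  # hypothetical mesh stretching rate (assumption of this example)


def phi(X, t):
    """Hypothetical material motion x = phi(X, t)."""
    return X * math.exp(t)


def Phi(chi, t):
    """Hypothetical mesh motion x = Phi(chi, t)."""
    return chi * (1.0 + A * t)


def Psi_inv(X, t):
    """chi = Psi^{-1}(X, t), obtained as Phi^{-1}(phi(X, t), t)."""
    return phi(X, t) / (1.0 + A * t)


def ddt(func, fixed, t, dt=1e-6):
    """Central-difference time derivative with the first argument fixed."""
    return (func(fixed, t + dt) - func(fixed, t - dt)) / (2.0 * dt)


def velocities(X, t, dchi=1e-6):
    """Return (v, v_hat, w, dx/dchi) for the particle X at time t."""
    chi = Psi_inv(X, t)
    v = ddt(phi, X, t)        # material velocity, equation (4)
    v_hat = ddt(Phi, chi, t)  # mesh velocity, equation (7)
    w = ddt(Psi_inv, X, t)    # referential particle velocity, equation (10)
    dx_dchi = (Phi(chi + dchi, t) - Phi(chi - dchi, t)) / (2.0 * dchi)
    return v, v_hat, w, dx_dchi
```

Because the mesh stretches here (dx/dchi = 1 + A t differs from 1), the
convective velocity c = v - v_hat and the referential velocity w take
different values, although they are related through (14).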

The convective velocity $\mathbf{c}$ (see equation 14) should not
be confused with $\mathbf{w}$ (see equation 10). As stated before, $\mathbf{w}$
is the particle velocity as seen from the referential domain
$R_{\boldsymbol{\chi}}$, whereas $\mathbf{c}$ is the particle velocity relative to the mesh as
seen from the spatial domain $R_{\mathbf{x}}$ (both $\mathbf{v}$ and $\hat{\mathbf{v}}$ are variations
of coordinate $\mathbf{x}$). In fact, equation (14) implies that $\mathbf{c} = \mathbf{w}$ if
and only if $\partial\mathbf{x}/\partial\boldsymbol{\chi} = \mathbf{I}$ (where $\mathbf{I}$ is the identity tensor), that
is, when the mesh motion is purely translational, without
rotations or deformations of any kind.

After the fundamentals on ALE kinematics have been
presented, it should be remarked that both Lagrangian and
Eulerian formulations may be obtained as particular cases.
With the choice $\Psi = \mathbf{I}$, equation (3) reduces to $\mathbf{X} \equiv \boldsymbol{\chi}$
and a Lagrangian description results: the material and
mesh velocities, equations (4) and (7), coincide, and the
convective velocity $\mathbf{c}$ (see equation 14) is null (there are
no convective terms in the conservation laws). If, on the
other hand, $\Phi = \mathbf{I}$, equation (2) simplifies into $\mathbf{x} \equiv \boldsymbol{\chi}$, thus
implying an Eulerian description: a null mesh velocity is
obtained from equation (7) and the convective velocity $\mathbf{c}$ is
simply identical to the material velocity $\mathbf{v}$.
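
The two limiting cases can be summarized in a few lines of code. In
this sketch the nodal material velocities are given as a plain list, and
the "ale" branch uses a purely hypothetical rule (nodes follow a
fraction alpha of the material velocity) that stands in for whatever
mesh-update procedure is actually used; only the Lagrangian and Eulerian
branches reflect statements made in the text.

```python
def mesh_velocity(description, v, alpha=0.5):
    """Nodal mesh velocity v_hat for a given kinematical description."""
    if description == "lagrangian":
        return list(v)                  # nodes follow the material
    if description == "eulerian":
        return [0.0 for _ in v]         # nodes are fixed in space
    if description == "ale":
        # Hypothetical rule for illustration only: partial tracking.
        return [alpha * vi for vi in v]
    raise ValueError("unknown description: " + description)


def convective_velocity(v, v_hat):
    """c = v - v_hat at each node, as in equation (14)."""
    return [vi - vhi for vi, vhi in zip(v, v_hat)]
```

With these definitions the Lagrangian choice gives c = 0 at every node
and the Eulerian choice gives c = v, in agreement with the discussion
above.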

In the ALE formulation, the freedom of moving the mesh 
is very attractive. It helps to combine the respective advan- 
tages of the Lagrangian and Eulerian formulations. This 
could, however, be overshadowed by the burden of speci- 
fying grid velocities well suited to the particular problem 
under consideration. As a consequence, the practical imple- 
mentation of the ALE description requires that an automatic 
mesh-displacement prescription algorithm be supplied. 


3 THE FUNDAMENTAL ALE EQUATION 


In order to express the conservation laws for mass, momen- 
tum, and energy in an ALE framework, a relation between 
material (or total) time derivative, which is inherent in conservation
laws, and referential time derivative is needed.
3.1 Material, spatial, and referential time
derivatives 


In order to relate the time derivative in the material, spatial,
and referential domains, let a scalar physical quantity be
described by $f(\mathbf{x}, t)$, $f^*(\boldsymbol{\chi}, t)$, and $f^{**}(\mathbf{X}, t)$ in the spatial,
referential, and material domains respectively. Stars are
employed to emphasize that the functional forms are, in
general, different.

Since the particle motion $\varphi$ is a mapping, the spatial
description $f(\mathbf{x}, t)$ and the material description $f^{**}(\mathbf{X}, t)$
of the physical quantity can be related as

\[
f^{**}(\mathbf{X}, t) = f(\varphi(\mathbf{X}, t)) \quad \text{or} \quad f^{**} = f \circ \varphi \tag{15}
\]


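The gradient of this expression can be easily computed as

\[
\frac{\partial f^{**}}{\partial(\mathbf{X}, t)}(\mathbf{X}, t)
= \frac{\partial f}{\partial(\mathbf{x}, t)}(\mathbf{x}, t)\,
  \frac{\partial \varphi}{\partial(\mathbf{X}, t)}(\mathbf{X}, t) \tag{16}
\]

which is amenable to the matrix form

\[
\begin{pmatrix}
\dfrac{\partial f^{**}}{\partial \mathbf{X}} & \dfrac{\partial f^{**}}{\partial t}
\end{pmatrix}
=
\begin{pmatrix}
\dfrac{\partial f}{\partial \mathbf{x}} & \dfrac{\partial f}{\partial t}
\end{pmatrix}
\begin{pmatrix}
\dfrac{\partial \mathbf{x}}{\partial \mathbf{X}} & \mathbf{v} \\[2mm]
\mathbf{0}^T & 1
\end{pmatrix} \tag{17}
\]

which renders, after block multiplication, a first expression,
which is obvious, that is, $(\partial f^{**}/\partial \mathbf{X}) = (\partial f/\partial \mathbf{x})(\partial \mathbf{x}/\partial \mathbf{X})$;
however, the second one is more interesting:

\[
\frac{\partial f^{**}}{\partial t} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial \mathbf{x}} \cdot \mathbf{v} \tag{18}
\]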


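Note that this is the well-known equation that relates the
material and the spatial time derivatives. Dropping the stars
to ease the notation, this relation is finally cast as

\[
\left.\frac{\partial f}{\partial t}\right|_{\mathbf{X}} = \left.\frac{\partial f}{\partial t}\right|_{\mathbf{x}} + \mathbf{v} \cdot \nabla f
\quad \text{or} \quad
\frac{df}{dt} = \frac{\partial f}{\partial t} + \mathbf{v} \cdot \nabla f \tag{19}
\]

which can be interpreted in the usual way: the variation of
a physical quantity for a given particle $\mathbf{X}$ is the local variation
plus a convective term taking into account the relative
motion between the material and spatial (laboratory) systems.
Moreover, in order not to overload the rest of the text
with notation, except for the specific sections, the material
time derivative is denoted as

\[
\frac{d\,\cdot}{dt} := \left.\frac{\partial\,\cdot}{\partial t}\right|_{\mathbf{X}} \tag{20}
\]

and the spatial time derivative as

\[
\frac{\partial\,\cdot}{\partial t} := \left.\frac{\partial\,\cdot}{\partial t}\right|_{\mathbf{x}} \tag{21}
\]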

The relation between material and spatial time derivatives 
is now extended to include the referential time derivative. 
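With the help of mapping $\Psi$, the transformation from
the referential description $f^*(\boldsymbol{\chi}, t)$ of the scalar physical
quantity to the material description $f^{**}(\mathbf{X}, t)$ can be written
as

\[
f^{**} = f^* \circ \Psi^{-1} \tag{22}
\]

and its gradient can be easily computed as

\[
\frac{\partial f^{**}}{\partial(\mathbf{X}, t)}(\mathbf{X}, t)
= \frac{\partial f^{*}}{\partial(\boldsymbol{\chi}, t)}(\boldsymbol{\chi}, t)\,
  \frac{\partial \Psi^{-1}}{\partial(\mathbf{X}, t)}(\mathbf{X}, t) \tag{23}
\]

or, in matrix form

\[
\begin{pmatrix}
\dfrac{\partial f^{**}}{\partial \mathbf{X}} & \dfrac{\partial f^{**}}{\partial t}
\end{pmatrix}
=
\begin{pmatrix}
\dfrac{\partial f^{*}}{\partial \boldsymbol{\chi}} & \dfrac{\partial f^{*}}{\partial t}
\end{pmatrix}
\begin{pmatrix}
\dfrac{\partial \boldsymbol{\chi}}{\partial \mathbf{X}} & \mathbf{w} \\[2mm]
\mathbf{0}^T & 1
\end{pmatrix} \tag{24}
\]

which renders, after block multiplication,

\[
\frac{\partial f^{**}}{\partial t} = \frac{\partial f^{*}}{\partial t} + \frac{\partial f^{*}}{\partial \boldsymbol{\chi}} \cdot \mathbf{w} \tag{25}
\]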

Note that this equation relates the material and the referential
time derivatives. However, it also requires the
evaluation of the gradient of the considered quantity in
the referential domain. This can be done, but in computational
mechanics it is usually easier to work in the spatial
(or material) domain. Moreover, in fluids, constitutive relations
are naturally expressed in the spatial configuration and
the Cauchy stress tensor, which will be introduced next, is
the natural measure for stresses. Thus, using the definition
of $\mathbf{w}$ given in equation (14), the previous equation may be
rearranged into

\[
\frac{\partial f^{**}}{\partial t} = \frac{\partial f^{*}}{\partial t} + \frac{\partial f}{\partial \mathbf{x}} \cdot \mathbf{c} \tag{26}
\]


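The fundamental ALE relation between material time
derivatives, referential time derivatives, and spatial gradient
is finally cast as (stars dropped)

\[
\left.\frac{\partial f}{\partial t}\right|_{\mathbf{X}}
= \left.\frac{\partial f}{\partial t}\right|_{\boldsymbol{\chi}} + \frac{\partial f}{\partial \mathbf{x}} \cdot \mathbf{c}
= \left.\frac{\partial f}{\partial t}\right|_{\boldsymbol{\chi}} + \mathbf{c} \cdot \nabla f \tag{27}
\]

and shows that the time derivative of the physical quantity
$f$ for a given particle $\mathbf{X}$, that is, its material derivative,
is its local derivative (with the reference coordinate $\boldsymbol{\chi}$ held
fixed) plus a convective term taking into account the relative
velocity $\mathbf{c}$ between the material and the reference system.
This equation is equivalent to equation (19) but in the ALE
formulation, that is, when $(\boldsymbol{\chi}, t)$ is the reference.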
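
The fundamental relation (27) lends itself to a simple numerical check.
The following sketch is one-dimensional and rests entirely on
hypothetical choices: a material motion x = X exp(t), a mesh motion
x = chi (1 + A t), and a field f(x, t) = sin(x) cos(t). It compares the
derivative of f following a particle with the derivative following a
mesh node plus the convective correction c df/dx.

```python
import math

A = 0.3  # hypothetical mesh stretching rate (assumption of this example)


def x_of_X(X, t):
    """Hypothetical material motion."""
    return X * math.exp(t)


def x_of_chi(chi, t):
    """Hypothetical mesh motion."""
    return chi * (1.0 + A * t)


def f(x, t):
    """Hypothetical scalar field in the spatial description."""
    return math.sin(x) * math.cos(t)


def df_dx(x, t):
    return math.cos(x) * math.cos(t)


def ale_relation_sides(x, t, dt=1e-6):
    """Return both sides of relation (27) at the spatial point x, time t."""
    X = x / math.exp(t)           # particle currently located at x
    chi = x / (1.0 + A * t)       # mesh node currently located at x
    # Material derivative: follow the particle X.
    material = (f(x_of_X(X, t + dt), t + dt)
                - f(x_of_X(X, t - dt), t - dt)) / (2.0 * dt)
    # Referential derivative: follow the mesh node chi.
    referential = (f(x_of_chi(chi, t + dt), t + dt)
                   - f(x_of_chi(chi, t - dt), t - dt)) / (2.0 * dt)
    v = x                         # dx/dt at fixed X equals X exp(t) = x
    v_hat = A * chi               # dx/dt at fixed chi
    c = v - v_hat                 # convective velocity, equation (14)
    return material, referential + c * df_dx(x, t)
```

Setting A = 0 in this sketch freezes the mesh, in which case the
referential derivative coincides with the spatial one and the check
reduces to relation (19).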


3.2 Time derivative of integrals over moving 
volumes 


To establish the integral form of the basic conservation laws 
for mass, momentum, and energy, we also need to consider 
the rate of change of integrals of scalar and vector functions 
over a moving volume occupied by fluid.

Consider thus a material volume $V_t$ bounded by a smooth
closed surface $S_t$, whose points at time $t$ move with the
material velocity $\mathbf{v} = \mathbf{v}(\mathbf{x}, t)$, where $\mathbf{x} \in S_t$. A material
volume is a volume that permanently contains the same particles
of the continuum under consideration. The material
time derivative of the integral of a scalar function $f(\mathbf{x}, t)$
(note that $f$ is defined in the spatial domain) over the
time-varying material volume $V_t$ is given by the following
well-known expression, often referred to as Reynolds
transport theorem (see, for instance, Belytschko et al., 2000
for a detailed proof):

\[
\frac{d}{dt} \int_{V_t} f(\mathbf{x}, t)\, dV
= \int_{V_c \equiv V_t} \frac{\partial f(\mathbf{x}, t)}{\partial t}\, dV
+ \int_{S_c \equiv S_t} f(\mathbf{x}, t)\, \mathbf{v} \cdot \mathbf{n}\, dS \tag{28}
\]


which holds for smooth functions $f(\mathbf{x}, t)$. The volume
integral in the right-hand side is defined over a control
volume $V_c$ (fixed in space), which coincides with the
moving material volume $V_t$ at the considered instant, $t$,
in time. Similarly, the fixed control surface $S_c$ coincides at
time $t$ with the closed surface $S_t$ bounding the material
volume $V_t$. In the surface integral, $\mathbf{n}$ denotes the unit
outward normal to the surface $S_t$ at time $t$, and $\mathbf{v}$ is the
material velocity of points of the boundary $S_t$. The first term
in the right-hand side of expression (28) is the local time
derivative of the volume integral. The boundary integral
represents the flux of the scalar quantity $f$ across the fixed
boundary of the control volume $V_c \equiv V_t$.
Noting that

\[
\int_{S_c} f(\mathbf{x}, t)\, \mathbf{v} \cdot \mathbf{n}\, dS = \int_{V_c} \nabla \cdot (f \mathbf{v})\, dV \tag{29}
\]

one obtains the alternative form of Reynolds transport
theorem:

\[
\frac{d}{dt} \int_{V_t} f(\mathbf{x}, t)\, dV
= \int_{V_c \equiv V_t} \left( \frac{\partial f(\mathbf{x}, t)}{\partial t} + \nabla \cdot (f \mathbf{v}) \right) dV \tag{30}
\]
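
A one-dimensional numerical check of the transport theorem (28) may
help fix ideas. In this sketch the material volume is the interval
[Xa (1 + t), Xb (1 + t)] carried by the hypothetical motion
x = X(1 + t), and the integrand f(x, t) = x^2 + t is likewise an
arbitrary choice; integrals are evaluated with a composite Simpson rule
and the time derivative of the moving integral by a central difference.
In 1D the outward normal is +1 at the right end and -1 at the left end.

```python
def f(x, t):
    """Hypothetical integrand in the spatial description."""
    return x * x + t


def df_dt(x, t):
    """Spatial (local) time derivative of f."""
    return 1.0


def v(x, t):
    """Material velocity of the hypothetical motion x = X (1 + t)."""
    return x / (1.0 + t)


def simpson(g, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return s * h / 3.0


def material_integral(t, Xa, Xb):
    """Integral of f over the material interval at time t."""
    return simpson(lambda x: f(x, t), Xa * (1.0 + t), Xb * (1.0 + t))


def transport_sides(t, Xa=0.2, Xb=1.0, dt=1e-5):
    """Return the left- and right-hand sides of (28) in 1D."""
    lhs = (material_integral(t + dt, Xa, Xb)
           - material_integral(t - dt, Xa, Xb)) / (2.0 * dt)
    a, b = Xa * (1.0 + t), Xb * (1.0 + t)
    local = simpson(lambda x: df_dt(x, t), a, b)
    flux = f(b, t) * v(b, t) - f(a, t) * v(a, t)
    return lhs, local + flux
```

At t = 0.5 with the default interval, both sides evaluate to
approximately 3.832, the local term contributing 1.2 and the boundary
flux 2.632.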
Similar forms hold for the material derivative of the volume
integral of a vector quantity. Analogous formulae can be
developed in the ALE context, that is, with a referential
time derivative. In this case, however, the characterizing
velocity is no longer the material velocity $\mathbf{v}$, but the grid
velocity $\hat{\mathbf{v}}$.


4 ALE FORM OF CONSERVATION 
EQUATIONS 


To serve as an introduction to the discussion of ALE finite 
element and finite volume models, we establish in this 
section the differential and integral forms of the conser- 
vation equations for mass, momentum, and energy. 


4.1 Differential forms 


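The ALE differential forms of the conservation equations
for mass, momentum, and energy are readily obtained from
the corresponding well-known Eulerian forms

\[
\begin{aligned}
\text{Mass:} \quad & \frac{d\rho}{dt} = \left.\frac{\partial \rho}{\partial t}\right|_{\mathbf{x}} + \mathbf{v} \cdot \nabla \rho = -\rho\, \nabla \cdot \mathbf{v} \\
\text{Momentum:} \quad & \rho \frac{d\mathbf{v}}{dt} = \rho \left( \left.\frac{\partial \mathbf{v}}{\partial t}\right|_{\mathbf{x}} + (\mathbf{v} \cdot \nabla)\mathbf{v} \right) = \nabla \cdot \boldsymbol{\sigma} + \rho \mathbf{b} \\
\text{Energy:} \quad & \rho \frac{dE}{dt} = \rho \left( \left.\frac{\partial E}{\partial t}\right|_{\mathbf{x}} + \mathbf{v} \cdot \nabla E \right) = \nabla \cdot (\boldsymbol{\sigma} \cdot \mathbf{v}) + \mathbf{v} \cdot \rho \mathbf{b}
\end{aligned} \tag{31}
\]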


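where $\rho$ is the mass density, $\mathbf{v}$ is the material velocity
vector, $\boldsymbol{\sigma}$ denotes the Cauchy stress tensor, $\mathbf{b}$ is the specific
body force vector, and $E$ is the specific total energy. Only
mechanical energies are considered in the above form of
the energy equation. Note that the stress term in the same
equation can be rewritten in the form

\[
\nabla \cdot (\boldsymbol{\sigma} \cdot \mathbf{v})
= \frac{\partial}{\partial x_i}(\sigma_{ij} v_j)
= \frac{\partial \sigma_{ij}}{\partial x_i} v_j + \sigma_{ij} \frac{\partial v_j}{\partial x_i}
= (\nabla \cdot \boldsymbol{\sigma}) \cdot \mathbf{v} + \boldsymbol{\sigma} : \nabla \mathbf{v} \tag{32}
\]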


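where $\nabla \mathbf{v}$ is the spatial velocity gradient.
Also frequently used is the balance equation for the
internal energy

\[
\rho \frac{de}{dt} = \rho \left( \left.\frac{\partial e}{\partial t}\right|_{\mathbf{x}} + \mathbf{v} \cdot \nabla e \right) = \boldsymbol{\sigma} : \nabla^s \mathbf{v} \tag{33}
\]

where $e$ is the specific internal energy and $\nabla^s \mathbf{v}$ denotes the
stretching (or strain rate) tensor, the symmetric part of the
velocity gradient $\nabla \mathbf{v}$; that is, $\nabla^s \mathbf{v} = (1/2)(\nabla \mathbf{v} + \nabla^T \mathbf{v})$.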
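All one has to do to obtain the ALE form of the above
conservation equations is to replace, in the various convective
terms, the material velocity $\mathbf{v}$ with the convective
velocity $\mathbf{c} = \mathbf{v} - \hat{\mathbf{v}}$. The result is

\[
\begin{aligned}
\text{Mass:} \quad & \left.\frac{\partial \rho}{\partial t}\right|_{\boldsymbol{\chi}} + \mathbf{c} \cdot \nabla \rho = -\rho\, \nabla \cdot \mathbf{v} \\
\text{Momentum:} \quad & \rho \left( \left.\frac{\partial \mathbf{v}}{\partial t}\right|_{\boldsymbol{\chi}} + (\mathbf{c} \cdot \nabla)\mathbf{v} \right) = \nabla \cdot \boldsymbol{\sigma} + \rho \mathbf{b} \\
\text{Total energy:} \quad & \rho \left( \left.\frac{\partial E}{\partial t}\right|_{\boldsymbol{\chi}} + \mathbf{c} \cdot \nabla E \right) = \nabla \cdot (\boldsymbol{\sigma} \cdot \mathbf{v}) + \mathbf{v} \cdot \rho \mathbf{b} \\
\text{Internal energy:} \quad & \rho \left( \left.\frac{\partial e}{\partial t}\right|_{\boldsymbol{\chi}} + \mathbf{c} \cdot \nabla e \right) = \boldsymbol{\sigma} : \nabla^s \mathbf{v}
\end{aligned} \tag{34}
\]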

It is important to note that the right-hand side of equation 
(34) is written in classical Eulerian (spatial) form, while 
the arbitrary motion of the computational mesh is only 
reflected in the left-hand side. The origin of equations 
(34) and their similarity with the Eulerian equations (31) 
have induced some authors to name this method the quasi-Eulerian
description; see, for instance, Belytschko et al.
(1980). 
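
The substitution rule stated above follows directly from the fundamental
relation (27). As a short worked step, written here for the mass
equation only and using no assumptions beyond equations (27) and (31),
one may proceed as follows:

```latex
\begin{aligned}
\frac{d\rho}{dt} &= -\rho\,\nabla\cdot\mathbf{v}
  && \text{(mass conservation, from (31))} \\
\frac{d\rho}{dt} &= \left.\frac{\partial \rho}{\partial t}\right|_{\boldsymbol{\chi}}
  + \mathbf{c}\cdot\nabla\rho
  && \text{(relation (27) with } f=\rho\text{)} \\
\Longrightarrow\quad
\left.\frac{\partial \rho}{\partial t}\right|_{\boldsymbol{\chi}}
  + \mathbf{c}\cdot\nabla\rho &= -\rho\,\nabla\cdot\mathbf{v}
  && \text{(ALE mass equation in (34))}
\end{aligned}
```

Only the left-hand side changes: with $\mathbf{c} = \mathbf{v}$ the Eulerian form
is recovered, and with $\mathbf{c} = \mathbf{0}$ the Lagrangian form
$\partial\rho/\partial t|_{\mathbf{X}} = -\rho\,\nabla\cdot\mathbf{v}$ results. The
momentum and energy equations follow in the same way with
$f$ replaced by the components of $\mathbf{v}$, by $E$, or by $e$.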


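Remark (Material acceleration) Mesh acceleration plays
no role in the ALE formulation, so only the material
acceleration $\mathbf{a}$, the material derivative of velocity $\mathbf{v}$, is
needed, which is expressed in the Lagrangian, Eulerian,
and ALE formulation respectively as

\[
\mathbf{a} = \left.\frac{\partial \mathbf{v}}{\partial t}\right|_{\mathbf{X}} \tag{35a}
\]
\[
\mathbf{a} = \left.\frac{\partial \mathbf{v}}{\partial t}\right|_{\mathbf{x}} + \mathbf{v}\, \frac{\partial \mathbf{v}}{\partial \mathbf{x}} \tag{35b}
\]
\[
\mathbf{a} = \left.\frac{\partial \mathbf{v}}{\partial t}\right|_{\boldsymbol{\chi}} + \mathbf{c}\, \frac{\partial \mathbf{v}}{\partial \mathbf{x}} \tag{35c}
\]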

Note that the ALE expression of acceleration (35c) is 
simply a particularization of the fundamental relation (27), 
taking the material velocity v as the physical quantity f. 
The first term in the right-hand side of relationships (35b) 
and (35c) represents the local acceleration, the second term 
being the convective acceleration. 
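
As a hedged illustration of expressions (35a), (35b) and (35c), the sketch below evaluates the material acceleration in one space dimension in the three descriptions and checks that they agree. The motion $x = X(1 + t^2)$, the mesh motion $x = \chi(1 + t)$ and the central finite-difference helper are assumptions chosen only for this example; for this motion the exact acceleration is $a = 2X$.

```python
def velocity(x, t):
    """Spatial velocity field of the assumed motion x = X (1 + t^2)."""
    return 2.0 * t * x / (1.0 + t * t)


def dv_dx(x, t):
    """Spatial velocity gradient of the field above (independent of x)."""
    return 2.0 * t / (1.0 + t * t)


def mesh_position(chi, t):
    """Assumed mesh motion x = chi (1 + t), unrelated to the material motion."""
    return chi * (1.0 + t)


def time_derivative(func, t, h=1.0e-5):
    """Central finite-difference approximation of d(func)/dt."""
    return (func(t + h) - func(t - h)) / (2.0 * h)


def acceleration_lagrangian(X, t):
    # (35a): time derivative of the velocity, holding the particle X fixed.
    return time_derivative(lambda s: velocity(X * (1.0 + s * s), s), t)


def acceleration_eulerian(x, t):
    # (35b): local derivative at fixed x plus convection with v.
    local = time_derivative(lambda s: velocity(x, s), t)
    return local + velocity(x, t) * dv_dx(x, t)


def acceleration_ale(chi, t):
    # (35c): derivative holding the mesh point chi fixed plus convection with c.
    x = mesh_position(chi, t)
    local = time_derivative(lambda s: velocity(mesh_position(chi, s), s), t)
    v_hat = time_derivative(lambda s: mesh_position(chi, s), t)
    c = velocity(x, t) - v_hat
    return local + c * dv_dx(x, t)


# Same spatial point x = 1.5 at t = 0.5 seen from the three descriptions.
t_now, x_now = 0.5, 1.5
X_now = x_now / (1.0 + t_now * t_now)   # particle currently at x_now
chi_now = x_now / (1.0 + t_now)         # mesh point currently at x_now
a_lag = acceleration_lagrangian(X_now, t_now)
a_eul = acceleration_eulerian(x_now, t_now)
a_ale = acceleration_ale(chi_now, t_now)
print(a_lag, a_eul, a_ale)
```

The mesh velocity enters only through the convective velocity $c$ and through the time derivative taken at fixed $\chi$; no mesh acceleration is needed, in line with the remark above.
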


4.2 Integral forms 
The starting point for deriving the ALE integral form of
the conservation equations is the Reynolds transport theorem
(28) applied to an arbitrary volume $V_t$ whose boundary
$S_t = \partial V_t$ moves with the mesh velocity $\hat{\mathbf{v}}$:

$$
\left.\frac{\partial}{\partial t}\right|_{\boldsymbol{\chi}} \int_{V_t} f(\mathbf{x},t)\,\mathrm{d}V
= \int_{V_t} \left.\frac{\partial f(\mathbf{x},t)}{\partial t}\right|_{\mathbf{x}} \mathrm{d}V
+ \int_{S_t} f(\mathbf{x},t)\,\hat{\mathbf{v}}\cdot\mathbf{n}\,\mathrm{d}S \tag{36}
$$

where, in this case, we have explicitly indicated that the
time derivative in the first term of the right-hand side
is a spatial time derivative, as in expression (28). We
then successively replace the scalar $f(\mathbf{x},t)$ by the fluid
density $\rho$, the momentum $\rho\mathbf{v}$, and the total energy per unit
volume $\rho E$. Similarly, the spatial time derivative $\partial f/\partial t$ is
substituted using expressions (31) for the mass, momentum, and
energy equations. The end result is the following set of ALE
integral forms:


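$$
\begin{aligned}
& \left.\frac{\partial}{\partial t}\right|_{\boldsymbol{\chi}} \int_{V_t} \rho\,\mathrm{d}V + \int_{S_t} \rho\,\mathbf{c}\cdot\mathbf{n}\,\mathrm{d}S = 0 \\
& \left.\frac{\partial}{\partial t}\right|_{\boldsymbol{\chi}} \int_{V_t} \rho\mathbf{v}\,\mathrm{d}V + \int_{S_t} \rho\mathbf{v}\,\mathbf{c}\cdot\mathbf{n}\,\mathrm{d}S = \int_{V_t} (\nabla\cdot\boldsymbol{\sigma} + \rho\mathbf{b})\,\mathrm{d}V \\
& \left.\frac{\partial}{\partial t}\right|_{\boldsymbol{\chi}} \int_{V_t} \rho E\,\mathrm{d}V + \int_{S_t} \rho E\,\mathbf{c}\cdot\mathbf{n}\,\mathrm{d}S = \int_{V_t} \left(\mathbf{v}\cdot\rho\mathbf{b} + \nabla\cdot(\boldsymbol{\sigma}\cdot\mathbf{v})\right)\mathrm{d}V
\end{aligned} \tag{37}
$$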


Note that the integral forms for the Lagrangian and Eulerian
mesh descriptions are contained in the above ALE forms.
The Lagrangian description corresponds to selecting $\hat{\mathbf{v}} = \mathbf{v}$
($\mathbf{c} = \mathbf{0}$), while the Eulerian description corresponds to
selecting $\hat{\mathbf{v}} = \mathbf{0}$ ($\mathbf{c} = \mathbf{v}$).

The ALE differential and integral forms of the conserva- 
tion equations derived in the present section will be used 
as a basis for the spatial discretization of problems in fluid 
dynamics and solid mechanics. 
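
To make the structure of the integral mass balance in (37) concrete, the following sketch advances cell-averaged densities on a one-dimensional moving mesh by one explicit step. It is only an illustration under stated assumptions: the function name `ale_mass_step`, the first-order upwind choice for the face density and the forward Euler update are all hypothetical choices made here, not a scheme prescribed by the text.

```python
def ale_mass_step(x, rho, v_nodes, v_hat_nodes, dt):
    """One explicit step of the ALE integral mass balance on a 1-D mesh.

    x            node coordinates (len n + 1)
    rho          cell-averaged densities (len n)
    v_nodes      material velocity at the nodes (cell faces)
    v_hat_nodes  mesh velocity at the nodes
    Returns the updated node coordinates and cell densities.
    """
    n = len(rho)
    # The control-volume boundaries move with the mesh velocity.
    x_new = [xi + dt * vh for xi, vh in zip(x, v_hat_nodes)]

    # Convective mass flux rho * c through each face, c = v - v_hat,
    # with a first-order upwind density (assumed here for simplicity).
    flux = []
    for i in range(n + 1):
        c = v_nodes[i] - v_hat_nodes[i]
        if c >= 0.0:
            rho_up = rho[i - 1] if i > 0 else rho[0]
        else:
            rho_up = rho[i] if i < n else rho[n - 1]
        flux.append(rho_up * c)

    rho_new = []
    for i in range(n):
        mass_old = rho[i] * (x[i + 1] - x[i])
        # In 1-D the outward normal is +1 at the right face, -1 at the left.
        mass_new = mass_old - dt * (flux[i + 1] - flux[i])
        rho_new.append(mass_new / (x_new[i + 1] - x_new[i]))
    return x_new, rho_new


x0 = [0.0, 1.0, 2.0, 3.0]
rho0 = [1.0, 2.0, 3.0]

# Lagrangian limit: v_hat = v, so c = 0 and every cell keeps its mass.
v = [0.0, 0.1, 0.3, 0.6]
x_lag, rho_lag = ale_mass_step(x0, rho0, v, v, 0.5)

# Eulerian limit: v_hat = 0, so the mesh stays fixed and c = v.
v_closed = [0.0, 0.2, -0.1, 0.0]
x_eul, rho_eul = ale_mass_step(x0, rho0, v_closed, [0.0] * 4, 0.5)
```

Any intermediate mesh velocity can be passed as `v_hat_nodes`; the two limiting cases above simply reproduce the Lagrangian and Eulerian selections.
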


5 MESH-UPDATE PROCEDURES 


The majority of modern ALE computer codes are based 
on either finite volume or finite element spatial discretiza- 
tions, the former being popular in the fluid mechanics area, 
the latter being generally preferred in solid and structural 
mechanics. Note, however, that the ALE methodology is 
also used in connection with so-called mesh-free methods 
(see, for instance, Ponthot and Belytschko, 1998 for an 
application of the element-free Galerkin method to dynamic 
fracture problems). In the remainder of this chapter, refer- 
ence will mainly be made to spatial discretizations produced 
by the finite element method.

As already seen, one of the main advantages of the ALE 
formulation is that it represents a very versatile combina- 
tion of the classical Lagrangian and Eulerian descriptions. 
However, the computer implementation of the ALE tech- 
nique requires the formulation of a mesh-update procedure 
that assigns mesh-node velocities or displacements at each 
station (time step, or load step) of a calculation. The mesh-update
strategy can, in principle, be chosen by the user.
However, the remesh algorithm strongly influences the success
of the ALE technique and may represent a big burden
on the user if it is not rendered automatic.

Two basic mesh-update strategies may be identified. On 
one hand, the geometrical concept of mesh regularization 
can be exploited to keep the computational mesh as regular 
as possible and to avoid mesh entanglement during the 
calculation. On the other hand, if the ALE approach is used 
as a mesh-adaptation technique, for instance, to concentrate 
elements in zones of steep solution gradient, a suitable 
indication of the error is required as a basic input to the 
remesh algorithm. 


5.1 Mesh regularization 


The objective of mesh regularization is of a geometrical 
nature. It consists in keeping the computational mesh as 
regular as possible during the whole calculation, thereby 
avoiding excessive distortions and squeezing of the com- 
puting zones and preventing mesh entanglement. Of course, 
this procedure decreases the numerical errors due to mesh 
distortion. 

Mesh regularization requires that updated nodal coordi- 
nates be specified at each station of a calculation, either 
through step displacements, or from current mesh velocities
$\hat{\mathbf{v}}$. Alternatively, when it is preferable to prescribe the
relative motion between the mesh and the material particles,
the referential velocity $\mathbf{w}$ is specified. In this case, $\hat{\mathbf{v}}$
is deduced from equation (13). Usually, in fluid flows, the
mesh velocity is interpolated, and in solid problems, the 
mesh displacement is directly interpolated. 

First of all, these mesh-updating procedures are classified 
depending on whether the boundary motion is prescribed a 
priori or its motion is unknown. 

When the motion of the material surfaces (usually the 
boundaries) is known a priori, the mesh motion is also 
prescribed a priori. This is done by defining an adequate 
mesh velocity in the domain, usually by simple interpo- 
lation. In general, this implies a Lagrangian description at 
the moving boundaries (the mesh motion coincides with the 
prescribed boundary motion), while a Eulerian formulation 
(fixed mesh, with velocity $\hat{\mathbf{v}} = \mathbf{0}$) is employed far away from the
moving boundaries. A transition zone is defined in between. 
The interaction problem between a rigid body and a viscous 
fluid studied by Huerta and Liu (1988a) falls in this cate- 
gory. Similarly, the crack propagation problems discussed 
by Koh and Haber (1986) and Koh et al. (1988), where the 
crack path is known a priori, also allow the use of this kind 
of mesh-update procedure. Other examples of prescribed 
mesh motion in nonlinear solid mechanics can be found in 
the works by Liu et al. (1986), Huétink et al. (1990), van
Haaren et al. (2000), and Rodriguez-Ferran et al. (2002),
among others. 

In all other cases, at least a part of the boundary is a mate- 
rial surface whose position must be tracked at each time 
step. Thus, a Lagrangian description is prescribed along this 
surface (or at least along its normal). In the first applications 
to fluid dynamics (usually free surface flows), ALE degrees 
of freedom were simply divided into purely Lagrangian 
($\hat{\mathbf{v}} = \mathbf{v}$) or purely Eulerian ($\hat{\mathbf{v}} = \mathbf{0}$). Of course, the distortion
was thus concentrated in a layer of elements. This is,
for instance, the case for numerical simulations reported 
by Noh (1964), Franck and Lazarus (1964), Hirt et al. 
(1974), and Pracht (1975). Nodes located on moving bound- 
aries were Lagrangian, while internal nodes were Eulerian. 
This approach was used later for fluid—structure interaction 
problems by Liu and Chang (1984) and in solid mechan- 
ics by Haber (1984) and Haber and Hariandja (1985). This 
procedure was generalized by Hughes et al. (1981) using 
the so-called Lagrange—Euler matrix method. The referen- 
tial velocity, w, is defined relative to the particle velocity, 
v, and the mesh velocity is determined from equation (13). 
Huerta and Liu (1988b) improved this method avoiding the 
need to solve any equation for the mesh velocity inside the 
domain and ensuring an accurate tracking of the material 
surfaces by solving $\mathbf{w}\cdot\mathbf{n} = 0$, where $\mathbf{n}$ is the unit outward
normal, only along the material surfaces. Once the boundaries
are known, mesh displacements or velocities inside
the computational domain can in fact be prescribed through
potential-type equations or interpolations, as is discussed
next.

In fluid—structure interaction problems, solid nodes are 
usually treated as Lagrangian, while fluid nodes are treated 
as described above (fixed or updated according to some 
simple interpolation scheme). Interface nodes between the 
solid and the fluid must generally be treated as described 
in Section 6.1.2. Occasionally they can be treated as 
Lagrangian (see, for instance, Belytschko and Kennedy, 
1978; Belytschko et al., 1980, 1982; Belytschko and Lin, 
1985; Argyris et al., 1985; Huerta and Liu, 1988b). 

Once the boundary motion is known, several interpola- 
tion techniques are available to determine the mesh rezon- 
ing in the interior of the domain. 


5.1.1 Transfinite mapping method 


This method was originally designed for creating a mesh 
on a geometric region with specified boundaries; see e.g. 
Gordon and Hall (1973), Haber and Abel (1982), and 
Eriksson (1985). The general transfinite method describes 
an approximate surface or volume at a nondenumerable 
number of points. It is this property that gives rise to 
the term transfinite mapping. In the 2-D case, the transfinite
mapping can be made to exactly model all domain
boundaries, and, thus, no geometric error is introduced
by the mapping. The procedure is also very inexpensive,
since new nodal coordinates can be obtained explicitly 
once the boundaries of the computational domain have 
been discretized. The main disadvantage of this method- 
ology is that it imposes restrictions on the mesh topol- 
ogy, as two opposite curves have to be discretized with 
the same number of elements. It has been widely used 
by the ALE community to update nodal coordinates; see 
e.g. Ponthot and Hogge (1991), Yamada and Kikuchi 
(1993), Gadala and Wang (1998, 1999), and Gadala et al. 
(2002). 
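
A minimal sketch of a 2-D transfinite update is given below, using the classical bilinear blending (Coons-type) formula. It assumes that the four boundaries are already discretized, that opposite boundaries carry the same number of nodes (the restriction mentioned above), that the corner nodes are shared consistently, and that the parametric coordinates are uniform; the function name `transfinite_mesh` is hypothetical.

```python
def transfinite_mesh(bottom, top, left, right):
    """Return interior and boundary nodes from four discretized boundaries.

    bottom, top  lists of m + 1 points (x, y), ordered left to right
    left, right  lists of n + 1 points (x, y), ordered bottom to top
    The result is a list of n + 1 rows, each with m + 1 points.
    """
    m = len(bottom) - 1
    n = len(left) - 1
    if len(top) != m + 1 or len(right) != n + 1:
        raise ValueError("opposite boundaries need the same number of nodes")
    p00, p10, p01, p11 = bottom[0], bottom[m], top[0], top[m]

    grid = []
    for j in range(n + 1):
        eta = j / n          # uniform parametric coordinate (assumed)
        row = []
        for i in range(m + 1):
            xi = i / m
            point = []
            for k in range(2):
                # Blend the two pairs of opposite boundary curves ...
                edges = ((1 - eta) * bottom[i][k] + eta * top[i][k]
                         + (1 - xi) * left[j][k] + xi * right[j][k])
                # ... and subtract the doubly counted bilinear corner term.
                corners = ((1 - xi) * (1 - eta) * p00[k]
                           + xi * (1 - eta) * p10[k]
                           + (1 - xi) * eta * p01[k]
                           + xi * eta * p11[k])
                point.append(edges - corners)
            row.append(tuple(point))
        grid.append(row)
    return grid


# Illustrative boundaries: a unit square whose top side bulges upwards.
bottom = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
top = [(0.0, 1.0), (0.5, 1.5), (1.0, 1.0)]
left = [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)]
right = [(1.0, 0.0), (1.0, 0.5), (1.0, 1.0)]
mesh = transfinite_mesh(bottom, top, left, right)
```

Because every new nodal position is an explicit combination of boundary nodes, no system of equations has to be solved, which is the low-cost property noted above.
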


5.1.2 Laplacian smoothing and variational methods 


As in mesh generation or smoothing techniques, the rezon- 
ing of the mesh nodes consists in solving a Laplace (or 
Poisson) equation for each component of the node veloc- 
ity or position, so that on a logically regular region the 
mesh forms lines of equal potential. This method is also 
sometimes called elliptic mesh generation and was orig- 
inally proposed by Winslow (1963). This technique has 
an important drawback: in a nonconvex domain, nodes 
may run outside it. Techniques to preclude this pitfall 
either increase the computational cost enormously or intro- 
duce new terms in the formulation, which are particular 
to each geometry. Examples based on this type of mesh-update
algorithm are presented, among others, by Benson
(1989, 1992a,b), Liu et al. (1988, 1991), Ghosh and
Kikuchi (1991), Chenot and Bellet (1995), and Löhner and
Yang (1996). An equivalent approach, based on a mechanical
interpretation in which the mesh is treated as a (non)linear
elasticity problem, is used
by Schreurs et al. (1986), Le Tallec and Martin (1996), 
Belytschko et al. (2000), and Armero and Love (2003), 
while Cescutti et al. (1988) minimize a functional quan- 
tifying the mesh distortion. 
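
The idea of solving a discrete Laplace equation for each nodal coordinate can be sketched as follows on a logically regular 2-D grid. The sketch is an assumption-laden illustration: it uses plain Gauss-Seidel sweeps with the five-point stencil, keeps boundary nodes fixed, and contains no safeguard against the drawback mentioned above (nodes leaving a nonconvex domain). The function name `laplacian_smooth` is hypothetical.

```python
def laplacian_smooth(grid, iterations=200):
    """Relax interior nodes of a logically regular grid of (x, y) points.

    Each interior node is repeatedly replaced by the average of its four
    logical neighbours, i.e. a Gauss-Seidel sweep for the discrete Laplace
    equation applied to each coordinate. Boundary nodes are not moved.
    """
    rows, cols = len(grid), len(grid[0])
    pts = [[list(p) for p in row] for row in grid]
    for _ in range(iterations):
        for j in range(1, rows - 1):
            for i in range(1, cols - 1):
                for k in range(2):
                    pts[j][i][k] = 0.25 * (pts[j][i - 1][k] + pts[j][i + 1][k]
                                           + pts[j - 1][i][k] + pts[j + 1][i][k])
    return [[tuple(p) for p in row] for row in pts]


# Illustrative 4 x 4 grid on the unit square with distorted interior nodes.
regular = [[(i / 3.0, j / 3.0) for i in range(4)] for j in range(4)]
distorted = [list(row) for row in regular]
distorted[1][1] = (0.10, 0.45)
distorted[1][2] = (0.80, 0.20)
distorted[2][1] = (0.25, 0.90)
distorted[2][2] = (0.55, 0.60)
smoothed = laplacian_smooth(distorted)
```

On this convex square the sweeps recover the regular grid; in practice the number of sweeps per time step is a tuning parameter.
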


5.1.3 Mesh-smoothing and simple interpolations 


In fact, in ALE, it is possible to use any mesh-smoothing 
algorithm designed to improve the shape of the elements 
once the topology is fixed. Simple iterative averaging proce- 
dures can be implemented where possible; see, for instance, 
Donea et al. (1982), Batina (1991), Trépanier et al. (1993), 
Ghosh and Raju (1996), and Aymone et al. (2001). A more 
robust algorithm (especially in the neighborhood of bound- 
aries with large curvature) was proposed by Giuliani (1982) 
on the basis of geometric considerations. The goal of this 
method is to minimize both the squeeze and distortion of 
each element in the mesh. Donea (1983) and Huerta and
Casadei (1994) show examples using this algorithm; Sarrate
and Huerta (2001) and Hermansson and Hansbo (2003)
made improvements to the original procedure. The main
advantage of these mesh-regularization methods is that they
are both simple and rather general. They can in fact be
applied to unstructured meshes consisting of triangular and
quadrilateral elements in 2-D, and to tetrahedral, hexahedral,
prism, and pyramidal elements in 3-D.


5.2 Mesh adaptation


When the ALE description is used as an adaptive technique,
the objective is to optimize the computational mesh
to achieve an improved accuracy, possibly at low computing
cost (the total number of elements in a mesh remains
unchanged throughout the computation, as well as the element
connectivity). Mesh refinement is typically carried
out by moving the nodes towards zones of strong solution
gradient, such as localization zones in large deformation
problems involving softening materials. The ALE algorithm
then includes an indicator of the error, and the mesh is
modified to obtain an equi-distribution of the error over the
entire computational domain. The remesh indicator can, for
instance, be made a function of the average or the jump
of a certain state variable. Equi-distribution can be carried
out using an elliptic or a parabolic differential equation.
The ALE technique can nevertheless be coupled with traditional
mesh-refinement procedures, such as h-adaptivity, to
further enhance accuracy through the selective addition of
new degrees of freedom (see Askes and Rodriguez-Ferran,
2001).

Consider, for instance, the use of the ALE formulation
for the prediction of yield-line patterns in plates (see Askes
et al., 1999). With a coarse fixed mesh (Figure 5(a)), the
spatial discretization is too poor and the yield-line pattern
cannot be properly captured. One possible solution is, of
course, to use a finer mesh (see Figure 5(b)). Another very
attractive possibility from a computational viewpoint is to
stay with the coarse mesh and use the ALE formulation
to relocate the nodes (see Figure 5(c)). The level of plastification
is used as the remesh indicator. Note that, in this
problem, element distortion is not a concern (contrary to the
typical situation illustrated in Figure 2); nodes are relocated
to concentrate them along the yield lines.

Figure 5. Use of the ALE formulation as an r-adaptive technique.
The yield-line pattern is not properly captured with (a) a coarse
fixed mesh. Either (b) a fine fixed mesh or (c) a coarse ALE
mesh is required.
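
The equi-distribution principle behind this r-adaptive use of ALE can be illustrated in one dimension: the nodes are relocated so that every element carries the same share of a positive error indicator, while the number of nodes and their ordering stay unchanged. The sketch below is a simplified stand-in, not the elliptic or parabolic procedure mentioned in the text: it inverts the cumulative integral of the indicator by direct quadrature. The function name `equidistribute`, the sampling resolution and the Gaussian-shaped indicator are assumptions made for the example.

```python
import math


def equidistribute(nodes, monitor, samples=2000):
    """Relocate interior nodes so each element holds an equal share of
    the integral of `monitor` (a strictly positive error indicator).

    The first and last nodes are kept; the number of nodes is unchanged.
    """
    a, b = nodes[0], nodes[-1]
    n_el = len(nodes) - 1

    # Cumulative trapezoidal integral of the indicator on a fine grid.
    xs = [a + (b - a) * k / samples for k in range(samples + 1)]
    cum = [0.0]
    for k in range(samples):
        cum.append(cum[-1] + 0.5 * (monitor(xs[k]) + monitor(xs[k + 1]))
                   * (xs[k + 1] - xs[k]))
    total = cum[-1]

    # Invert the cumulative integral at equally spaced target values.
    new_nodes = [a]
    k = 0
    for i in range(1, n_el):
        target = total * i / n_el
        while cum[k + 1] < target:
            k += 1
        frac = (target - cum[k]) / (cum[k + 1] - cum[k])
        new_nodes.append(xs[k] + frac * (xs[k + 1] - xs[k]))
    new_nodes.append(b)
    return new_nodes


def indicator(x):
    """Assumed indicator: large in a narrow zone around x = 0.5."""
    return 1.0 + 50.0 * math.exp(-200.0 * (x - 0.5) ** 2)


uniform_nodes = [k / 10.0 for k in range(11)]
adapted_nodes = equidistribute(uniform_nodes, indicator)
```

The adapted nodes cluster around the zone where the indicator is large, which is the one-dimensional analogue of concentrating nodes along the yield lines.
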

Studies on the use of ALE as a mesh-adaptation technique
in solid mechanics are reported, among others, by
Pijaudier-Cabot et al. (1995), Huerta et al. (1999), Askes
and Sluys (2000), Askes and Rodriguez-Ferran (2001),
Askes et al. (2001), and Rodriguez-Ferran et al. (2002).

Mesh adaptation has also found widespread use in fluid
dynamics. Often, account must be taken of the directional
character of the flow, so anisotropic adaptation procedures
are to be preferred. For example, an efficient adaptation
method for viscous flows with strong shear layers has
to be able to refine directionally to adapt the mesh to
the anisotropy of the flow. Anisotropic adaptation criteria
again have an error estimate as a basic criterion; see, for
instance, Fortin et al. (1996), Castro-Diaz et al. (1996),
Ait-Ali-Yahia et al. (2002), Habashi et al. (2000), and Müller
(2002) for the practical implementation of such procedures.


6 ALE METHODS IN FLUID DYNAMICS 


Owing to its superior ability with respect to the Eulerian 
description to deal with interfaces between materials and 
mobile boundaries, the ALE description is being widely 
used for the spatial discretization of problems in fluid
and structural dynamics. In particular, the method is fre- 
quently employed in the so-called hydrocodes, which are 
used to simulate the large distortion/deformation response 
of materials, structures, fluids, and fluid—structure systems. 
They typically apply to problems in impact and penetration 
mechanics, fracture mechanics, and detonation/blast analy- 
sis. We shall briefly illustrate the specificities of ALE tech- 
niques in the modeling of viscous incompressible flows and 
in the simulation of inviscid, compressible flows, including 
interaction with deforming structures. 


The most obvious influence of an ALE formulation in
flow problems is that the convective term must account for
the mesh motion. Thus, as already discussed in Section 4.1,
the convective velocity $\mathbf{c}$ replaces the material velocity $\mathbf{v}$,
which appears in the convective term of Eulerian formulations
(see equations 31 and 34). Note that the mesh motion
may increase or decrease the convection effects. Obvi- 
ously, in pure convection (for instance, if a fractional-step 
algorithm is employed) or when convection is dominant, 
stabilization techniques must be implemented. The inter- 
ested reader is urged to consult Chapter 2, Volume 3, for a 
thorough exposition of stabilization techniques available to 
remedy the lack of stability of the standard Galerkin formu- 
lation in convection-dominated situations, or the textbook 
by Donea and Huerta (2003). 

It is important to note that in standard fluid dynamics, the 
stress tensor only depends on the pressure and (for viscous 
flows) on the velocity field at the point and instant under 
consideration. This is not the case in solid mechanics, as 
discussed below in Section 7. Thus, stress update is not a 
major concern in ALE fluid dynamics. 


6.1 Boundary conditions 


The rest of the discussion of the specificities of the ALE for- 
mulation in fluid dynamics concerns boundary conditions. 
In fact, boundary conditions are related to the problem, not 
to the description employed. Thus, the same boundary con- 
ditions employed in Eulerian or Lagrangian descriptions 
are implemented in the ALE formulation, that is, along the 
boundary of the domain, kinematical and dynamical condi- 
tions must be defined. Usually, this is formalized as 


$$
\begin{aligned}
\mathbf{v} &= \mathbf{v}_D && \text{on } \Gamma_D \\
\mathbf{n}\cdot\boldsymbol{\sigma} &= \mathbf{t} && \text{on } \Gamma_N
\end{aligned}
$$

where $\mathbf{v}_D$ and $\mathbf{t}$ are the prescribed boundary velocities and
tractions respectively; $\mathbf{n}$ is the outward unit normal to $\Gamma_N$,
and $\Gamma_D$ and $\Gamma_N$ are the two distinct subsets (Dirichlet and
Neumann respectively), which define the piecewise smooth
boundary of the computational domain. As usual, stress
conditions on the boundaries represent the ‘natural boundary
conditions’, and thus, they are automatically included
in the weak form of the momentum conservation equation
(see (34)).

If part of the boundary is composed of a material surface 
whose position is unknown, then a mixture of both condi- 
tions is required. The ALE formulation allows an accurate 
treatment of material surfaces. The conditions required on 
a material surface are: (a) no particles can cross it, and 
(b) stresses must be continuous across the surface (if a net 
force is applied to a surface of zero mass, the acceleration 
is infinite). Two types of material surfaces are discussed
here: free surfaces and fluid-structure interfaces, which
may or may not be frictionless (whether or not the fluid
is inviscid).


6.1.1 Free surfaces 


The unknown position of free surfaces can be computed 
using two different approaches. First, for the simple case 
of a single-valued function z = z(x, y,t), a hyperbolic 
equation must be solved, 


$$
\frac{\partial z}{\partial t} + (\mathbf{v}\cdot\nabla) z = 0
$$


This is the kinematic equation of the surface and has been 
used, for instance, by Ramaswamy and Kawahara (1987), 
Huerta and Liu (1988b, 1990), and Souli and Zolesio 
(2001). Second, a more general approach can be obtained 
by simply imposing the obvious condition that no particle 
can cross the free surface (because it is a material surface). 
This can be imposed in a straightforward manner by using 
a Lagrangian description (i.e. $\mathbf{w} = \mathbf{0}$ or $\mathbf{v} = \hat{\mathbf{v}}$) along this
surface. However, this condition may be relaxed by imposing
only the necessary condition: $\mathbf{w}$ equal to zero along
the normal to the boundary (i.e. $\mathbf{n}\cdot\mathbf{w} = 0$, where $\mathbf{n}$ is the
outward unit normal to the fluid domain, or $\mathbf{n}\cdot\mathbf{v} = \mathbf{n}\cdot\hat{\mathbf{v}}$).
The mesh position, normal to the free surface, is determined
from the normal component of the particle velocity,
and remeshing can be performed along the tangent; see,
for instance, Huerta and Liu (1989) or Braess and Wriggers
(2000). In any case, these two alternatives correspond to
the kinematical condition; the dynamic condition expresses
the stress-free situation, $\mathbf{n}\cdot\boldsymbol{\sigma} = \mathbf{0}$, and since it is a
homogeneous natural boundary condition, as mentioned earlier,
it is directly taken into account by the weak formulation. 
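
The relaxed kinematic condition $\mathbf{n}\cdot\hat{\mathbf{v}} = \mathbf{n}\cdot\mathbf{v}$ can be sketched for a single free-surface node in 2-D as follows: the normal component of the mesh velocity is taken from the particle velocity, while the tangential component is left to the rezoning algorithm. The function name `free_surface_mesh_velocity` and the way the tangential rezoning velocity is passed in are assumptions of this example.

```python
import math


def free_surface_mesh_velocity(v, normal, rezoning_velocity=(0.0, 0.0)):
    """Mesh velocity at a free-surface node satisfying n . v_hat = n . v.

    v                  particle velocity (vx, vy) at the node
    normal             outward normal to the fluid domain (any length > 0)
    rezoning_velocity  velocity suggested by the rezoning algorithm; only
                       its tangential part is used (assumed interface)
    """
    length = math.hypot(normal[0], normal[1])
    n = (normal[0] / length, normal[1] / length)
    v_normal = v[0] * n[0] + v[1] * n[1]
    r_normal = rezoning_velocity[0] * n[0] + rezoning_velocity[1] * n[1]
    # Normal part from the fluid, tangential part from the rezoning.
    return tuple(v_normal * n[k] + rezoning_velocity[k] - r_normal * n[k]
                 for k in range(2))


v_particle = (1.0, 2.0)
n_surface = (0.0, 1.0)
v_hat_normal_only = free_surface_mesh_velocity(v_particle, n_surface)
v_hat_rezoned = free_surface_mesh_velocity(v_particle, n_surface, (0.3, 5.0))
```

With a zero rezoning velocity the node moves only along the normal; setting the rezoning velocity equal to the particle velocity recovers the fully Lagrangian choice $\hat{\mathbf{v}} = \mathbf{v}$.
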


6.1.2 Fluid—structure interaction 


Along solid-wall boundaries, the particle velocity is cou- 
pled to the rigid or flexible structure. The enforcement of 
the kinematic requirement that no particles can cross the 
interface is similar to the free-surface case. Thus, conditions 
$\mathbf{n}\cdot\mathbf{w} = 0$ or $\mathbf{n}\cdot\mathbf{v} = \mathbf{n}\cdot\hat{\mathbf{v}}$ are also used. However, due to the
coupling between fluid and structure, extra conditions are 
needed to ensure that the fluid and structural domains will 
not detach or overlap during the motion. These coupling 
conditions depend on the fluid. 

For an inviscid fluid (no shear effects), only normal 
components are coupled because an inviscid fluid is free 
to slip along the structural interface; that is, 


$$
\begin{aligned}
\mathbf{n}\cdot\mathbf{u} &= \mathbf{n}\cdot\mathbf{u}_S && \text{continuity of normal displacements} \\
\mathbf{n}\cdot\mathbf{v} &= \mathbf{n}\cdot\mathbf{v}_S && \text{continuity of normal velocities}
\end{aligned}
$$

where the displacement/velocity of the fluid ($\mathbf{u}$/$\mathbf{v}$) along
the normal to the interface must be equal to the displacement/velocity
of the structure ($\mathbf{u}_S$/$\mathbf{v}_S$) along the same
direction. Both equations are equivalent and one or the other
is used, depending on the formulation employed (displacements
or velocities).

For a viscous fluid, the coupling between fluid and 
structure requires that velocities (or displacements) coincide 
along the interface; that is, 


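$$
\begin{aligned}
\mathbf{u} &= \mathbf{u}_S && \text{continuity of displacements} \\
\mathbf{v} &= \mathbf{v}_S && \text{continuity of velocities}
\end{aligned}
$$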


In practice, two nodes are placed at each point of the 
interface: one fluid node and one structural node. Since the 
fluid is treated in the ALE formulation, the movement of the 
fluid mesh may be chosen completely independent of the 
movement of the fluid itself. In particular, we may constrain 
the fluid nodes to remain contiguous to the structural nodes, 
so that all nodes on the sliding interface remain permanently 
aligned. This is achieved by prescribing the grid velocity 
$\hat{\mathbf{v}}$ of the fluid nodes at the interface to be equal to the
material velocity $\mathbf{v}_S$ of the adjacent structural nodes. The
permanent alignment of nodes at ALE interfaces greatly 
facilitates the flow of information between the fluid and 
structural domains and permits fluid—structure coupling to 
be effected in the simplest and the most elegant manner; 
that is, the imposition of the previous kinematic conditions 
is simple because of the node alignment. 

The dynamic condition is automatically verified along 
fixed rigid boundaries, but it presents the classical difficul- 
ties in fluid—structure interaction problems when compat- 
ibility at nodal level in velocities and stresses is required 
(both for flexible or rigid structures whose motion is cou- 
pled to the fluid flow). This condition requires that the 
stresses in the fluid be equal to the stresses in the structure. 
When the behavior of the fluid is governed by the linear
Stokes law ($\boldsymbol{\sigma} = -p\mathbf{I} + 2\nu\nabla^{s}\mathbf{v}$) or for inviscid fluids, this
condition is

$$
-p\mathbf{n} + 2\nu(\mathbf{n}\cdot\nabla^{s})\mathbf{v} = \mathbf{n}\cdot\boldsymbol{\sigma}_S
\quad\text{or}\quad
-p\mathbf{n} = \mathbf{n}\cdot\boldsymbol{\sigma}_S
$$

respectively, where $\boldsymbol{\sigma}_S$ is the stress tensor acting on the
structure. In the finite element representation, the continuous
interface is replaced with a discrete approximation and
instead of a distributed interaction pressure, consideration
is given to its resultant at each interface node.

There is a large amount of literature on ALE fluid—struc- 
ture interaction, both for flexible structures and for rigid 
solids; see, among others, Liu and Chang (1985), Liu and 
Gvildys (1986), Nomura and Hughes (1992), Le Tallec and 
Mouro (2001), Casadei et al. (2001), Sarrate et al. (2001), 
and Zhang and Hisada (2001). 


Remark (Fluid—rigid-body interaction) In some circum- 
stances, especially when the structure is embedded in a 
fluid and its deformations are small compared with the 
displacements and rotations of its center of gravity, it is 
justified to idealize the structure as a rigid body resting 
on a system consisting of springs and dashpots. Typical
situations in which such an idealization is legitimate
include the simulation of wind-induced vibrations in high- 
rise buildings or large bridge girders, the cyclic response of 
offshore structures exposed to sea currents, as well as the 
behavior of structures in aeronautical and naval engineer- 
ing where structural loading and response are dominated by 
fluid-induced vibrations. An illustrative example of ALE 
fluid-rigid-body interaction is shown in Section 6.2.


Remark (Normal to a discrete interface) In practice, espe- 
cially in complex 3-D configurations, one major difficulty 
is to determine the normal vector at each node of the 
fluid—structure interface. Various algorithms have been 
developed to deal with this issue; Casadei and Halleux
(1995) and Casadei and Sala (1999) present detailed solu- 
tions. In 2-D, the tangent to the interface at a given node is 
usually defined as parallel to the line connecting the nodes 
at the ends of the interface segments meeting at that node. 
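
The 2-D rule stated in this remark translates directly into a few lines of code: the tangent at an interface node is taken parallel to the chord joining its two neighbouring interface nodes, and the normal is obtained by rotating that tangent by a right angle. In the sketch below, the function name `nodal_normal` and the orientation convention (interface nodes ordered counterclockwise around the fluid domain, so that the returned normal points out of the fluid) are assumptions.

```python
import math


def nodal_normal(prev_node, next_node):
    """Unit normal at an interface node from its two interface neighbours.

    The tangent is parallel to the line joining prev_node and next_node.
    With nodes ordered counterclockwise around the fluid domain (assumed),
    rotating the tangent clockwise by 90 degrees gives the outward normal.
    """
    tx = next_node[0] - prev_node[0]
    ty = next_node[1] - prev_node[1]
    length = math.hypot(tx, ty)
    if length == 0.0:
        raise ValueError("neighbouring interface nodes coincide")
    return (ty / length, -tx / length)


# Node on the bottom wall of a fluid region lying above the line y = 0.
n_bottom = nodal_normal((0.0, 0.0), (2.0, 0.0))

# Node at angle theta on a circular interface enclosing the fluid.
theta, delta = 0.7, 0.1
n_circle = nodal_normal(
    (math.cos(theta - delta), math.sin(theta - delta)),
    (math.cos(theta + delta), math.sin(theta + delta)),
)
```

For equally spaced nodes on a circle the rule returns the exact radial direction; on general curved interfaces it is only an approximation of the normal.
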


Remark (Free surface and structure interaction) The 
above discussion of the coupling problem only applies to 
those portions of the structure that are always submerged 
during the calculation. As a matter of fact, there may exist 
portions of the structure, which only come into contact with 
the fluid some time after the calculation begins. This is, 
for instance, the case for structural parts above a fluid free
surface. For such portions of the structural domain, some 
sort of sliding treatment is necessary, as for Lagrangian 
methods. 


6.1.3 Geometric conservation laws 


In a series of papers (see Lesoinne and Farhat, 1996; 
Koobus and Farhat, 1999; Guillard and Farhat, 2000; and 
Farhat et al., 2001), Farhat and coworkers have discussed 
the notion of geometric conservation laws for unsteady flow 
computations on moving and deforming finite element or 
finite volume grids. 

The basic requirement is that any ALE computational 
method should be able to predict exactly the trivial solution 
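of a uniform flow. The ALE equation of mass balance, the first
of equations (37), is usually taken as the starting point to
derive the geometric conservation law. Assuming uniform fields
of density $\rho$ and material velocity $\mathbf{v}$, it reduces to the
continuous geometric conservation law

$$
\left.\frac{\partial}{\partial t}\right|_{\boldsymbol{\chi}} \int_{V_t} \mathrm{d}V = \int_{S_t} \hat{\mathbf{v}}\cdot\mathbf{n}\,\mathrm{d}S \tag{38}
$$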


As remarked by Smith (1999), equation (38) can also be 
derived from the other two ALE integral conservation laws 
(37) with appropriate restrictions on the flow fields. 

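Integrating equation (38) in time from $t^n$ to $t^{n+1}$ renders
the discrete geometric conservation law (DGCL)

$$
|\Omega_e^{n+1}| - |\Omega_e^{n}| = \int_{t^n}^{t^{n+1}} \left(\int_{S_t} \hat{\mathbf{v}}\cdot\mathbf{n}\,\mathrm{d}S\right)\mathrm{d}t \tag{39}
$$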


which states that the change in volume (or area, in 2-D)
of each element from $t^n$ to $t^{n+1}$ must be equal to the
volume (or area) swept by the element boundary during
the time interval. Assuming that the volumes $\Omega_e$ in the
left-hand side of equation (39) can be computed exactly,
this amounts to requiring the exact computation of the flux
in the right-hand side also. This poses some restrictions on
the update procedure for the grid position and velocity. For
instance, Lesoinne and Farhat (1996) show that, for first-order
time-integration schemes, the mesh velocity should
be computed as $\hat{\mathbf{v}}^{n+1/2} = (\mathbf{x}^{n+1} - \mathbf{x}^{n})/\Delta t$. They also point
out that, although this intuitive formula was used by many
time-integrators prior to DGCLs, it is violated in some
instances, especially in fluid-structure interaction problems
where mesh motion is coupled with structural deformation.
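
The content of the DGCL (39) can be checked on a single 2-D element. The sketch below moves the vertices of a triangle linearly in time with the mesh velocity $(\mathbf{x}^{n+1} - \mathbf{x}^{n})/\Delta t$ and compares the change in area with the area swept by the edges. Evaluating the edge normals on the mid-step configuration is an assumption of this sketch (it makes the time integral exact for straight edges in 2-D); evaluating them on the old configuration is included only to show a violation. The function names are hypothetical.

```python
def polygon_area(pts):
    """Signed area of a polygon with counterclockwise vertices."""
    n = len(pts)
    return 0.5 * sum(pts[i][0] * pts[(i + 1) % n][1]
                     - pts[(i + 1) % n][0] * pts[i][1] for i in range(n))


def swept_area(old, new, dt, midpoint_geometry=True):
    """Time-integrated flux of the mesh velocity through the element edges.

    Nodal mesh velocities are (new - old) / dt. Edge normals are taken on
    the mid-step configuration (midpoint_geometry=True) or on the old one.
    """
    n = len(old)
    total = 0.0
    for i in range(n):
        j = (i + 1) % n
        va = [(new[i][k] - old[i][k]) / dt for k in range(2)]
        vb = [(new[j][k] - old[j][k]) / dt for k in range(2)]
        v_edge = [0.5 * (va[k] + vb[k]) for k in range(2)]
        if midpoint_geometry:
            a = [0.5 * (old[i][k] + new[i][k]) for k in range(2)]
            b = [0.5 * (old[j][k] + new[j][k]) for k in range(2)]
        else:
            a, b = old[i], old[j]
        ex, ey = b[0] - a[0], b[1] - a[1]
        # Outward normal times edge length for counterclockwise vertices.
        total += dt * (v_edge[0] * ey - v_edge[1] * ex)
    return total


# Illustrative element motion over one time step (assumed data).
old_nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_nodes = [(0.1, -0.2), (1.4, 0.1), (-0.3, 1.5)]
dt = 0.1

area_change = polygon_area(new_nodes) - polygon_area(old_nodes)
swept_mid = swept_area(old_nodes, new_nodes, dt, midpoint_geometry=True)
swept_old = swept_area(old_nodes, new_nodes, dt, midpoint_geometry=False)
print(area_change, swept_mid, swept_old)
```

Only the first evaluation reproduces the change in area, so only it would preserve a uniform flow exactly on this moving element.
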

The practical significance of DGCLs is a debated issue in
the literature. As admitted by Guillard and Farhat (2000),
‘there are recurrent assertions in the literature stating that,
in practice, enforcing the DGCL when computing on moving
meshes is unnecessary’. Later, Farhat et al. (2001)
and other authors have studied the properties of DGCL-enforcing
ALE schemes from a theoretical viewpoint. The
link between DGCLs and the stability (and accuracy)
of ALE schemes is still a controversial topic of current
research.


6.2 Applications in ALE fluid dynamics 


The first example consists in the computation of cross-flow 
and rotational oscillations of a rectangular profile. The flow 
is modeled by the incompressible Navier-Stokes equations 
and the rectangle is regarded as rigid. The ALE formulation 
for fluid—rigid-body interaction proposed by Sarrate et al. 
(2001) is used. 

Figure 6. Flow around a rectangle. Pressure fields at two different
instants. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm

Figure 7. Details of finite element mesh around the rectangle.
The ring allows a smooth transition between the rigidly moving
mesh around the rectangle and the Eulerian mesh far from it.

Figure 6 depicts the pressure field at two different
instants. The flow goes from left to right. Note the cross-flow
translation and the rotation of the rectangle. The ALE
kinematical description avoids excessive mesh distortion
(see Figure 7). For this problem, a computationally efficient
rezoning strategy is obtained by dividing the mesh into three
zones: (1) the mesh inside the inner circle is prescribed to
move rigidly attached to the rectangle (no mesh distortion
and simple treatment of interface conditions); (2) the mesh
outside the outer circle is Eulerian (no mesh distortion and
no need to select mesh velocity); (3) a smooth transition is
prescribed in the ring between the circles (mesh distortion
under control).
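
A possible realization of this three-zone rezoning is sketched below for a 2-D mesh node. The rigid-body velocity of the rectangle is applied inside the inner circle, a zero mesh velocity outside the outer circle, and a blend of the two in the ring. The text does not specify the transition function; the cubic "smoothstep" weight used here, the function name `three_zone_mesh_velocity` and its arguments are assumptions made for illustration.

```python
import math


def three_zone_mesh_velocity(point, center, body_velocity, omega,
                             r_inner, r_outer):
    """Mesh velocity at `point` for the three-zone rezoning strategy.

    center         current position of the rigid body's reference point
    body_velocity  translational velocity (Vx, Vy) of that point
    omega          angular velocity of the rigid body
    r_inner        radius inside which the mesh follows the body rigidly
    r_outer        radius outside which the mesh is Eulerian (fixed)
    """
    rx, ry = point[0] - center[0], point[1] - center[1]
    r = math.hypot(rx, ry)
    # Velocity the node would have if rigidly attached to the body.
    rigid = (body_velocity[0] - omega * ry, body_velocity[1] + omega * rx)

    if r <= r_inner:
        weight = 1.0                      # zone 1: rigid motion
    elif r >= r_outer:
        weight = 0.0                      # zone 2: Eulerian mesh
    else:
        s = (r_outer - r) / (r_outer - r_inner)
        weight = s * s * (3.0 - 2.0 * s)  # zone 3: assumed smooth blend
    return (weight * rigid[0], weight * rigid[1])


center = (0.0, 0.0)
v_body, omega = (0.0, 1.0), 2.0
v_hat_inner = three_zone_mesh_velocity((0.5, 0.0), center, v_body, omega, 1.0, 2.0)
v_hat_ring = three_zone_mesh_velocity((1.5, 0.0), center, v_body, omega, 1.0, 2.0)
v_hat_outer = three_zone_mesh_velocity((3.0, 0.0), center, v_body, omega, 1.0, 2.0)
```

The weight and its first derivative are continuous across both circles, so the mesh velocity field has no kink at the zone boundaries; other monotone blends would serve equally well.
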

The second example highlights ALE capabilities for 
fluid—structure interaction problems. The results shown 
here, discussed in detail by Casadei and Potapov (2004), 
have been provided by Casadei and are reproduced here 
with the authors’ kind permission. The example con- 
sists in a long 3-D metallic pipe with a square cross 
section, sealed at both ends, containing a gas at room
pressure (see Figure 8). Initially, two ‘explosions’ take 
place at the ends of the pipe, simulated by the pres- 
ence of the same gas, but at a much higher initial 
pressure. 

The gas motion through the pipe is partly affected 
by internal structures within the pipe (diaphragms #1, 
#2 and #3) that create a sort of labyrinth. All the pipe 
walls, and the internal structures, are deformable and are 
characterized by elastoplastic behavior. The pressures and
structural-material properties are so chosen that very large 
motions and relatively large deformations occur in the 
structure. 

Figure 9 shows the real deformed shapes (not scaled up) 
of the pipe with superposed fluid-pressure maps. Note the 
strong wave-propagation effects, the partial wave reflec- 
tions at obstacles, and the ‘ballooning’ effect of the thin 
pipe walls in regions at high pressure. This is a severe test, 
among other things, for the automatic ALE rezoning algo- 
rithms that must keep the fluid mesh reasonably uniform 
under large motions. 


Figure 8. Explosions in a 3-D labyrinth. Problem statement.
Labels in the figure: Diaphragm #1, Diaphragm #2, Diaphragm #3,
High-pressure. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


7 ALE METHODS IN NONLINEAR 
SOLID MECHANICS 


Starting in the late 1970s, the ALE formulation has 
been extended to nonlinear solid and structural mechanics. 
Particular efforts were made in response to the need to 
simulate problems describing crack propagation, impact, 
explosion, vehicle crashes, as well as forming processes 
of materials. The large distortions/deformations that char- 
acterize these problems clearly undermine the utility of 
the Lagrangian approach traditionally used in problems 
involving materials with path-dependent constitutive rela- 
tions. Representative publications on the use of ALE in 
solid mechanics are, among many others, Liu et al. (1986, 
1988), Schreurs et al. (1986), Benson (1989), Huétink et al. 
(1990), Ghosh and Kikuchi (1991), Baaijens (1993), Huerta 
and Casadei (1994), Rodriguez-Ferran et al. (1998, 2002), 
Askes et al. (1999), and Askes and Sluys (2000). 

If mechanical effects are uncoupled from thermal 
effects, the mass and momentum equations can be solved 
independently from the energy equation. According to 
expressions (34), the ALE version of these equations 
is 


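\[ \left.\frac{\partial \rho}{\partial t}\right|_{\boldsymbol{\chi}} + \mathbf{c}\cdot\nabla\rho = -\rho\,\nabla\cdot\mathbf{v} \tag{40a} \]

\[ \rho\,\mathbf{a} = \rho\left(\left.\frac{\partial \mathbf{v}}{\partial t}\right|_{\boldsymbol{\chi}} + (\mathbf{c}\cdot\nabla)\mathbf{v}\right) = \nabla\cdot\boldsymbol{\sigma} + \rho\,\mathbf{b} \tag{40b} \]


where $\mathbf{a}$ is the material acceleration defined in
(35a, b and c), $\boldsymbol{\sigma}$ denotes the Cauchy stress tensor
and $\mathbf{b}$ represents an applied body force per unit
mass.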

A standard simplification in nonlinear solid mechanics 
consists of dropping the mass equation (40a), which is not 
explicitly accounted for, thus solving only the momentum 
equation (40b). A common assumption consists of taking 
the density $\rho$ as constant, so that the mass balance (40a)
reduces to 


\[ \nabla\cdot\mathbf{v} = 0 \tag{41} \]


which is the well-known incompressibility condition. This 
simplified version of the mass balance is also commonly 
neglected in solid mechanics. This is acceptable because 
elastic deformations typically induce very small changes 
in volume, while plastic deformations are volume pre- 
serving (isochoric plasticity). This means that changes 
in density are negligible and that equation (41) auto- 
matically holds to sufficient approximation without the 
need to add it explicitly to the set of governing equations.

Figure 9. Explosions in a 3-D labyrinth. Deformation in structure and pressure in fluid are properly captured with ALE fluid–structure
interaction: (a) whole model; (b) zoom of diaphragm #1; (c) zoom of diaphragms #2 and #3. A color version of this image is available
at http://www.mrw.interscience.wiley.com/ecm


7.1 ALE treatment of steady, quasistatic and
dynamic processes


In discussing the ALE form (40b) of the momentum equa- 
tion, we shall distinguish between steady, quasistatic, and 
dynamic processes. In fact, the expression for the inertia 
forces $\rho\mathbf{a}$ depends critically on the particular type of
process under consideration.

A process is called steady if the material velocity $\mathbf{v}$ at
every spatial point $\mathbf{x}$ is constant in time. In the Eulerian
description (35b), this results in zero local acceleration
$\partial\mathbf{v}/\partial t|_{\mathbf{x}}$, and only the convective acceleration is present in
the momentum balance, which reads


\[ \rho\,\mathbf{a} = \rho\,(\mathbf{v}\cdot\nabla)\mathbf{v} = \nabla\cdot\boldsymbol{\sigma} + \rho\,\mathbf{b} \tag{42} \]


In the ALE context, it is also possible to assume that a
process is steady with respect to a grid point $\boldsymbol{\chi}$ and neglect
the local acceleration $\partial\mathbf{v}/\partial t|_{\boldsymbol{\chi}}$ in the expression (35c); see,
for instance, Ghosh and Kikuchi (1991). The momentum
balance then becomes


\[ \rho\,\mathbf{a} = \rho\,(\mathbf{c}\cdot\nabla)\mathbf{v} = \nabla\cdot\boldsymbol{\sigma} + \rho\,\mathbf{b} \tag{43} \]


However, the physical meaning of a null ALE local 
acceleration (that is, of an “ALE-steady” process) is not 
completely clear, due to the arbitrary nature of the mesh 
velocity and, hence, of the convective velocity c. 

A process is termed quasistatic if the inertia forces $\rho\mathbf{a}$ are
negligible with respect to the other forces in the momentum 
balance. In this case, the momentum balance reduces to the 
static equilibrium equation 


\[ \nabla\cdot\boldsymbol{\sigma} + \rho\,\mathbf{b} = \mathbf{0} \tag{44} \]


in which time and material velocity play no role. Since the 
inertia forces have been neglected, the different descriptions 
of acceleration in equations (35a, b and c) do not appear 
in equation (44), which is therefore valid in both Eulerian 
and ALE formulations. The important conclusion is that 
there are no convective terms in the ALE momentum balance 
for quasistatic processes. A process may be modeled as 
quasistatic if stress variations and/or body forces are much 
larger than inertia forces. This is a common situation 
in solid mechanics, encompassing, for instance, various 
metal-forming processes. As discussed in the next section, 
convective terms are nevertheless present in the ALE (and 
Eulerian) constitutive equation for quasistatic processes. 
They reflect the fact that grid points are occupied by 
different particles at different times. 

Finally, in transient dynamic processes, all terms must be
retained in expression (35c) for the material acceleration,
and the momentum balance equation is given by expression
(40b).


7.2 ALE constitutive equations 


Compared to the use of the ALE description in fluid dynamics,
the main additional difficulty in nonlinear solid mechanics
is the design of an appropriate stress-update procedure
to deal with history-dependent constitutive equations. As
already mentioned, constitutive equations of ALE nonlinear
solid mechanics contain convective terms that account
for the relative motion between mesh and material. This is
the case for both hypoelastoplastic and hyperelastoplastic
models.


7.2.1 Constitutive equations for ALE 
hypoelastoplasticity 


Hypoelastoplastic models are based on an additive decomposition
of the stretching tensor $\nabla^{s}\mathbf{v}$ (symmetric part of
the velocity gradient) into elastic and plastic parts; see, 
for instance, Belytschko et al. (2000) or Bonet and Wood 
(1997). They were used in the first ALE formulations for 
solid mechanics and are still the standard choice. In these 
models, material behavior is described by a rate-form con- 
stitutive equation 


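\[ \boldsymbol{\sigma}^{\nabla} = \mathbf{f}(\boldsymbol{\sigma}, \nabla^{s}\mathbf{v}) \tag{45} \]


relating an objective rate of Cauchy stress $\boldsymbol{\sigma}^{\nabla}$ to stress and
stretching. The material rate of stress


\[ \dot{\boldsymbol{\sigma}} = \left.\frac{\partial \boldsymbol{\sigma}}{\partial t}\right|_{\mathbf{X}} = \left.\frac{\partial \boldsymbol{\sigma}}{\partial t}\right|_{\boldsymbol{\chi}} + (\mathbf{c}\cdot\nabla)\boldsymbol{\sigma} \tag{46} \]

cannot be employed in relation (45) to measure the stress
rate because it is not an objective tensor, so large rigid-
body rotations are not properly treated. An objective rate
of stress is obtained by adding to $\dot{\boldsymbol{\sigma}}$ some terms that ensure
the objectivity of $\boldsymbol{\sigma}^{\nabla}$; see, for instance, Malvern (1969) or
Belytschko et al. (2000). Two popular objective rates are
the Truesdell rate and the Jaumann rate; the latter reads


\[ \boldsymbol{\sigma}^{\nabla} = \dot{\boldsymbol{\sigma}} - \nabla^{w}\mathbf{v}\cdot\boldsymbol{\sigma} - \boldsymbol{\sigma}\cdot(\nabla^{w}\mathbf{v})^{T} \tag{47} \]

where $\nabla^{w}\mathbf{v} = \tfrac{1}{2}(\nabla\mathbf{v} - \nabla^{T}\mathbf{v})$ is the spin tensor.
Substitution of equation (47) (or similar expressions for
other objective stress rates) into equation (45) yields


\[ \dot{\boldsymbol{\sigma}} = \mathbf{q}(\boldsymbol{\sigma}, \nabla^{s}\mathbf{v}, \ldots) \tag{48} \]


where $\mathbf{q}$ contains both $\mathbf{f}$ and the terms in $\boldsymbol{\sigma}^{\nabla}$, which ensure
its objectivity.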
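
The role of the extra terms in (47) can be checked numerically. The following is a minimal sketch, assuming NumPy and hypothetical test values for the stress and the angular velocity: for a body that only rotates rigidly, the stress tensor rotates with it, so its material rate is nonzero, whereas the Jaumann rate vanishes. The function names are illustrative and not taken from the references.

```python
import numpy as np

def spin(L):
    """Spin tensor: skew-symmetric part of the velocity gradient L."""
    return 0.5 * (L - L.T)

def jaumann_rate(sigma_dot, sigma, L):
    """Objective rate in the form of (47): sigma_dot - W.sigma - sigma.W^T."""
    W = spin(L)
    return sigma_dot - W @ sigma - sigma @ W.T

# Rigid rotation about the z-axis with angular velocity omega (hypothetical value).
omega = 2.0
W = np.array([[0.0, -omega, 0.0],
              [omega, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

# Hypothetical symmetric stress state.
sigma = np.array([[100.0, 20.0, 0.0],
                  [20.0, -50.0, 5.0],
                  [0.0, 5.0, 30.0]])

# A stress that merely rotates with the body, sigma(t) = R sigma0 R^T,
# has the material rate W.sigma + sigma.W^T.
sigma_dot = W @ sigma + sigma @ W.T

# For a rigid rotation the velocity gradient coincides with its spin.
rate = jaumann_rate(sigma_dot, sigma, W)
```

Here `rate` is the zero tensor although `sigma_dot` is not, which is the sense in which the material rate fails to be objective while the corrected rate does not.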


In the ALE context, referential time derivatives, not 
material time derivatives, are employed to represent evolu- 
tion in time. Combining expression (46) of the material rate 
of stress and the constitutive relation (48) yields a rate-form 
constitutive equation for ALE nonlinear solid mechanics 


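\[ \left.\frac{\partial \boldsymbol{\sigma}}{\partial t}\right|_{\boldsymbol{\chi}} + (\mathbf{c}\cdot\nabla)\boldsymbol{\sigma} = \mathbf{q} \tag{49} \]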


where, again, a convective term reflects the motion of 
material particles relative to the mesh. Note that this relative 
motion is inherent in ALE kinematics, so the convective 
term is present in all the situations described in Section 7.1, 
including quasistatic processes. 

Because of this convective effect, the stress update cannot 
be performed as simply as in the Lagrangian formulation, 
in which the element Gauss points correspond to the same 
material particles during the whole calculation. In fact, the 
accurate treatment of the convective terms in ALE rate-type 
constitutive equations is a key issue in the accuracy of the 
formulation, as discussed in Section 7.3. 


7.2.2 Constitutive equations for ALE 
hyperelastoplasticity 


Hyperelastoplastic models are based on a multiplicative 
decomposition of the deformation gradient into elastic and 
plastic parts, $\mathbf{F} = \mathbf{F}^{e}\mathbf{F}^{p}$; see, for instance, Belytschko et al.
(2000) or Bonet and Wood (1997). They have only very 
recently been combined with the ALE description (see 
Rodriguez-Ferran et al., 2002 and Armero and Love, 2003). 

The evolution of stresses is not described by means of a 
rate-form equation, but in closed form as 


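\[ \boldsymbol{\tau} = 2\,\frac{\mathrm{d}W}{\mathrm{d}\mathbf{b}^{e}}\,\mathbf{b}^{e} \tag{50} \]


where $\mathbf{b}^{e} = \mathbf{F}^{e}\cdot(\mathbf{F}^{e})^{T}$ is the elastic left Cauchy–Green
tensor, $W$ is the free energy function, and $\boldsymbol{\tau} = \det(\mathbf{F})\,\boldsymbol{\sigma}$ is
the Kirchhoff stress tensor.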
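
As an illustration of the kinematic quantities just defined, the sketch below evaluates the elastic left Cauchy–Green tensor from a multiplicative split and converts a Cauchy stress into a Kirchhoff stress. It assumes NumPy; the deformation gradients are hypothetical test values, and no particular free energy function $W$ is implied, so relation (50) itself is not evaluated.

```python
import numpy as np

def elastic_left_cauchy_green(F, Fp):
    """b^e = F^e (F^e)^T, with F^e = F (F^p)^{-1} from F = F^e F^p."""
    Fe = F @ np.linalg.inv(Fp)
    return Fe @ Fe.T

def kirchhoff_from_cauchy(F, sigma):
    """Kirchhoff stress tau = det(F) * sigma."""
    return np.linalg.det(F) * sigma

# Hypothetical total deformation gradient.
F = np.array([[1.2, 0.1, 0.0],
              [0.0, 0.9, 0.0],
              [0.0, 0.0, 1.0]])

# Hypothetical plastic part, chosen isochoric (det = 1).
Fp = np.diag([1.1, 1.0 / 1.1, 1.0])

b_e = elastic_left_cauchy_green(F, Fp)
```

Since the plastic part is volume preserving in this example, the determinant of `b_e` equals the square of the determinant of `F`.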

Plastic flow is described by means of the flow rule 


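\[ \dot{\mathbf{b}}^{e} - \nabla\mathbf{v}\cdot\mathbf{b}^{e} - \mathbf{b}^{e}\cdot(\nabla\mathbf{v})^{T} = -2\dot{\gamma}\,\mathbf{m}(\boldsymbol{\tau})\cdot\mathbf{b}^{e} \tag{51} \]


The left-hand side of equation (51) is the Lie derivative of
$\mathbf{b}^{e}$ with respect to the material velocity $\mathbf{v}$. On the right-hand
side, $\mathbf{m}$ is the flow direction and $\dot{\gamma}$ is the plastic multiplier.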

Using the fundamental ALE relation (27) between material
and referential time derivatives, the flow rule (51) can
be recast as


\[ \left.\frac{\partial \mathbf{b}^{e}}{\partial t}\right|_{\boldsymbol{\chi}} + (\mathbf{c}\cdot\nabla)\mathbf{b}^{e} = \nabla\mathbf{v}\cdot\mathbf{b}^{e} + \mathbf{b}^{e}\cdot(\nabla\mathbf{v})^{T} - 2\dot{\gamma}\,\mathbf{m}(\boldsymbol{\tau})\cdot\mathbf{b}^{e} \tag{52} \]

Note that, as in equation (49), a convective term in this
constitutive equation reflects the relative motion between
mesh and material.


7.3 Stress-update procedures 


In the context of hypoelastoplasticity, various strategies 
have been proposed for coping with the convective terms 
in equation (49). Following Benson (1992b), they can be 
classified into split and unsplit methods. 

If an unsplit method is employed, the complete rate 
equation (49) is integrated forward in time, including 
both the convective term and the material term $\mathbf{q}$. This
approach is followed, among others, by Liu et al. (1986), 
who employed an explicit time-stepping algorithm and by 
Ghosh and Kikuchi (1991), who used an implicit unsplit 
formulation. 

On the other hand, split, or fractional-step methods treat 
the material and convective terms in (49) in two distinct 
phases: a material (or Lagrangian) phase is followed by 
a convection (or transport) phase. In exchange for a cer- 
tain loss in accuracy due to splitting, split methods are 
simpler and especially suitable in upgrading a Lagrangian 
code to the ALE description. An implicit split formulation
is employed by Huétink et al. (1990) to model
metal-forming processes. An example of explicit split for- 
mulation may be found in Huerta and Casadei (1994), 
where ALE finite elements are used to model fast-transient 
phenomena. 

The situation is completely analogous for hyperelasto- 
plasticity, and similar comments apply to the split or unsplit 
treatment of material and convective effects. In fact, if a 
split approach is chosen (see Rodriguez-Ferran et al., 2002), 
the only differences with respect to the hypoelastoplastic 
models are (1) the constitutive model for the Lagrangian 
phase (hypo/hyper) and (2) the quantities to be transported 
in the convection phase. 
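
The structure of a split update can be sketched on a scalar, one-dimensional analogue of equation (49). The example below is a minimal sketch, assuming NumPy, a periodic uniform grid, a constant convective velocity, an explicit Euler step for the material phase and a first-order upwind step for the transport phase; these discretization choices and all function names are illustrative assumptions, not the schemes of the cited references.

```python
import numpy as np

def lagrangian_phase(sigma, q, dt):
    """Material phase: integrate d(sigma)/dt = q with the convective term dropped."""
    return sigma + dt * q

def convection_phase(sigma_L, c, dx, dt):
    """Transport phase: upwind step for d(sigma)/dt + c d(sigma)/dx = 0 (periodic grid)."""
    if c >= 0.0:
        grad = (sigma_L - np.roll(sigma_L, 1)) / dx    # backward difference
    else:
        grad = (np.roll(sigma_L, -1) - sigma_L) / dx   # forward difference
    return sigma_L - dt * c * grad

def split_update(sigma, q, c, dx, dt):
    """Fractional-step update: Lagrangian phase followed by convection phase."""
    sigma_L = lagrangian_phase(sigma, q, dt)
    return convection_phase(sigma_L, c, dx, dt)
```

With `q` set to zero and a unit Courant number (`c * dt / dx == 1`), the update simply shifts the field by one cell; with a uniform field, the convection phase leaves the Lagrangian result unchanged.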


7.3.1 Lagrangian phase 


In the Lagrangian phase, convective effects are neglected. 
The constitutive equations recover their usual expressions 
(48) and (51) for hypo- and hyper-models respectively. 
The ALE kinematical description has (momentarily) dis- 
appeared from the formulation, so all the concepts, ideas, 
and algorithms of large strain solid mechanics with a 
Lagrangian description apply (see Bonet and Wood (1997), 
Belytschko et al. (2000), and Chapter 7, Volume 2). 

The issue of objectivity is one of the main differences 
between hypo- and hypermodels. When devising time-integration
algorithms to update stresses from $\boldsymbol{\sigma}^{n}$ to $\boldsymbol{\sigma}^{n+1}$
in hypoelastoplastic models, a typical requirement is incremental
objectivity (that is, the appropriate treatment of
rigid-body rotations over the time interval $[t^{n}, t^{n+1}]$). In
hyperelastoplastic models, on the contrary, objectivity is 
not an issue at all, because there is no rate equation for the 
stress tensor. 


7.3.2 Convection phase 


The convective effects neglected before have to be accoun- 
ted for now. Since material effects have already been treated 
in the Lagrangian phase, the ALE constitutive equations 
read simply 


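\[ \left.\frac{\partial \boldsymbol{\sigma}}{\partial t}\right|_{\boldsymbol{\chi}} + (\mathbf{c}\cdot\nabla)\boldsymbol{\sigma} = \mathbf{0}, \qquad
\left.\frac{\partial \mathbf{b}^{e}}{\partial t}\right|_{\boldsymbol{\chi}} + (\mathbf{c}\cdot\nabla)\mathbf{b}^{e} = \mathbf{0}, \qquad
\left.\frac{\partial \boldsymbol{\alpha}}{\partial t}\right|_{\boldsymbol{\chi}} + (\mathbf{c}\cdot\nabla)\boldsymbol{\alpha} = \mathbf{0} \tag{53} \]


Equations (53)$_1$ and (53)$_2$ correspond to hypo- and
hyperelastoplastic models respectively (cf. equations
49 and 52). In equation (53)$_3$, valid for both hypo-
and hyper-models, $\boldsymbol{\alpha}$ is the set of all the material-dependent
variables, i.e. variables associated with the material particle
$\mathbf{X}$: internal variables for hardening or softening plasticity,
the volume change in nonisochoric plasticity, and so on
(see Rodriguez-Ferran et al., 2002).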

The three equations in (53) can be written more com- 
pactly as 


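\[ \left.\frac{\partial \boldsymbol{\Theta}}{\partial t}\right|_{\boldsymbol{\chi}} + (\mathbf{c}\cdot\nabla)\boldsymbol{\Theta} = \mathbf{0} \tag{54} \]

where $\boldsymbol{\Theta}$ represents the appropriate variable in each case.
Note that equation (54) is simply a first-order linear hyperbolic
PDE, which governs the transport of field $\boldsymbol{\Theta}$ by the
velocity field $\mathbf{c}$. However, two important aspects should be
considered in the design of numerical algorithms for the
solution of this equation: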


1. $\boldsymbol{\Theta}$ is a tensor (for $\boldsymbol{\sigma}$ and $\mathbf{b}^{e}$) or vector-like (for $\boldsymbol{\alpha}$) field,
   so equation (54) should be solved for each component
   $\Theta$ of $\boldsymbol{\Theta}$:

   \[ \left.\frac{\partial \Theta}{\partial t}\right|_{\boldsymbol{\chi}} + \mathbf{c}\cdot\nabla\Theta = 0 \tag{55} \]

   Since the number of scalar equations (55) may be relatively
   large (for instance, eight for a 3-D computation
   with a plastic model with two internal variables: six
   stress components plus the two internal variables), the
   need for efficient convection algorithms is a key issue
   in ALE nonlinear solid mechanics.

2. $\Theta$ is a Gauss-point-based (i.e. not a nodal-based)
   quantity, so it is discontinuous across finite element
   edges. For this reason, its gradient $\nabla\Theta$ cannot be
   reliably computed at the element level. In fact, handling
   $\nabla\Theta$ is the main numerical challenge in ALE stress
   update.


Two different strategies may be used to tackle the
difficulties associated with $\nabla\Theta$. One possible approach is
to approximate $\Theta$ by a continuous field $\tilde{\Theta}$, and replace
$\nabla\Theta$ by $\nabla\tilde{\Theta}$ in equation (55). The smoothed field $\tilde{\Theta}$ can
be obtained, for instance, by least-squares approximation
(see Huétink et al., 1990).
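
A one-dimensional sketch of such a smoothing step is given below. It assumes NumPy, linear two-node elements and a plain least-squares fit of nodal values to the Gauss-point data; the function name and the test mesh are hypothetical, and the procedure is an illustration of the idea rather than the specific algorithm of the cited reference.

```python
import numpy as np

def smooth_gauss_field(nodes, gauss_x, gauss_vals):
    """Least-squares fit of a continuous, piecewise-linear nodal field
    to values given at Gauss points (1-D mesh, sorted node coordinates)."""
    n_nodes = len(nodes)
    A = np.zeros((len(gauss_x), n_nodes))
    for g, x in enumerate(gauss_x):
        # Index of the element containing the Gauss point.
        e = min(np.searchsorted(nodes, x, side="right") - 1, n_nodes - 2)
        h = nodes[e + 1] - nodes[e]
        A[g, e] = (nodes[e + 1] - x) / h       # shape function of the left node
        A[g, e + 1] = (x - nodes[e]) / h       # shape function of the right node
    nodal_vals, *_ = np.linalg.lstsq(A, gauss_vals, rcond=None)
    return nodal_vals

# Hypothetical mesh: four equal elements with two Gauss points each.
nodes = np.linspace(0.0, 1.0, 5)
mid = 0.5 * (nodes[:-1] + nodes[1:])
half = 0.5 * np.diff(nodes)
gauss_x = np.concatenate([mid - half / np.sqrt(3.0), mid + half / np.sqrt(3.0)])

# Gauss-point data sampled from a linear field, which the fit should reproduce.
gauss_vals = 3.0 * gauss_x + 1.0
nodal_vals = smooth_gauss_field(nodes, gauss_x, gauss_vals)
```

Once nodal values are available, the gradient of the smoothed field follows from the usual shape-function derivatives within each element.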

Another possibility is to retain the discontinuous field $\Theta$
and devise appropriate algorithms that account for this fact.
To achieve this aim, a fruitful observation is that,
for a piecewise constant field $\Theta$, equation (55) is the well-
known Riemann problem. Through this connection, the
ALE community has exploited the expertise on approximate
Riemann solvers of the CFD community (see LeVeque,
1990).

Although $\Theta$ is, in general, not constant for each element
(except for one-point quadratures), it can be approximated
by a piecewise constant field in a simple manner. Figure 10
depicts a four-noded quadrilateral with a 2 × 2 quadrature
subdivided into four subelements. If the value of $\Theta$ for
each Gauss point is taken as representative for the whole
subelement, then a field constant within each subelement
results.

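In this context, equation (55) can be solved explicitly
by looping over all the subelements in the mesh by means of
a Godunov-like technique based on Godunov’s method for
conservation laws (see Rodríguez-Ferran et al., 1998):


\[ \Theta^{n+1} = \Theta^{L} - \frac{\Delta t}{2V} \sum_{\Gamma=1}^{N_{\Gamma}} f_{\Gamma}\left(\Theta^{c}_{\Gamma} - \Theta^{L}\right)\left[1 - \operatorname{sign}(f_{\Gamma})\right] \tag{56} \]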


Figure 10. Finite element subdivided into subelements for the
Godunov-like stress update. Legend: circles denote nodes, crosses
denote Gauss points.


According to equation (56), the Lagrangian (i.e. after the
Lagrangian phase) value $\Theta^{L}$ is updated into the final value
$\Theta^{n+1}$ by taking into account the flux of $\Theta$ across the
subelement edges. In the second term on the right-hand side,
$\Delta t$ is the time step, $V$ is the volume (or area, in 2-D) of the
subelement, $N_{\Gamma}$ is the number of edges per subelement, $\Theta^{c}_{\Gamma}$
is the value of $\Theta$ in the contiguous subelement across edge
$\Gamma$, and $f_{\Gamma}$ is the flux of convective velocity across edge $\Gamma$,
$f_{\Gamma} = \int_{\Gamma} (\mathbf{c}\cdot\mathbf{n})\,\mathrm{d}\Gamma$. Note that a full-donor (i.e. full upwind)
approach is obtained by means of $\operatorname{sign}(f_{\Gamma})$.
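
The update (56) can be illustrated in one dimension, where each subelement has two edges with outward normals of opposite sign. The sketch below assumes NumPy, a periodic row of equal subelements of length `dx` (so that $V$ is `dx` and each edge has unit measure) and a constant convective velocity `c`; these simplifications and the function name are assumptions made for the example.

```python
import numpy as np

def godunov_update(theta_L, c, dx, dt):
    """One convection step in the form of (56) on a periodic 1-D row of
    equal subelements; theta_L holds the values after the Lagrangian phase."""
    theta_new = theta_L.copy()
    # Each entry: (outward normal of the edge, values in the contiguous subelement).
    edges = [(-1.0, np.roll(theta_L, 1)),    # left edge, neighbour i-1
             (+1.0, np.roll(theta_L, -1))]   # right edge, neighbour i+1
    for normal, theta_c in edges:
        f = c * normal                        # flux of convective velocity across the edge
        # Only inflow edges (f < 0) contribute, since 1 - sign(f) vanishes for outflow.
        theta_new -= dt / (2.0 * dx) * f * (theta_c - theta_L) * (1.0 - np.sign(f))
    return theta_new
```

For a positive `c`, only the left edge contributes and the scheme reduces to the classical first-order upwind formula; with a unit Courant number the field is shifted by exactly one subelement.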


Remark (Split stress update and iterations) In principle, 
the complete stress update must be performed at each itera- 
tion of the nonlinear equilibrium problem. However, a com- 
mon simplification consists in leaving the convection phase 
outside the iteration loop (see Baaijens, 1993; Rodriguez-
Ferran et al., 1998, and Rodriguez-Ferran et al., 2002);
that is, iterations are performed in a purely Lagrangian 
fashion up to equilibrium and the convection phase is per- 
formed after remeshing, just once per time step. Numerical 
experiments reveal that disruption of equilibrium caused by 
convection is not severe and can be handled as extra resid- 
ual forces in the next load step (see references just cited). 


Remark (ALE finite strain elasticity) Since hyperelasticity 
can be seen as a particular case of hyperelastoplasticity, 
we have chosen here the more general situation for a more 
useful presentation. In the particular case of large elastic 
strains (hyperelasticity), where $\mathbf{F} = \mathbf{F}^{e}$, an obvious option
is to particularize the general approach just described by
solving only the relevant equation (i.e. equation 52 for $\mathbf{b}^{e}$;
note that there are no internal variables $\boldsymbol{\alpha}$ in elasticity).
In the literature, other approaches exist for this specific
case. Yamada and Kikuchi (1993) and Armero and Love
(2003) exploit the relationship $\mathbf{F} = \mathbf{F}_{\Phi}\mathbf{F}_{\Psi}^{-1}$, where $\mathbf{F}_{\Phi}$ and
$\mathbf{F}_{\Psi}$ are the deformation gradients of mappings $\Phi$ and $\Psi$
respectively (see Figure 4), to obtain an ALE formulation
for hyperelasticity with no convective terms. In exchange
for the need to handle the convective term in the update
of $\mathbf{b}^{e}$ (see equations 52 and 53$_2$), the former approach has
the advantage that only the quality of mapping $\Phi$ needs
to be controlled. In the latter approaches, on the contrary,
both the quality of $\Phi$ and $\Psi$ must be ensured; that is, two
meshes (instead of only one) must be controlled.


7.4 Applications in ALE nonlinear solid
mechanics 


For illustrative purposes, two powder compaction ALE 
simulations are briefly discussed here. More details can be 
found in Rodriguez-Ferran et al. (2002) and Pérez-Foguet 
et al. (2003).

The first example involves the bottom punch compaction
of an axisymmetric flanged component (see Figure 11). If 
a Lagrangian description is used (see Figure 11(a)), the 
large upward mass flow leads to severe element distortion 
in the reentrant corner, which in turn affects the accuracy 
in the relative density. With an ALE description (see 
Figure 11(b)), mesh distortion is completely precluded by 
means of a very simple ALE remeshing strategy: a uniform 
vertical mesh compression is prescribed in the bottom, 
narrow part of the piece, and no mesh motion (i.e. Eulerian 
mesh) in the upper, wide part. 

The second example involves the compaction of a 
multilevel component. Owing to extreme mesh distor- 
tion, it is not possible to perform the simulation with a 
Lagrangian approach. Three ALE simulations are compared 
in Figure 12, corresponding to top, bottom, and simultaneous
top–bottom compaction. Even with an unstructured
mesh and a more complex geometry, the ALE description 
avoids mesh distortion, so the final relative density profiles 
can be computed (see Figure 13). 


7.5 Contact algorithms 


Contact treatment, especially when frictional effects are
present, is a very important feature of mechanical modeling
and certainly remains one of the more challenging
problems in computational mechanics. In the case
of the classical Lagrangian formulation, much attention
has been devoted to contact algorithms, and the interested
reader is referred to Chapter 6, Volume 2,
Zhong (1993), Wriggers (2002), and Laursen (2002)
in order to get acquainted with the subject. By contrast,
modeling of contact in connection with the ALE
description has received much less attention. Paradoxi- 
cally, one of the interesting features of ALE is that, in 
some situations, the formulation can avoid the burden of 
implementing cumbersome contact algorithms. The coining
problem in Figure 14 is a good illustration of the
versatility of ALE algorithms in the treatment of frictionless
contact over a known (moving) surface. The material
particle M located at the punch corner at time $t$
has been represented. At time $t + \Delta t$, the punch has
moved slightly downwards and, due to compression, the 
material particle M has moved slightly to the right. 
When a Lagrangian formulation is used in the simula- 
tion of such a process, due to the fact that the mesh 
sticks to the material, a contact algorithm has to be 
used to obtain a realistic simulation. On the contrary, 
using an ALE formalism, the implementation of a con- 
tact algorithm can be avoided. This is simply because 
the ALE formulation allows us to prevent the horizontal 
displacement of the mesh nodes located under the
punch, irrespective of the material flow. Numerous illustrations
of this particular case can be found, amongst others,
in Schreurs et al. (1986), Huétink et al. (1990), Hogge and
Ponthot (1991a), Huerta and Casadei (1994), Rodríguez-Ferran
et al. (2002), Gadala and Wang (1998), Gadala et al.
(2002), and Martinet and Chabrand (2000).

Figure 11. Final relative density after the bottom punch compaction
of a flanged component: (a) Lagrangian approach leads to severe ...

Figure 12. Final relative density of a multilevel component. From left to right: top, ...
A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm

Figure 13. Relative density profiles in a multilevel component
along a vertical line for the three compaction processes.

Figure 14. Schematic description of the coining process.

However, in more general situations, a contact algorithm
cannot be avoided. In such a case, the nodes of contact
elements have to be displaced convectively in accordance
with the nodes on the surface of the bulk materials and
tools. A direct consequence of this displacement is that
convective effects have to be taken into account for history-
dependent variables. In the simplest penalty case in which
the normal pressure is proportional to the penetration, the
normal contact stress depends only on the current geometry
and not on the history of penetration. As a consequence, 
no convection algorithm needs to be activated for that 
quantity. On the contrary, for a Coulomb-friction model, 
the shear stresses in the contact elements are incrementally 
calculated. They therefore depend on the history and hence 
a convective increment of the shear stress should be 
evaluated. One simple way to avoid the activation of the 
convection algorithm for the contact/friction quantities is 
indeed to keep the boundaries purely Lagrangian. However, 
in general this will not prevent mesh distortions and 
nonconvexity in the global mesh. 
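
The distinction between history-independent and history-dependent contact quantities can be made concrete with a scalar sketch. The functions below are illustrative assumptions: a penalty law for the normal pressure, and an incremental Coulomb update written as a stick predictor followed by a return to the slip limit. The stiffnesses `k_n` and `k_t` and the friction coefficient `mu` are hypothetical parameters.

```python
def penalty_normal_pressure(penetration, k_n):
    """Normal pressure proportional to the current penetration.
    It depends on the current geometry only, so nothing has to be convected."""
    return k_n * max(penetration, 0.0)

def coulomb_shear_update(tau_old, slip_increment, k_t, mu, p_n):
    """Incremental Coulomb friction: the new shear stress depends on the
    previous value tau_old, which is why it must be convected in ALE."""
    tau_trial = tau_old + k_t * slip_increment   # stick (elastic) predictor
    tau_max = mu * p_n                           # Coulomb limit
    if abs(tau_trial) <= tau_max:
        return tau_trial                         # stick
    return tau_max if tau_trial > 0.0 else -tau_max   # slip
```

The same slip increment applied to two different values of `tau_old` generally gives two different results, whereas the normal pressure is fully determined by the current penetration.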

Various applications of the ALE technology have been 
developed so far to treat the general case of moving 
boundaries with Coulomb or Tresca frictional contact. 
For example, in their pioneering work on ALE contact, 
Haber and Hariandja (1985) update the mesh, so that 
nodes and element edges on the surface of contacting 
bodies coincide exactly at all points along the contact 
interface in the deformed configuration. In such a case, 
the matching of node pairs and element edges ensures a 
precise satisfaction of geometric compatibility and allows 
a consistent transfer of contact stresses between the two 
bodies. A similar procedure was established by Ghosh 
(1992) but, in this case, the numerical model introduces 
ALE nodal points on one of the contacting (slave) surfaces 
that are constrained to follow Lagrangian nodes on the other 
(master) surface. Liu et al. (1991) presented an algorithm, 
mostly dedicated to rolling applications, where the stick 
nodes are assumed Lagrangian, whereas the slip nodes are 
equally spaced between the two adjacent stick nodes. More 
general procedures have been introduced by Huétink et al. 
(1990). 

More recently, sophisticated frictional models incorporating
lubrication models have been used in ALE formulations.
In such a case, the friction between the contacting lubri- 
cated bodies is expressed as a function of interface variables 
(mean lubrication film thickness, sheet and tooling rough- 
ness) in addition to more traditional variables (interface 
pressure, sliding speed, and strain rate). Examples of com- 
plex lubrication models integrated into an ALE framework 
have been presented by Hu and Liu (1992, 1993, 1994), 
Martinet and Chabrand (2000), and Boman and Ponthot 
(2002).


1 INTRODUCTION: SCALAR
NONLINEAR CONSERVATION LAWS


Many problems arising in science and engineering lead to 
the study of nonlinear hyperbolic conservation laws. Some 
examples include fluid mechanics, meteorology, electromagnetics,
semiconductor device simulation, and numerous
models of biological processes. As a prototype conservation 
law, consider the Cauchy initial value problem 


\[ \partial_t u + \nabla\cdot f(u) = 0 \quad \text{in } \mathbb{R}^d \times \mathbb{R}^+ \tag{1a} \]

\[ u(x, 0) = u_0(x) \quad \text{in } \mathbb{R}^d \tag{1b} \]

Here $u(x,t): \mathbb{R}^d \times \mathbb{R}^+ \to \mathbb{R}$ denotes the dependent solution
variable, $f(u) \in C^1(\mathbb{R})$ denotes the flux function, and
$u_0(x): \mathbb{R}^d \to \mathbb{R}$ the initial data.

The function $u$ is a classical solution of the scalar
initial value problem if $u \in C^1(\mathbb{R}^d \times \mathbb{R}^+)$ satisfies (1a, 1b)
pointwise. An essential feature of nonlinear conservation
laws is that, in general, gradients of $u$ blow up in finite time,
even when the initial data $u_0$ is arbitrarily smooth. Beyond
some critical time $t_0$ classical solutions of (1a, 1b) do not
exist. This behavior will be demonstrated shortly using
the method of characteristics. By introducing the notion
of weak solutions of (1a, 1b) together with an entropy
condition, it then becomes possible to define a class of
solutions where existence and uniqueness is guaranteed for
times greater than $t_0$. These are precisely the solutions that
are numerically sought in the finite volume method.


1.1 The method of characteristics 


Let $u$ be a classical solution of (1a) subject to initial data
(1b). Further, define the vector


\[ a(u) = f'(u) = (f_1'(u), \ldots, f_d'(u))^T \]

A characteristic $\Gamma_\xi$ is a curve $(x(t), t)$ such that


\[ x'(t) = a(u(x(t), t)) \quad \text{for } t > 0, \qquad x(0) = \xi \]


Since $u$ is assumed to be a classical solution, it is readily
verified that


\[ \frac{\mathrm{d}}{\mathrm{d}t}\, u(x(t), t) = \partial_t u + x'(t)\cdot\nabla u = \partial_t u + a(u)\cdot\nabla u = \partial_t u + \nabla\cdot f(u) = 0 \]

Therefore, $u$ is constant along a characteristic curve and $\Gamma_\xi$
is a straight line since


\[ x'(t) = a(u(x(t), t)) = a(u(x(0), 0)) = a(u(\xi, 0)) = a(u_0(\xi)) = \text{const} \]


In particular, $x(t)$ is given by

\[ x(t) = \xi + t\,a(u_0(\xi)) \tag{2} \]


This important property may be used to construct classical
solutions. If $x$ and $t$ are fixed and $\xi$ is determined as a solution
of (2), then


\[ u(x, t) = u_0(\xi) \]


This procedure is the basis of the so-called method of
characteristics. On the other hand, this construction shows
that the intersection of any two straight characteristic lines
leads to a contradiction in the definition of $u(x, t)$. Thus,
classical solutions can only exist up to the first time $t_0$ at
which any two characteristics intersect.
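
The construction can be tried out for the one-dimensional Burgers' equation, for which $a(u) = u$ and the characteristics are $x = \xi + t\,u_0(\xi)$. The sketch below assumes NumPy and smooth initial data; it uses the standard result that, for this equation, characteristics first cross at $t_0 = -1/\min u_0'$ when $u_0'$ takes negative values, and it finds the foot of a characteristic by bisection. Function names, the bracketing interval and the sample data are hypothetical choices.

```python
import numpy as np

def breaking_time(u0_prime, xi):
    """First crossing time of Burgers characteristics, t0 = -1 / min(u0'),
    estimated on the sample points xi; infinite if u0' is nowhere negative."""
    m = np.min(u0_prime(xi))
    return np.inf if m >= 0.0 else -1.0 / m

def solve_by_characteristics(u0, x, t, lo, hi, tol=1e-12):
    """For t < t0, find the foot xi of the characteristic through (x, t)
    by bisection on xi + t*u0(xi) - x = 0 and return u(x, t) = u0(xi).
    The interval [lo, hi] must bracket the root."""
    def g(xi):
        return xi + t * u0(xi) - x   # increasing in xi as long as t < t0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return u0(0.5 * (lo + hi))

# Sample data u0(x) = sin(x), whose steepest negative slope is -1.
xi = np.linspace(0.0, 2.0 * np.pi, 1001)
t0 = breaking_time(np.cos, xi)
```

For times below `t0` the function `solve_by_characteristics` returns the classical solution; beyond `t0` the equation for the foot may have several roots, which is the contradiction described above.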


1.2 Weak solutions 


Since, in general, classical solutions only exist for a finite
time $t_0$, it is necessary to introduce the notion of weak
solutions that are well defined for times $t > t_0$.


Definition 1 (Weak solution) Let $u_0 \in L^\infty(\mathbb{R}^d)$. Then, $u$
is a weak solution of (1a, 1b) if $u \in L^\infty(\mathbb{R}^d \times \mathbb{R}^+)$ and (1a,
1b) hold in the distributional sense, that is,


\[ \int_{\mathbb{R}^d}\int_{\mathbb{R}^+} \left(u\,\partial_t\phi + f(u)\cdot\nabla\phi\right) \mathrm{d}t\,\mathrm{d}x + \int_{\mathbb{R}^d} u_0\,\phi(x, 0)\,\mathrm{d}x = 0 \quad \text{for all } \phi \in C_0^1(\mathbb{R}^d \times \mathbb{R}^+) \tag{3} \]


Note that classical solutions are weak solutions and weak
solutions that lie in $C^1(\mathbb{R}^d \times \mathbb{R}^+)$ satisfy (1a, 1b) in the
classical sense.

It can be shown (see Kruzkov, 1970; Oleinik, 1963) that 
there always exists at least one weak solution to (1a, 1b) if 
the flux function f is at least Lipschitz continuous. Nev- 
ertheless, the class of weak solutions is too large to ensure 
uniqueness of solutions. An important class of solutions 
are piecewise classical solutions with discontinuities sep- 
arating the smooth regions. The following lemma gives a 
necessary and sufficient condition imposed on these discon- 
tinuities such that the solution is a weak solution; see, for 
example, Godlewski and Raviart (1991) and Kröner (1997). 


Later a simple example is given where infinitely many weak 
solutions exist. 


Lemma 1 (Rankine–Hugoniot jump condition) Assume
that $\mathbb{R}^d \times \mathbb{R}^+$ is separated by a smooth hypersurface $S$ into
two parts $\Omega_l$ and $\Omega_r$. Furthermore, assume $u$ is a $C^1$-
function on $\Omega_l$ and $\Omega_r$ respectively. Then, $u$ is a weak
solution of (1a, 1b) if and only if the following two conditions
hold:


i) $u$ is a classical solution in $\Omega_l$ and $\Omega_r$.
ii) $u$ satisfies the Rankine–Hugoniot jump condition, that
is,


\[ [u]\,s = [f(u)]\cdot\nu \quad \text{on } S \tag{4} \]


Here, $(\nu, -s)^T$ denotes a unit normal vector for the hypersurface
$S$ and $[u]$ denotes the jump in $u$ across the hypersurface
$S$.


In one space dimension, it may be assumed that $S$ is
parameterized by $(\sigma(t), t)$ such that $s = \sigma'(t)$ and $\nu = 1$.
The Rankine–Hugoniot jump condition then reduces to


\[ s = \frac{[f]}{[u]} \quad \text{on } S \tag{5} \]


Example 1 (Nonuniqueness of weak solutions) Consider
the one-dimensional Burgers’ equation, $f(u) = u^2/2$,
with Riemann data: $u_0(x) = u_l$ for $x < 0$ and $u_0(x) = u_r$
for $x > 0$. Then, for any $a > \max(u_l, -u_r)$ a function $u$
given by


\[ u(x,t) = \begin{cases} u_l, & x < s_1 t \\ -a, & s_1 t < x < 0 \\ a, & 0 < x < s_2 t \\ u_r, & s_2 t < x \end{cases} \tag{6} \]


is a weak solution if $s_1 = (u_l - a)/2$ and $s_2 = (a + u_r)/2$.
This is easily checked since $u$ is piecewise constant and
satisfies the Rankine–Hugoniot jump condition. This elucidates
a one-parameter family of weak solutions. In fact,
there is also a classical solution whenever $u_l < u_r$. In this
case, the characteristics do not intersect and the method of
characteristics yields the classical solution


\[ u(x,t) = \begin{cases} u_l, & x < u_l t \\ x/t, & u_l t < x < u_r t \\ u_r, & u_r t < x \end{cases} \tag{7} \]


This solution is the unique classical solution but not the 
unique weak solution. Consequently, additional conditions 
must be introduced in order to single out one solution within 
the class of weak solutions. These additional conditions 
give rise to the notion of a unique entropy weak solution. 
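
The one-parameter family of Example 1 can be verified numerically. The short sketch below evaluates the Rankine–Hugoniot speed (5) for each of the three discontinuities of the piecewise constant function (6) and compares it with the speeds stated in the example; the function names and the sample states are hypothetical.

```python
def burgers_flux(u):
    """Flux of the one-dimensional Burgers' equation, f(u) = u^2 / 2."""
    return 0.5 * u * u

def rh_speed(u_left, u_right):
    """Rankine-Hugoniot speed s = [f] / [u] for a jump between two constant states."""
    return (burgers_flux(u_right) - burgers_flux(u_left)) / (u_right - u_left)

def family_speeds(u_l, u_r, a):
    """Speeds of the three discontinuities in the piecewise constant function (6):
    u_l | -a, -a | a, and a | u_r."""
    assert a > max(u_l, -u_r)
    return rh_speed(u_l, -a), rh_speed(-a, a), rh_speed(a, u_r)
```

For instance, with `u_l = -1` and `u_r = 1`, every admissible value of `a` (such as 2, 3 or 5) yields speeds `(u_l - a) / 2`, `0` and `(a + u_r) / 2`, so each choice of `a` gives a distinct weak solution for the same initial data.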


1.3 Entropy weak solutions and vanishing
viscosity 


In order to introduce the notion of entropy weak solutions,
it is useful to first demonstrate that there is a class of
additional conservation laws for any classical solution of
(1a). Let $u$ be a classical solution and $\eta: \mathbb{R} \to \mathbb{R}$ a smooth
function. Multiplying (1a) by $\eta'(u)$, one obtains


\[ 0 = \eta'(u)\,\partial_t u + \eta'(u)\,\nabla\cdot f(u) = \partial_t \eta(u) + \nabla\cdot F(u) \tag{8} \]


where $F$ is any primitive of $\eta' f'$. This reveals that for a
classical solution $u$, the quantity $\eta(u)$, henceforth called an
entropy function, is a conserved quantity.


Definition 2 (Entropy–entropy flux pair) Let $\eta: \mathbb{R} \to \mathbb{R}$
be a smooth convex function and $F: \mathbb{R} \to \mathbb{R}$ a smooth
function such that


\[ F' = \eta' f' \tag{9} \]


in (8). Then $(\eta, F)$ is called an entropy–entropy flux pair or
more simply an entropy pair for the equation (1a).
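
As a worked instance of relation (9), consider the one-dimensional Burgers' flux together with the quadratic entropy. The choice of this particular entropy is an assumption made for illustration; any smooth convex function would do.

```latex
f(u) = \tfrac{1}{2}u^{2}, \qquad \eta(u) = \tfrac{1}{2}u^{2}
\quad\Longrightarrow\quad
F'(u) = \eta'(u)\, f'(u) = u \cdot u = u^{2}
\quad\Longrightarrow\quad
F(u) = \tfrac{1}{3}u^{3}
```

The entropy flux is determined up to an additive constant, which plays no role in (8).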


Note 1 (Kruzkov entropies) The family of smooth convex
entropies $\eta$ may be equivalently replaced by the nonsmooth
family of so-called Kruzkov entropies, that is, $\eta_k(u) = |u - k|$
for all $k \in \mathbb{R}$ (see Kröner, 1997).


Unfortunately, the relation (8) cannot be fulfilled for weak
solutions in general, as it would lead to additional jump
conditions that would contradict the Rankine–Hugoniot
jump condition lemma. Rather, a weak solution may satisfy
the relation (8) in the distributional sense with inequality.
To see that this concept of entropy effectively selects
a unique, physically relevant solution among all weak
solutions, consider the viscosity perturbed equation

\[ \partial_t u_\epsilon + \nabla\cdot f(u_\epsilon) = \epsilon\,\Delta u_\epsilon \tag{10} \]


with $\epsilon > 0$. For this parabolic problem, it may be assumed
that a unique smooth solution $u_\epsilon$ exists. Multiplying by $\eta'(u_\epsilon)$
and rearranging terms yields the additional equation


\[ \partial_t \eta(u_\epsilon) + \nabla\cdot F(u_\epsilon) = \epsilon\,\Delta\eta(u_\epsilon) - \epsilon\,\eta''(u_\epsilon)\,|\nabla u_\epsilon|^{2} \]


Furthermore, since $\eta$ is assumed convex ($\eta'' \ge 0$), the
following inequality is obtained


\[ \partial_t \eta(u_\epsilon) + \nabla\cdot F(u_\epsilon) \le \epsilon\,\Delta\eta(u_\epsilon) \]


Taking the limit $\epsilon \to 0$ establishes (see Málek, Nečas,
Rokyta and Růžička, 1996) that $u_\epsilon$ converges towards
some $u$ a.e. in $\mathbb{R}^d \times \mathbb{R}^+$ where $u$ is a weak solution of
(1a, 1b) and satisfies the entropy condition


\[ \partial_t \eta(u) + \nabla\cdot F(u) \le 0 \tag{11} \]


in the sense of distributions on $\mathbb{R}^d \times \mathbb{R}^+$.

By this procedure, a unique weak solution has been
identified as the limit of the approximating sequence $u_\epsilon$. The
obtained solution u is called the vanishing viscosity weak
solution of (1a, 1b). Motivated by the entropy inequality
(11) of the vanishing viscosity solution, it is now possible to
introduce the notion of entropy weak solutions. This notion
is weak enough for the existence and strong enough for the
uniqueness of solutions to (1a, 1b).


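Definition 3 (Entropy weak solution) Let u be a weak
solution of (1a, 1b). Then, u is called an entropy weak
solution if u satisfies for all entropy pairs $(\eta, F)$

$$ \int_{\mathbb{R}^d}\int_{\mathbb{R}^+} \bigl(\eta(u)\,\partial_t\phi + F(u)\cdot\nabla\phi\bigr)\,dt\,dx + \int_{\mathbb{R}^d} \eta(u_0)\,\phi(x,0)\,dx \ge 0 $$
$$ \text{for all } \phi \in C_0^1(\mathbb{R}^d \times \mathbb{R}^+, \mathbb{R}^+) \tag{12} $$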


From the vanishing viscosity method, it is known that
entropy weak solutions exist. The following $L^1$-contraction
principle guarantees that entropy solutions are uniquely
defined; see Kruzkov (1970).


Theorem 1 ($L^1$-contraction principle) Let u and v be
two entropy weak solutions of (1a, 1b) with respect to initial
data $u_0$ and $v_0$. Then, the following $L^1$-contraction principle
holds

$$ \|u(\cdot,t) - v(\cdot,t)\|_{L^1(\mathbb{R}^d)} \le \|u_0 - v_0\|_{L^1(\mathbb{R}^d)} \tag{13} $$

for almost every t > 0.


This principle demonstrates a continuous dependence of the 
solution on the initial data and consequently the uniqueness 
of entropy weak solutions. Finally, note that an analog of 
the Rankine—Hugoniot condition exists (with inequality) in 
terms of the entropy pair for all entropy weak solutions 


$$ [\eta(u)]\,s \ge [F(u)]\cdot\nu \quad \text{on } S \tag{14} $$


1.4 Measure-valued or entropy process solutions 


The numerical analysis of conservation laws requires an
even weaker formulation of solutions to (1a, 1b). For
instance, the convergence analysis of finite volume schemes
makes it necessary to introduce so-called measure-valued or
entropy process solutions; see DiPerna (1985) and Eymard,
Gallouët and Herbin (2000).


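Definition 4 (Entropy process solution) A function
$\mu(x,t,\alpha) \in L^\infty(\mathbb{R}^d \times \mathbb{R}^+ \times (0,1))$ is called an entropy
process solution of (1a, 1b) if $\mu$ satisfies for all entropy pairs
$(\eta, F)$

$$ \int_{\mathbb{R}^d}\int_{\mathbb{R}^+}\int_0^1 \bigl(\eta(\mu)\,\partial_t\phi + F(\mu)\cdot\nabla\phi\bigr)\,d\alpha\,dt\,dx + \int_{\mathbb{R}^d} \eta(u_0)\,\phi(x,0)\,dx \ge 0 $$
$$ \text{for all } \phi \in C_0^1(\mathbb{R}^d \times \mathbb{R}^+, \mathbb{R}^+) $$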


The most important property of such entropy process solutions
is the following uniqueness and regularity result (see
Eymard, Gallouët and Herbin, 2000, Theorem 6.3).


Theorem 2 (Uniqueness of entropy process solutions)
Let $u_0 \in L^\infty(\mathbb{R}^d)$ and $f \in C^1(\mathbb{R})$. The entropy process solution
$\mu$ of problem (1a, 1b) is unique. Moreover, there exists
a function $u \in L^\infty(\mathbb{R}^d \times \mathbb{R}^+)$ such that $u(x,t) = \mu(x,t,\alpha)$
a.e. for $(x,t,\alpha) \in \mathbb{R}^d \times \mathbb{R}^+ \times (0,1)$ and u is the unique
entropy weak solution of (1a, 1b).


2 FINITE VOLUME (FV) METHODS FOR 
NONLINEAR CONSERVATION LAWS 


In the finite volume method, the computational domain,
$\Omega \subset \mathbb{R}^d$, is first tessellated into a collection of nonoverlapping
control volumes that completely cover the domain.
Notationally, let $\mathcal{T}$ denote a tessellation of the domain $\Omega$
with control volumes $T \in \mathcal{T}$ such that $\cup_{T\in\mathcal{T}} \overline{T} = \overline{\Omega}$. Let $h_T$
denote a length scale associated with each control volume
T, for example, $h_T \equiv \mathrm{diam}(T)$. For two distinct control volumes
$T_i$ and $T_j$ in $\mathcal{T}$, the intersection is either an oriented
edge (2-D) or face (3-D) $e_{ij}$ with oriented normal $\nu_{ij}$ or else
a set of measure at most d - 2. In each control volume, an
integral conservation law statement is then imposed.


Definition 5 (Integral conservation law) An integral
conservation law asserts that the rate of change of the total
amount of a substance with density u in a fixed control volume
T is equal to the total flux of the substance through the
boundary $\partial T$

$$ \frac{d}{dt}\int_T u\,dx + \int_{\partial T} f(u)\cdot d\nu = 0 \tag{15} $$

This integral conservation law statement is readily obtained
upon spatial integration of the divergence equation (1a) in
the region T and application of the divergence theorem.
The choice of control volume tessellation is flexible in
the finite volume method. For example, Figure 1 depicts
a 2-D triangle complex and two typical control volume tessellations
(among many others) used in the finite volume
method. In the cell-centered finite volume method shown in
Figure 1(a), the triangles themselves serve as control volumes
with solution unknowns (degrees of freedom) stored
on a per triangle basis. In the vertex-centered finite volume
method shown in Figure 1(b), control volumes are formed
as a geometric dual to the triangle complex and solution
unknowns are stored on a per triangulation vertex basis.

Figure 1. Control volume variants used in the finite volume
method: (a) cell-centered and (b) vertex-centered control volume
tessellation. (Legend: storage location, control volume.)


2.1 Godunov finite volume discretizations


Fundamental to finite volume methods is the introduction
of the control volume cell average for each $T_j \in \mathcal{T}$

$$ u_j \equiv \frac{1}{|T_j|}\int_{T_j} u\,dx \tag{16} $$

For stationary meshes, the finite volume method can be
interpreted as producing an evolution equation for cell
averages

$$ \frac{d}{dt}\int_{T_j} u\,dx = |T_j|\,\frac{d}{dt}u_j \tag{17} $$

Godunov (1959) pursued this interpretation in the discretization
of the gas dynamic equations by assuming
piecewise constant solution representations in each control
volume with value equal to the cell average. However,
the use of piecewise constant representations renders the
numerical solution multivalued at control volume interfaces,
thereby making the calculation of a single solution
flux at these interfaces ambiguous. The second aspect of
Godunov's scheme and subsequent variants was the idea
of supplanting the true flux at interfaces by a numerical
flux function, $g(u, v): \mathbb{R} \times \mathbb{R} \mapsto \mathbb{R}$, a Lipschitz continuous
function of the two interface states u and v. A single
unique numerical flux was then calculated from an exact
or approximate local solution of the Riemann problem in
gas dynamics posed at these interfaces. Figure 2 depicts a
representative 1-D solution profile in Godunov's method.

Figure 2. 1-D control volume, $T_j = [x_{j-1/2}, x_{j+1/2}]$, depicting
Godunov's interface Riemann problems, $w_{j\pm 1/2}(\xi, \tau)$, from piecewise
constant interface states.

For a given control volume $T_j = [x_{j-1/2}, x_{j+1/2}]$, Riemann
problems are solved at each interface $x_{j\pm 1/2}$. For example,
at the interface $x_{j+1/2}$, the Riemann problem counterpart of
(1a, 1b)

$$ \partial_\tau w_{j+1/2}(\xi,\tau) + \partial_\xi f(w_{j+1/2}(\xi,\tau)) = 0 \quad \text{in } \mathbb{R} \times \mathbb{R}^+ $$

for $w_{j+1/2}(\xi,\tau) \in \mathbb{R}$ with initial data

$$ w_{j+1/2}(\xi, 0) = \begin{cases} u_j & \text{if } \xi < 0 \\ u_{j+1} & \text{if } \xi > 0 \end{cases} $$

is solved either exactly or approximately. From this local
solution, a single unique numerical flux at $x_{j+1/2}$ is computed
from $g(u_j, u_{j+1}) = f(w_{j+1/2}(0, \tau > 0))$. This construction
utilizes the fact that the solution of the Riemann
problem at $\xi = 0$ is a constant for all time $\tau > 0$.
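
The construction above can be sketched for a concrete case. The Python fragment below assumes the scalar Burgers flux $f(u) = u^2/2$ (an example choice, not the gas dynamics system Godunov treated), for which the entropy solution of the Riemann problem is known in closed form: a shock when $u_j > u_{j+1}$ and a rarefaction otherwise. The function names are hypothetical.

```python
def f(u):
    """Burgers flux f(u) = u^2/2, an assumed example flux."""
    return 0.5 * u * u

def riemann_state_at_zero(ul, ur):
    """Entropy solution of the Burgers Riemann problem sampled at xi = 0."""
    if ul > ur:
        # Shock travelling with the Rankine-Hugoniot speed s = (ul + ur)/2.
        s = 0.5 * (ul + ur)
        return ul if s > 0.0 else ur
    if ul > 0.0:
        return ul        # rarefaction fan lies entirely to the right
    if ur < 0.0:
        return ur        # rarefaction fan lies entirely to the left
    return 0.0           # transonic rarefaction: sonic point sits at xi = 0

def godunov_flux(ul, ur):
    """Numerical flux g(ul, ur) = f(w(0, tau > 0))."""
    return f(riemann_state_at_zero(ul, ur))
```

For a stationary shock (s = 0) either state may be returned, since both give the same flux value.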

In higher-space dimensions, the flux integral appearing
in (15) is similarly approximated by

$$ \int_{\partial T_j} f(u)\cdot d\nu \approx \sum_{\forall e_{jk}\in\partial T_j} g_{jk}(u_j, u_k) \tag{18} $$

where the numerical flux is assumed to satisfy the properties:


• (Conservation) This property ensures that fluxes from
adjacent control volumes sharing a mutual interface
exactly cancel when summed. This is achieved if the
numerical flux satisfies the identity

$$ g_{jk}(u, v) = -g_{kj}(v, u) \tag{19a} $$

• (Consistency) Consistency is obtained if the numerical
flux with identical state arguments reduces to the true
total flux passing through $e_{jk}$ of that same state, that is,

$$ g_{jk}(u, u) = \int_{e_{jk}} f(u)\cdot d\nu \tag{19b} $$


Combining (17) and (18) yields perhaps the simplest
finite volume scheme in semidiscrete form. Let $V_h^0$ denote
the space of piecewise constants, that is,

$$ V_h^0 = \{ v \mid v|_T \in \chi(T),\ \forall T \in \mathcal{T} \} \tag{20} $$

with $\chi(T)$ a characteristic function in the control volume T.


Definition 6 (Semidiscrete finite volume method) The
semidiscrete finite volume approximation of (1a, 1b) utilizing
continuous in time solution representation, $t \in [0, \infty)$,
and piecewise constant solution representation in space,
$u_h(t) \in V_h^0$, such that

$$ u_j(t) = \frac{1}{|T_j|}\int_{T_j} u_h(x,t)\,dx $$

with initial data

$$ u_j(0) = \frac{1}{|T_j|}\int_{T_j} u_0(x)\,dx $$

and numerical flux function $g_{jk}(u_j, u_k)$ is given by the following
system of ordinary differential equations

$$ \frac{d}{dt}u_j + \frac{1}{|T_j|}\sum_{\forall e_{jk}\in\partial T_j} g_{jk}(u_j, u_k) = 0, \quad \forall T_j \in \mathcal{T} \tag{21} $$


This system of ordinary differential equations can be
marched forward using a variety of explicit and implicit
time integration methods. Let $u_j^n$ denote a numerical
approximation of the cell average solution in the control
volume $T_j$ at time $t^n \equiv n\Delta t$. A particularly simple time
integration method is the forward Euler scheme

$$ \frac{d}{dt}u_j \approx \frac{u_j^{n+1} - u_j^n}{\Delta t} $$

thus producing a fully discrete finite volume form.


Definition 7 (Fully discrete finite volume method) The
fully discrete finite volume approximation of (1a, 1b) for the
time slab interval $[t^n, t^n + \Delta t]$ utilizing Euler explicit time
advancement and piecewise constant solution representation
in space, $u_h^n \in V_h^0$, such that

$$ u_j^n = \frac{1}{|T_j|}\int_{T_j} u_h^n(x)\,dx $$

with initial data for n = 0

$$ u_j^0 = \frac{1}{|T_j|}\int_{T_j} u_0(x)\,dx $$

and numerical flux function $g_{jk}(u_j^n, u_k^n)$ is given by the
following fully discrete system

$$ u_j^{n+1} = u_j^n - \frac{\Delta t}{|T_j|}\sum_{\forall e_{jk}\in\partial T_j} g_{jk}(u_j^n, u_k^n), \quad \forall T_j \in \mathcal{T} \tag{22} $$


Once space-time maximum principle properties of this 
fully discrete form are ascertained, Section 4.1 shows how 
higher-order accurate time integration schemes can be con- 
structed that preserve these properties, albeit with a differ- 
ent time step restriction. 
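
To make the fully discrete form (22) concrete, the following Python sketch specializes it to a uniform periodic 1-D mesh, where $|T_j| = \Delta x$ and the sum over the two faces of each cell reduces to a flux difference. The Burgers flux, the Godunov flux in its closed form for that flux, the periodic boundary treatment, and all function names are assumptions made for illustration only.

```python
def f(u):
    """Burgers flux f(u) = u^2/2, an assumed example flux."""
    return 0.5 * u * u

def godunov_flux(ul, ur):
    """Godunov flux for the convex Burgers flux."""
    if ul <= ur:                     # minimum of f over [ul, ur]
        if ul > 0.0:
            return f(ul)
        if ur < 0.0:
            return f(ur)
        return 0.0
    return max(f(ul), f(ur))         # maximum of f over [ur, ul]

def fv_step(u, dt, dx):
    """One forward Euler step of (22) on a periodic uniform 1-D mesh."""
    n = len(u)
    # g[j] approximates the flux through the interface x_{j+1/2}.
    g = [godunov_flux(u[j], u[(j + 1) % n]) for j in range(n)]
    # g[j - 1] wraps around for j = 0, which imposes periodicity.
    return [u[j] - dt / dx * (g[j] - g[j - 1]) for j in range(n)]

def solve(u0, dx, t_end, cfl=0.9):
    """March cell averages to t_end with a CFL-limited time step."""
    u, t = list(u0), 0.0
    while t < t_end - 1e-14:
        amax = max(abs(v) for v in u) or 1.0   # bound on |f'(u)| = |u|
        dt = min(cfl * dx / amax, t_end - t)
        u = fv_step(u, dt, dx)
        t += dt
    return u
```

Because interface fluxes cancel in pairs, the sum of cell averages is preserved up to rounding, and under the time step restriction used here the cell values are expected to stay within the range of the initial data.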


2.1.1 Monotone schemes 


Unfortunately, the numerical flux conditions (19a) and 
(19b) are insufficient to guarantee convergence to entropy 
satisfying weak solutions (12) and additional numerical 
flux restrictions are necessary. To address this deficiency, 
Harten, Hyman and Lax (1976) provide the following 
result concerning the convergence of the fully discrete one- 
dimensional scheme to weak entropy satisfying solutions. 


Theorem 3 (Monotone schemes and weak solutions)
Consider a 1-D finite volume discretization of (1a, 1b) with
2k + 1 stencil on a uniformly spaced mesh in both time
and space with corresponding mesh spacing parameters $\Delta t$
and $\Delta x$

$$ u_j^{n+1} = H_j(u_{j+k}^n, \ldots, u_j^n, \ldots, u_{j-k}^n) = u_j^n - \frac{\Delta t}{\Delta x}\,(g_{j+1/2} - g_{j-1/2}) \tag{23} $$

and consistent numerical flux of the form

$$ g_{j+1/2} = g(u_{j+k}, \ldots, u_{j+1}, u_j, \ldots, u_{j-k+1}) $$

that is monotone in the sense

$$ \frac{\partial H_j}{\partial u_{j+l}} \ge 0, \quad \forall\, |l| \le k \tag{24} $$

Assume that $u_j^n$ converges boundedly almost everywhere to
some function u(x,t), then as $\Delta t$ and $\Delta x$ tend to zero with
$\Delta t/\Delta x$ = constant, this limit function u(x,t) is an entropy
satisfying weak solution of (1a, 1b).


Note that this theorem assumes convergence in the limit, 
which was later proven to be the case in multidimensions 
by Crandall and Majda (1980). 

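The monotonicity condition (24) motivates the introduction
of Lipschitz continuous monotone fluxes satisfying

$$ \frac{\partial g_{j+1/2}}{\partial u_l} \ge 0 \quad \text{if } l = j \tag{25a} $$
$$ \frac{\partial g_{j+1/2}}{\partial u_l} \le 0 \quad \text{if } l \ne j \tag{25b} $$

together with a CFL (Courant-Friedrichs-Lewy) like condition

$$ 1 - \frac{\Delta t}{\Delta x}\left(\frac{\partial g_{j+1/2}}{\partial u_j} - \frac{\partial g_{j-1/2}}{\partial u_j}\right) \ge 0 $$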

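so that (24) is satisfied. Some examples of monotone fluxes
for (1a) include

• (Godunov flux)

$$ g_{j+1/2}^{G} = \begin{cases} \min_{u\in[u_j, u_{j+1}]} f(u) & \text{if } u_j < u_{j+1} \\ \max_{u\in[u_{j+1}, u_j]} f(u) & \text{if } u_j > u_{j+1} \end{cases} \tag{26} $$

• (Lax-Friedrichs flux)

$$ g_{j+1/2}^{LF} = \frac{1}{2}\bigl(f(u_j) + f(u_{j+1})\bigr) - \frac{1}{2}\sup_{u\in[u_j, u_{j+1}]} |f'(u)|\,(u_{j+1} - u_j) \tag{27} $$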
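
The sign conditions (25a, 25b) can be probed numerically for a given flux. The Python sketch below evaluates the Lax-Friedrichs flux (27) for the Burgers flux $f(u) = u^2/2$, an assumed example for which $\sup |f'|$ over the interval is attained at an endpoint, and estimates the two partial derivatives by central differences. The helper names are hypothetical, and a finite-difference probe of this kind is only an indication, not a proof of monotonicity.

```python
def f(u):
    """Burgers flux f(u) = u^2/2, an assumed example flux."""
    return 0.5 * u * u

def lax_friedrichs_flux(ul, ur):
    """Flux (27); for Burgers, sup |f'| on the interval is max(|ul|, |ur|)."""
    alpha = max(abs(ul), abs(ur))
    return 0.5 * (f(ul) + f(ur)) - 0.5 * alpha * (ur - ul)

def flux_partials(g, ul, ur, h=1e-6):
    """Central-difference estimates of dg/dul and dg/dur."""
    dgl = (g(ul + h, ur) - g(ul - h, ur)) / (2 * h)
    dgr = (g(ul, ur + h) - g(ul, ur - h)) / (2 * h)
    return dgl, dgr
```

For a monotone two-point flux, the first estimate should be nonnegative and the second nonpositive at points where the flux is differentiable; the check should avoid states with $|u_l| = |u_r|$ or a state equal to zero, where this particular flux has a kink.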


2.1.2 E-flux schemes 


Another class of monotone numerical fluxes arising
frequently in analysis was introduced by Osher
(1984). These fluxes are called E-fluxes,
$g_{j+1/2} = g(u_{j+k}, \ldots, u_{j+1}, u_j, \ldots, u_{j-k+1})$, due to the relationship
to Oleinik's well-known E-condition which characterizes
entropy satisfying discontinuities. E-fluxes satisfy the
inequality

$$ \frac{g_{j+1/2} - f(u)}{u_{j+1} - u_j} \le 0, \quad \forall u \in [u_j, u_{j+1}] \tag{28} $$

E-fluxes can be characterized by their relationship to
Godunov's flux. Specifically, E-fluxes are precisely those
fluxes such that

$$ g_{j+1/2} \le g_{j+1/2}^{G} \quad \text{if } u_j < u_{j+1} \tag{29a} $$
$$ g_{j+1/2} \ge g_{j+1/2}^{G} \quad \text{if } u_j > u_{j+1} \tag{29b} $$

Viewed another way, note that any numerical flux can be
written in the form

$$ g_{j+1/2} = \frac{1}{2}\bigl(f(u_j) + f(u_{j+1})\bigr) - \frac{1}{2}\,Q_{j+1/2}\,(u_{j+1} - u_j) \tag{30} $$

where $Q(\cdot)$ denotes a viscosity for the scheme, written $Q_{j+1/2}$
at the interface. When written in this form, E-fluxes are those
fluxes that contribute at least as much viscosity as Godunov's flux, that is,

$$ Q_{j+1/2} \ge Q_{j+1/2}^{G} \tag{31} $$

The most prominent E-flux is the Engquist-Osher flux

$$ g_{j+1/2}^{EO} = \frac{1}{2}\bigl(f(u_j) + f(u_{j+1})\bigr) - \frac{1}{2}\int_{u_j}^{u_{j+1}} |f'(s)|\,ds \tag{32} $$

although other fluxes such as certain forms of Roe's flux
with entropy fix fall into this category. From (29a, 29b), the
monotone fluxes of Godunov $g_{j+1/2}^{G}$ and Lax-Friedrichs
$g_{j+1/2}^{LF}$ are also E-fluxes.
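
The E-flux inequality (28) lends itself to a direct numerical check. The Python sketch below evaluates the Engquist-Osher flux (32) for the Burgers flux $f(u) = u^2/2$, an assumed example for which the integral of $|f'(s)| = |s|$ has a closed form, and tests (28) at sampled intermediate states. The function names and the sampling-based check are illustrative assumptions.

```python
def f(u):
    """Burgers flux f(u) = u^2/2, an assumed example flux."""
    return 0.5 * u * u

def engquist_osher_flux(ul, ur):
    """Evaluate (32) for Burgers using the closed-form integral of |s|."""
    integral = 0.5 * (ur * abs(ur) - ul * abs(ul))   # int_{ul}^{ur} |s| ds
    return 0.5 * (f(ul) + f(ur)) - 0.5 * integral

def satisfies_e_condition(g, ul, ur, samples=200, tol=1e-12):
    """Check (28) at sampled states u between ul and ur."""
    if ul == ur:
        return True                  # (28) is vacuous for equal states
    gval = g(ul, ur)
    for i in range(samples + 1):
        u = ul + (ur - ul) * i / samples
        if (gval - f(u)) / (ur - ul) > tol:
            return False
    return True
```

As a contrast, the unstabilized central average $\frac{1}{2}(f(u_l) + f(u_r))$ violates (28) for $u_l = -1$, $u_r = 1$, since it exceeds $f(0) = 0$ while $u_r - u_l > 0$.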


2.2 Stability, convergence, and error estimates 


Several stability results are provided here that originate 
from discrete maximum principle analysis and are straight- 
forwardly stated in multidimensions and on general unstruc- 
tured meshes. In presenting results concerning convergence 
and error estimates, a notable difference arises between one 
and several space dimensions. This is due to the lack of a 
BV bound on the approximate solution in multidimensions. 
Thus, before considering convergence and error estimates 
for finite volume methods, stability results are presented 
first together with some a priori bounds on the approximate 
solution. 


2.2.1 Discrete maximum principles and stability 


A compelling motivation for the use of monotone fluxes
in the finite volume schemes (21) and (22) is that they yield
discrete maximum principles in the resulting numerical
solution of nonlinear conservation laws (1a). A standard
analysis technique is to first construct local discrete maximum
principles which can then be applied successively to
obtain global maximum principles and stability results.

The first result concerns the boundedness of local 
extrema in time for semidiscrete finite volume schemes that 
can be written in nonnegative coefficient form. 


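Theorem 4 (LED property) The semidiscrete scheme for
each $T_j \in \mathcal{T}$

$$ \frac{du_j}{dt} = \frac{1}{|T_j|}\sum_{\forall e_{jk}\in\partial T_j} C_{jk}(u_h)\,(u_k - u_j) \tag{33} $$

is local extremum diminishing (LED), that is, local maxima
are nonincreasing and local minima are nondecreasing, if

$$ C_{jk}(u_h) \ge 0, \quad \forall e_{jk} \in \partial T_j \tag{34} $$

Rewriting the semidiscrete finite volume scheme (21) in the
following equivalent forms

$$ \frac{du_j}{dt} = -\frac{1}{|T_j|}\sum_{\forall e_{jk}\in\partial T_j} g_{jk}(u_j, u_k) $$
$$ \phantom{\frac{du_j}{dt}} = -\frac{1}{|T_j|}\sum_{\forall e_{jk}\in\partial T_j} \frac{g_{jk}(u_j, u_k) - g_{jk}(u_j, u_j)}{u_k - u_j}\,(u_k - u_j) $$
$$ \phantom{\frac{du_j}{dt}} = -\frac{1}{|T_j|}\sum_{\forall e_{jk}\in\partial T_j} \frac{\partial g_{jk}}{\partial u_k}(u_j, \tilde{u}_{jk})\,(u_k - u_j) \tag{35} $$

for appropriately chosen $\tilde{u}_{jk} \in [u_j, u_k]$ reveals that the
monotone flux condition (25a, 25b) is a sufficient condition
for the semidiscrete scheme to be LED. To obtain local
space-time maximum principle results for the fully discrete
discretization (22) requires the introduction of an additional
CFL-like condition for nonnegativity of coefficients in
space-time.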


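Theorem 5 (Local space-time discrete maximum principle)
The fully discrete scheme for the time slab increment
$[t^n, t^{n+1}]$ and each $T_j \in \mathcal{T}$

$$ u_j^{n+1} = u_j^n + \frac{\Delta t}{|T_j|}\sum_{\forall e_{jk}\in\partial T_j} C_{jk}(u_h^n)\,(u_k^n - u_j^n) \tag{36} $$

exhibits a local space-time discrete maximum principle for
each n = 0, 1, ...

$$ \min_{\forall e_{jk}\in\partial T_j}(u_k^n, u_j^n) \le u_j^{n+1} \le \max_{\forall e_{jk}\in\partial T_j}(u_k^n, u_j^n) \tag{37} $$

if

$$ C_{jk}(u_h^n) \ge 0, \quad \forall e_{jk} \in \partial T_j \tag{38} $$

and $\Delta t$ is chosen such that the CFL-like condition is
satisfied

$$ 1 - \frac{\Delta t}{|T_j|}\sum_{\forall e_{jk}\in\partial T_j} C_{jk}(u_h^n) \ge 0 \tag{39} $$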
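
The reason the two conditions of Theorem 5 suffice can be seen by regrouping the update as a weighted average. As a short worked step, using only quantities already defined:

```latex
u_j^{n+1}
  = \Bigl(1 - \frac{\Delta t}{|T_j|}
      \sum_{\forall e_{jk}\in\partial T_j} C_{jk}(u_h^n)\Bigr)\, u_j^n
  + \frac{\Delta t}{|T_j|}
      \sum_{\forall e_{jk}\in\partial T_j} C_{jk}(u_h^n)\, u_k^n
```

When the coefficients $C_{jk}(u_h^n)$ are nonnegative and the CFL-like condition holds, all weights are nonnegative and sum to one, so $u_j^{n+1}$ is a convex combination of $u_j^n$ and its neighboring values $u_k^n$, which is exactly the bound (37).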


Again noting that the flux terms in the fully discrete finite 
volume scheme (22) can be written in the form (35) reveals 
that the monotone flux conditions (25a, 25b) together with 
a local CFL-like condition obtained from (39) imply a local 
space-time discrete maximum principle. By successive
application of Theorem 5, a global $L^\infty$-stability bound is
obtained for the scalar initial value problem (1a, 1b) in
terms of initial data $u_0(x)$.


Theorem 6 ($L^\infty$-stability) Assume a fully discrete finite
volume scheme (22) for the scalar initial value problem (1a,
1b) utilizing monotone fluxes satisfying a local CFL-like
condition as given in Theorem 5 for each time slab increment
$[t^n, t^{n+1}]$. Under these conditions, the finite volume
scheme is $L^\infty$-stable and the following estimate holds:

$$ \inf_{x\in\mathbb{R}^d} u_0(x) \le u_j^n \le \sup_{x\in\mathbb{R}^d} u_0(x), \quad \text{for all } (T_j, t^n) \in \mathcal{T} \times [0, \tau] \tag{40} $$


Consider now steady state solutions, $u^{n+1} = u^n = u^*$,
using a monotone flux in the fully discrete finite volume
scheme (22). At steady state, nonnegativity of the coefficients
$C(u_h)$ in (36) implies a discrete maximum principle.


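Theorem 7 (Local discrete maximum principle in
space) The fully discrete scheme (36) exhibits a local discrete
maximum principle at steady state, $u_h^*$, for each $T_j \in \mathcal{T}$

$$ \min_{\forall e_{jk}\in\partial T_j} u_k^* \le u_j^* \le \max_{\forall e_{jk}\in\partial T_j} u_k^* \tag{41} $$

if

$$ C_{jk}(u_h^*) \ge 0, \quad \forall e_{jk} \in \partial T_j $$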


Once again, by virtue of (25a, 25b) and (28), the condi- 
tions for a local discrete maximum principle at steady state 
are fulfilled by monotone flux finite volume schemes (22). 
Global maximum principles for characteristic boundary val- 
ued problems are readily obtained by successive application 
of the local maximum principle result. 

The local maximum principles given in (37) and (41)
preclude the introduction of spurious extrema and O(1)
Gibbs-like oscillations that occur near solution discontinuities
computed using many numerical methods (even in
the presence of grid refinement). For this reason, discrete
maximum principles of this type are a highly sought after
design principle in the development of numerical schemes
for nonlinear conservation laws.


2.2.2 Convergence results 


The $L^\infty$-stability bound (40) is an essential ingredient
in the proof of convergence of the fully discrete finite
volume scheme (22). This bound permits the extraction
of a subsequence that converges to some limit in the
$L^\infty$ weak-star sense. The primary task that then remains
is to identify this limit with the unique solution of the
problem. So although $L^\infty$-stability is enough to ascertain
convergence of the scheme, stronger estimates are needed
in order to derive convergence rates.

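Let BV denote the space of functions with bounded
variation, that is,

$$ BV = \{ g \in L^1(\mathbb{R}^d) \mid |g|_{BV} < \infty \} $$

with

$$ |g|_{BV} = \sup_{\substack{\varphi \in C_c^1(\mathbb{R}^d)^d \\ \|\varphi\|_\infty \le 1}} \int_{\mathbb{R}^d} g\,\nabla\cdot\varphi\,dx $$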

From the theory of scalar conservation laws, it is known 
that, provided the initial data is in BV, the solution remains 
in BV for all times. Therefore, it is desirable to have an 
analog of this property for the approximate solution as well. 
Unfortunately, up to now, such a result is only rigorously 
proved in the one-dimensional case or in the case of tensor 
product Cartesian meshes in multiple space dimensions. In 
the general multidimensional case, the approximate solution 
can only be shown to fulfill some weaker estimate, which is 
thus called a weak BV estimate; see Vila (1994), Cockburn, 
Coquel and Lefloch (1994), and Eymard et al. (1998). 
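
In one space dimension the situation is simple to illustrate: for piecewise constant data the BV seminorm reduces to the sum of the jumps across cell interfaces. The Python sketch below computes this discrete total variation on a periodic mesh and applies one step of the first-order upwind scheme for linear advection, used here as an assumed example of a monotone scheme. The function names are hypothetical.

```python
def total_variation(u):
    """Discrete total variation of periodic cell averages (sum of jumps)."""
    n = len(u)
    return sum(abs(u[(j + 1) % n] - u[j]) for j in range(n))

def upwind_step(u, nu):
    """Upwind step for u_t + a u_x = 0, a > 0, Courant number 0 <= nu <= 1."""
    # u[j - 1] wraps around for j = 0, which imposes periodicity.
    return [u[j] - nu * (u[j] - u[j - 1]) for j in range(len(u))]
```

Each new value is a convex combination of two old values, so the total variation is not expected to grow from step to step in this 1-D setting.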


Theorem 8 (Weak BV estimate) Let $\mathcal{T}$ be a regular
triangulation, and let J be a uniform partition of $[0, \tau]$, for
example, $\Delta t^n \equiv \Delta t$. Assume that there exists some $\alpha > 0$
such that $\alpha h^2 \le |T_j|$, $\alpha|\partial T_j| \le h$. For the time step $\Delta t^n$,
assume the following CFL-like condition for a given $\xi \in (0, 1)$

$$ \Delta t^n \le \frac{(1 - \xi)\,\alpha^2 h}{L_g} $$

where $L_g$ is the Lipschitz constant of the numerical flux function.
Furthermore, let $u_0 \in L^\infty(\mathbb{R}^d) \cap BV(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$.
Then, the numerical solution of the fully discrete discretization
(22) fulfills the following estimate

$$ \sum_n \Delta t \sum_{jl} \chi_{jl}\,h\,|u_j^n - u_l^n|\,Q_{jl}(u_j^n, u_l^n) \le K\sqrt{T\,|B_{R+h}(0)|}\,\sqrt{h} \tag{42} $$

where K only depends on $\alpha$, $L_g$, $\xi$ and the initial function
$u_0$. In this formula $Q_{jl}$ is defined as

$$ Q_{jl}(u, v) \equiv \frac{2g_{jl}(u, v) - g_{jl}(u, u) - g_{jl}(v, v)}{u - v} $$

and $\chi_{jl}$ denotes the discrete cutoff function on $B_R(0) \subset \mathbb{R}^d$,
that is,

$$ \chi_{jl} = \begin{cases} 1 & \text{if } (T_j \cup T_l) \cap B_R(0) \ne \emptyset \\ 0 & \text{else} \end{cases} $$

Note that in the case of a strong BV estimate, the right-hand
side of (42) would be O(h) instead of $O(\sqrt{h})$.

Another important property of monotone finite volume
schemes is that they preserve the $L^1$-contraction property
(see Theorem 1).


Theorem 9 ($L^1$-contraction property and Lipschitz estimate
in time) Let $u_h, v_h \in V_h^0$ be the approximate monotone
finite volume solutions corresponding to initial data $u_0$,
$v_0$ assuming that the CFL-like condition for stability has
been fulfilled. Then the following discrete $L^1$-contraction
property holds

$$ \|u_h(\cdot, t+\tau) - v_h(\cdot, t+\tau)\|_{L^1(\mathbb{R}^d)} \le \|u_h(\cdot, t) - v_h(\cdot, t)\|_{L^1(\mathbb{R}^d)} $$

Furthermore, a discrete Lipschitz estimate in time is
obtained

$$ \sum_j |T_j|\,|u_j^{n+1} - u_j^n| \le L_g\,\Delta t^n \sum_j \sum_l |e_{jl}|\,|u_j^n - u_l^n| $$
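
The discrete $L^1$-contraction property can be observed in a small experiment. The Python sketch below advances two sets of cell averages with the same monotone scheme and the same time step, and records their $L^1$ distance after every step. The periodic uniform 1-D mesh, the Burgers flux with its Godunov flux, and the function names are all assumptions made for illustration.

```python
def f(u):
    """Burgers flux f(u) = u^2/2, an assumed example flux."""
    return 0.5 * u * u

def godunov_flux(ul, ur):
    """Godunov flux for the convex Burgers flux."""
    if ul <= ur:
        if ul > 0.0:
            return f(ul)
        if ur < 0.0:
            return f(ur)
        return 0.0
    return max(f(ul), f(ur))

def fv_step(u, lam):
    """One forward Euler finite volume step with lam = dt/dx (periodic)."""
    n = len(u)
    g = [godunov_flux(u[j], u[(j + 1) % n]) for j in range(n)]
    return [u[j] - lam * (g[j] - g[j - 1]) for j in range(n)]

def l1_distance(u, v, dx):
    return dx * sum(abs(a - b) for a, b in zip(u, v))

def l1_history(u, v, dx, dt, steps):
    """L1 distance between two discrete solutions after each step."""
    hist = [l1_distance(u, v, dx)]
    for _ in range(steps):
        u, v = fv_step(u, dt / dx), fv_step(v, dt / dx)
        hist.append(l1_distance(u, v, dx))
    return hist
```

Provided the time step respects the CFL-like condition for both data sets, the recorded distances should form a nonincreasing sequence, up to rounding.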


The principal ingredients of the convergence theory for
scalar nonlinear conservation laws are compactness of the 
family of approximate solutions and the passage to the 
limit within the entropy inequality (12). In dealing with 
nonlinear equations, strong compactness is needed in order 
to pass to the limit in (12). In one space dimension, due 
to the BV estimate and the selection principle of Helly, 
strong compactness is ensured and the passage to the limit 
is summarized in the well-known Lax—Wendroff theorem; 
see Lax and Wendroff (1960). 


Theorem 10 (Lax-Wendroff theorem) Let $(u_m)_{m\in\mathbb{N}}$ be
a sequence of discrete solutions defined by the finite volume
scheme in one space dimension with respect to initial data
$u_0$. Assume that $(u_m)_{m\in\mathbb{N}}$ is uniformly bounded with respect
to m in $L^\infty$ and $u_m$ converges almost everywhere in $\mathbb{R} \times \mathbb{R}^+$
to some function u. Then u is the uniquely defined
entropy weak solution of (1a, 1b).


With the lack of a BV estimate for the approximate
solution in multiple space dimensions, one cannot expect a
passage to the limit of the nonlinear terms in the entropy
inequality in the classical sense, that is, the limit of $u_m$
will not in general be a weak solution. Nevertheless, the
weak compactness obtained by the $L^\infty$-estimate is enough
to obtain a measure-valued or entropy process solution in
the limit.

The key theorem for this convergence result is the
following compactness theorem of Tartar; see Tartar (1983)
and Eymard, Gallouët and Herbin (2000).

Theorem 11 (Tartar's theorem) Let $(u_m)_{m\in\mathbb{N}}$ be a family
of bounded functions in $L^\infty(\mathbb{R}^n)$. Then, there exists a
subsequence $(u_m)_{m\in\mathbb{N}}$ and a function $\mu \in L^\infty(\mathbb{R}^n \times (0, 1))$
such that for all functions $g \in C(\mathbb{R})$ the weak-$\star$ limit of
$g(u_m)$ exists and

$$ \lim_{m\to\infty}\int_{\mathbb{R}^n} g(u_m(x))\,\phi(x)\,dx = \int_0^1\int_{\mathbb{R}^n} g(\mu(x,\alpha))\,\phi(x)\,dx\,d\alpha, \quad \text{for all } \phi \in L^1(\mathbb{R}^n) \tag{43} $$


In order to prove the convergence of a finite volume
method, it now remains to show that the residual of the
entropy inequality (12) for the approximate solution $u_h$
tends to zero if h and $\Delta t$ tend to zero. Before presenting
this estimate for the finite volume approximation, a general
convergence theorem is given, which can be viewed as
a generalization of the classical Lax-Wendroff result; see
Eymard, Gallouët and Herbin (2000).


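Theorem 12 (Sufficient condition for convergence) Let
$u_0 \in L^\infty(\mathbb{R}^d)$ and $f \in C^1(\mathbb{R})$. Further, let $(u_m)_{m\in\mathbb{N}}$ be any
family of uniformly bounded functions in $L^\infty(\mathbb{R}^d \times \mathbb{R}^+)$
that satisfies the following estimate for the residual of the
entropy inequality using the class of Kruzkov entropies $\eta_\kappa$
(see Note 1).

$$ \int_{\mathbb{R}^d}\int_{\mathbb{R}^+} \bigl(\eta_\kappa(u_m)\,\partial_t\phi + F_\kappa(u_m)\cdot\nabla\phi\bigr)\,dt\,dx + \int_{\mathbb{R}^d} \eta_\kappa(u_0)\,\phi(x,0)\,dx \ge -R(\kappa, u_m, \phi) \tag{44} $$

for all $\kappa \in \mathbb{R}$ and $\phi \in C_0^1(\mathbb{R}^d \times \mathbb{R}^+, \mathbb{R}^+)$ where the residual
$R(\kappa, u_m, \phi)$ tends to zero for $m \to \infty$ uniformly in $\kappa$. Then,
$u_m$ converges strongly to the unique entropy weak solution
of (1a, 1b) in $L^p_{loc}(\mathbb{R}^d \times \mathbb{R}^+)$ for all $p \in [1, \infty)$.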


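Theorem 13 (Estimate on the residual of the entropy
inequality) Let $(u_m)_{m\in\mathbb{N}}$ be a sequence of monotone finite
volume approximations satisfying a local CFL-like condition
as given in (39) such that h, $\Delta t$ tend to zero for
$m \to \infty$. Then, there exist measures $\mu_m \in M(\mathbb{R}^d \times \mathbb{R}^+)$
and $\nu_m \in M(\mathbb{R}^d)$ such that the residual $R(\kappa, u_m, \phi)$ of the
entropy inequality is estimated by

$$ R(\kappa, u_m, \phi) \le \int_{\mathbb{R}^d\times\mathbb{R}^+} \bigl(|\partial_t\phi(x,t)| + |\nabla\phi(x,t)|\bigr)\,d\mu_m(x,t) + \int_{\mathbb{R}^d} \phi(x,0)\,d\nu_m(x) $$

for all $\kappa \in \mathbb{R}$ and $\phi \in C_0^1(\mathbb{R}^d \times \mathbb{R}^+, \mathbb{R}^+)$. The measures $\mu_m$
and $\nu_m$ satisfy the following properties:

1. For all compact subsets $\Omega \subset\subset \mathbb{R}^d \times \mathbb{R}^+$,
$\lim_{m\to\infty} \mu_m(\Omega) = 0$.

2. For all $g \in C_0(\mathbb{R}^d)$ the measure $\nu_m$ is given by
$\langle \nu_m, g \rangle = \int_{\mathbb{R}^d} g(x)\,|u_0(x) - u_m(x, 0)|\,dx$.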


These theorems are sufficient for establishing convergence 
of monotone finite volume schemes. 


Corollary 1 (Convergence theorem) Let $(u_m)_{m\in\mathbb{N}}$ be a
sequence of monotone finite volume approximations satisfying
the assumptions of Theorem 13. Then, $u_m$ converges
strongly to the unique entropy weak solution of (1a, 1b) in
$L^p_{loc}(\mathbb{R}^d \times \mathbb{R}^+)$ for all $p \in [1, \infty)$.


Convergence of higher-order finite volume schemes can
also be proven within the given framework as long as they
are $L^\infty$-stable and allow for an estimate on the entropy
residual in the sense of Theorem 13; for details see Kröner,
Noelle and Rokyta (1995) and Chainais-Hillairet (2000).


2.2.3 Error estimates and convergence rates 


There are two primary approaches taken to obtain error 
estimates for approximations of scalar nonlinear conserva- 
tion laws. One approach is based on the ideas of Oleinik 
and is applicable only in one space dimension; see Oleinik 
(1963) and Tadmor (1991). The second approach, which is 
widely used in the numerical analysis of conservation laws 
is based on the doubling of variables technique of Kruzkov; 
see Kruzkov (1970) and Kuznetsov (1976). In essence, 
this technique enables one to estimate the error between
the exact and approximate solution of a conservation law
in terms of the entropy residual $R(\kappa, u_m, \phi)$ introduced
in (44). Thus, an a posteriori error estimate is obtained.
Using a priori estimates of the approximate solution (see
Section 2.2.1, and Theorems 8, 9), a convergence rate or an
a priori error estimate is then obtained. The next theorem
gives a fundamental error estimate for conservation laws
independent of the particular finite volume scheme; see
Eymard, Gallouët and Herbin (2000), Chainais-Hillairet
(1999), and Kröner and Ohlberger (2000).


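Theorem 14 (Fundamental error estimate) Let $u_0 \in BV(\mathbb{R}^d)$
and let u be an entropy weak solution of (1a,
1b). Furthermore, let $v \in L^\infty(\mathbb{R}^d \times \mathbb{R}^+)$ be a solution of
the following entropy inequalities with residual term R:

$$ \int_{\mathbb{R}^d}\int_{\mathbb{R}^+} \bigl(\eta_\kappa(v)\,\partial_t\phi + F_\kappa(v)\cdot\nabla\phi\bigr)\,dt\,dx + \int_{\mathbb{R}^d} \eta_\kappa(u_0)\,\phi(x,0)\,dx \ge -R(\kappa, v, \phi) \tag{45} $$

for all $\kappa \in \mathbb{R}$ and $\phi \in C_0^1(\mathbb{R}^d \times \mathbb{R}^+, \mathbb{R}^+)$. Suppose that there
exist measures $\mu_v \in M(\mathbb{R}^d \times \mathbb{R}^+)$ and $\nu_v \in M(\mathbb{R}^d)$ such
that $R(\kappa, v, \phi)$ can be estimated independently of $\kappa$ by

$$ R(\kappa, v, \phi) \le \langle |\partial_t\phi| + |\nabla\phi|, \mu_v \rangle + \langle |\phi(\cdot, 0)|, \nu_v \rangle \tag{46} $$

Let $K \subset\subset \mathbb{R}^d \times \mathbb{R}^+$, $\omega \equiv \mathrm{Lip}(f)$, and choose T, R and
$x_0$ such that $T \in\, ]0, R/\omega[$ and K lies within its cone of
dependence $D_0$, that is, $K \subset D_0$ where $D_\delta$ is given as

$$ D_\delta := \bigcup_{0\le t\le T} B_{R-\omega t+\delta}(x_0) \times \{t\} \tag{47} $$

Then, there exists a $\delta > 0$ and positive constants $C_1$, $C_2$ such
that u, v satisfy the following error estimate

$$ \|u - v\|_{L^1(K)} \le T\Bigl(\nu_v(B_{R+\delta}(x_0)) + C_1\,\mu_v(D_\delta) + C_2\sqrt{\mu_v(D_\delta)}\Bigr) \tag{48} $$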


This estimate can be used either as an a posteriori control
of the error, as the right-hand side of the estimate (48) only
depends on v, or it can be used as an a priori error bound if
one is able to estimate further the measures $\mu_v$ and $\nu_v$ using
some a priori bounds on v. Finally, note that comparable
estimates to (48) are obtainable in an $L^\infty(0, T; L^1(\mathbb{R}^d))$-norm;
see Cockburn and Gau (1995) and Bouchut and
Perthame (1998).


2.2.4 A posteriori error estimate 


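Theorem 15 (A posteriori error estimate) Assume the
conditions and notations as in Theorem 14. Let $v = u_h$ be
a numerical approximation to (1a, 1b) obtained from a
monotone finite volume scheme that satisfies a local CFL-like
condition as given in (39). Then the following error
estimate holds

$$ \int_K |u - u_h|\,dx \le T\Bigl(\|u_0 - u_h(\cdot, 0)\|_{L^1(B_{R+1}(x_0))} + C_1\,\eta + C_2\sqrt{\eta}\Bigr) \tag{49} $$

where

$$ \eta \equiv \sum_{n\in I_0}\sum_{j\in M(t^n)} |u_j^{n+1} - u_j^n|\,\Delta t^n\,h_j^d + 2\sum_{n\in I_0}\sum_{(j,l)\in E(t^n)} \Delta t^n\,(\Delta t^n + h_{jl})\,Q_{jl}(u_j^n, u_l^n)\,|u_j^n - u_l^n| $$

$$ Q_{jl}(u, v) \equiv \frac{2g_{jl}(u, v) - g_{jl}(u, u) - g_{jl}(v, v)}{u - v} $$

with the index sets $I_0$, M(t), E(t) given by

$$ I_0 \equiv \Bigl\{ n \Bigm| 0 \le t^n \le \min\Bigl\{\frac{R+1}{\omega}, T\Bigr\} \Bigr\} $$
$$ M(t) \equiv \{ j \mid \text{there exists } x \in T_j \text{ such that } (x, t) \in D_{R+1} \} $$
$$ E(t) \equiv \{ (j, l) \mid \text{there exists } x \in T_j \cup T_l \text{ such that } (x, t) \in D_{R+1} \} $$

Furthermore, the constants $C_1$, $C_2$ only depend on T, $\omega$,
$\|u_0\|_{BV}$ and $\|u_0\|_{L^\infty}$; for details see Kröner and Ohlberger
(2000).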


Note that this a posteriori error estimate is local, since the
error on a compact set K is estimated by discrete quantities
that are supported in the cone of dependence $D_{R+1}$.


2.2.5 A priori error estimate 


Using the weak BV estimate (Theorem 8) and the Lipschitz 
estimate in time (Theorem 9), the right-hand side of the a 
posteriori error estimate (49) can be further estimated. This 
yields an a priori error estimate as stated in the following 
theorem; for details see Cockburn and Gremaud (1996),
Cockburn and Gremaud (1997), Cockburn, Gremaud and
Yang (1998), Chainais-Hillairet (1999), and Eymard,
Gallouët and Herbin (2000).


Theorem 16 (A priori error estimate) Assume the conditions
and notations as in Theorem 14 and let $v = u_h$ be
the approximation to (1a, 1b) given by a monotone finite
volume scheme that satisfies a local CFL-like condition as
given in (39). Then there exists a constant C > 0 such that

$$ \int_K |u - u_h|\,dx \le C\,h^{1/4} $$

Moreover, in the one-dimensional case, the optimal convergence
rate of $h^{1/2}$ is obtained.
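
The one-dimensional rate can be observed in a simple experiment. The Python sketch below measures the $L^1$ error of the first-order upwind scheme for the linear advection equation $u_t + u_x = 0$ with a discontinuous square pulse on a periodic unit interval, and estimates the observed order between two meshes. The test problem, the Courant number of 0.5, and the function names are assumptions chosen for illustration; the experiment illustrates the rate and does not verify the theorem.

```python
import math

def upwind_error(n):
    """L1 error of first-order upwind for u_t + u_x = 0 with a square pulse.

    n is assumed divisible by 4 so the pulse edges align with cell faces.
    """
    dx = 1.0 / n
    nu = 0.5                                   # Courant number dt/dx
    # Cell averages of the indicator function of [0.25, 0.5).
    u = [1.0 if n // 4 <= j < n // 2 else 0.0 for j in range(n)]
    steps = n // 2                             # final time steps*nu*dx = 0.25
    for _ in range(steps):
        u = [u[j] - nu * (u[j] - u[j - 1]) for j in range(n)]
    # Exact solution: the pulse shifted by 0.25, i.e. by n/4 cells.
    exact = [1.0 if n // 2 <= j < 3 * n // 4 else 0.0 for j in range(n)]
    return dx * sum(abs(a - b) for a, b in zip(u, exact))

def observed_rate(n):
    """Observed convergence order between meshes with n and 2n cells."""
    return math.log(upwind_error(n) / upwind_error(2 * n)) / math.log(2.0)
```

For discontinuous data such as this pulse, the observed order is expected to be close to one half, in line with the $h^{1/2}$ rate quoted for the one-dimensional case.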


2.2.6 Convergence proofs via the streamline 
diffusion discontinuous Galerkin finite element 
method 


It is straightforward to show that the fully discrete finite
volume scheme (22) can be viewed as a specific case of the
more general streamline diffusion discontinuous Galerkin
(SD-DG) finite element method which utilizes the mesh
dependent broken space $V_h^p$ defined as

$$ V_h^p = \{ v \mid v|_T \in P_p(T),\ \forall T \in \mathcal{T} \} \tag{50} $$

with $P_p(T)$ the space of polynomials of degree $\le p$
in the control volume T. By generalizing the notion
of gradient and flux to include the time coordinate as
well, the discontinuous Galerkin finite element method
for a space-time tessellation $\mathcal{T}^n$ spanning the time slab
increment $[t^n, t^{n+1}]$ is given compactly by the following
variational statement.


SD-DG(p) finite element method.
Find $u_h \in V_h^p$ such that $\forall v_h \in V_h^p$ and n = 0, 1, ...

$$ \sum_{T\in\mathcal{T}^n}\Bigl( \int_T \bigl(v_h + \delta(u_h)\,f'(u_h)\cdot\nabla v_h\bigr)\,\nabla\cdot f(u_h)\,dx + \int_T \epsilon(u_h)\,\nabla u_h\cdot\nabla v_h\,dx + \int_{\partial T} v_{-,h}\bigl(g(\nu; u_{-,h}, u_{+,h}) - f(u_{-,h})\cdot\nu\bigr)\,ds \Bigr) = 0 \tag{51} $$

where $\nu$ denotes the unit exterior normal on $\partial T$. In the
integration over $\partial T$, it is understood that on the portion
$x \in \partial T \cap \partial T'$, $u_{-,h}$ and $v_{-,h}$ denote the trace restrictions
of $u_h(T)$ and $v_h(T)$ onto $\partial T$ and $u_{+,h}$ denotes the trace
restriction of $u_h(T')$ onto $\partial T'$. Given this space-time formulation,
convergence results for a scalar nonlinear conservation
law in multidimensions and unstructured meshes are
given in Jaffre, Johnson and Szepessy (1995) for specific
choices of the stabilization functions $\delta(u_h): \mathbb{R} \to \mathbb{R}^+$ and
$\epsilon(u_h): \mathbb{R} \to \mathbb{R}^+$ together with a monotone numerical flux
function $g(\nu; u_{-,h}, u_{+,h})$. Using their stabilization functions
together with a monotone flux function, the following convergence
result is obtained:


Theorem 17 (SD-DG(p) convergence) Suppose that
components of $f'(u) \in C^d(\mathbb{R})$ are bounded and that $u_0 \in L_2(\mathbb{R}^d)$
has compact support. Then the solution $u_h$ of
the SD-DG(p) method converges strongly in $L_p^{loc}(\mathbb{R}^d \times \mathbb{R}^+)$,
$1 \le p < 2$, to the unique solution u of the scalar
nonlinear conservation law system (1a, 1b) as
$H \equiv \max(\|h\|_{L_\infty(\mathbb{R}^d)}, \Delta t)$ tends to zero.


The proof of convergence to a unique entropy solution
on general meshes for $p \ge 0$ is based on an extension by
Szepessy (1989) of a uniqueness result by DiPerna (1985)
by providing convergence for a sequence of approximations
satisfying:


• a uniform $L_\infty$ bound in time and $L_2$ in space,

• entropy consistency and inequality for all Kruzkov
entropies,

• consistency with initial data.


By choosing SD-DG(0), the dependence on the as yet
unspecified stabilization functions $\delta(u_h)$ and $\epsilon(u_h)$ vanishes
identically and the fully discrete scheme (22) with monotone
flux function is exactly reproduced, thus yielding a
convergence proof for general scalar conservation laws for
the finite volume method as well. Subsequently, Cockburn
and Gremaud (1996) replaced the $L_\infty(L_2)$ norm analysis of
Jaffre, Johnson and Szepessy (1995) with an alternate analysis
in $L_\infty(L_1)$ thus yielding $O(h^{1/8})$ and $O(h^{1/4})$ error
estimates in time and space respectively.


3 HIGHER-ORDER ACCURATE FV 
GENERALIZATIONS 


Even for linear advection problems where an $O(h^{1/2})$
$L_2$-norm error bound for the monotone flux schemes of
Section 2 is known to be sharp (Peterson, 1991), an 
O(h) solution error is routinely observed in numerical 
experiments on smoothly varying meshes with convex flux 
functions. Nevertheless, first-order accurate schemes are 
generally considered too inaccurate for most quantitative 
calculations unless the mesh spacing is made excessively 
small, thus rendering the schemes inefficient. Godunov 
(1959) has shown that all linear schemes that preserve 
solution monotonicity are at most first-order accurate. The 
low-order accuracy of these monotonicity preserving linear 
schemes has motivated the development of higher-order 
accurate schemes with the important distinction that these 
new schemes utilize essential nonlinearity so that monotone 
resolution of discontinuities and high-order accuracy away 
from discontinuities are simultaneously attained. 


3.1 Higher-order accurate FV schemes in 1-D 


Figure 3. Piecewise polynomial approximation used in the finite
volume method (a) cell averaging of analytic data, (b) piecewise
linear reconstruction from cell averages and (c) piecewise
quadratic reconstruction from cell averages.


A significant step forward in the generalization of Godunov’s
finite volume method to higher-order accuracy is due to van Leer
(1979). In the context of Lagrangian hydrodynamics with Eulerian
remapping, van Leer generalized Godunov’s method by employing linear
solution reconstruction in each cell (see Figure 3b). Let $N$ denote
the number of control volume cells in space so that the $j$th cell
extends over the interval $T_j = [x_{j-1/2}, x_{j+1/2}]$ with length
$\Delta x_j$ such that $\cup_{1 \le j \le N} T_j = [0, 1]$ with
$T_i \cap T_j = \emptyset$, $i \ne j$. In a purely Eulerian setting,
the higher-order accurate schemes of van Leer are of the form


\[
\frac{du_j}{dt} + \frac{1}{\Delta x_j}
\left( g(u^-_{j+1/2}, u^+_{j+1/2}) - g(u^-_{j-1/2}, u^+_{j-1/2}) \right) = 0
\]

where $g(u, v)$ is a numerical flux function utilizing states
$u^-_{j\pm1/2}$ and $u^+_{j\pm1/2}$ obtained from evaluation of the
linear solution reconstructions from the left and right cells
surrounding the interfaces $x_{j\pm1/2}$. By altering the slope of the
linear reconstruction in cells, nonoscillatory resolution of 
discontinuities can be obtained. Note that although obtain- 
ing the exact solution of the scalar nonlinear conservation 
law with linear initial data is a formidable task, the solu- 
tion at each cell interface location for small enough time 
is the same as the solution of the Riemann problem with 
piecewise constant data equal to the linear solution approx- 
imation evaluated at the same interface location. Conse- 
quently, the numerical flux functions used in Section 2 
can be once again used in the generalized schemes of 
van Leer. This single observation greatly simplifies the 
construction of higher-order accurate generalizations of 
Godunov’s method. This observation also suggested a relatively
straightforward extension of van Leer's ideas to
quadratic approximation in each cell (see Figure 3c) as dis- 
cussed in early work by Colella and Woodward (1984). 
Although these generalizations of Godunov’s method and 
further generalizations given later can be interpreted in 
1-D as finite difference discretizations, concepts originally 
developed in 1-D, such as solution monotonicity, positive 
coefficient discretization, and discrete maximum principle 
analysis are often used in the design of finite volume 
methods in multiple space dimensions and on unstructured 
meshes where finite difference discretization is problematic. 


3.1.1 TVD schemes 


In considering the scalar nonlinear conservation law (1a, 1b),
Lax (1973) made the following basic observation:


the total increasing and decreasing variations of a differentiable
solution between any pair of characteristics are conserved.


Furthermore, in the presence of shock wave discontinuities,
information is lost and the total variation decreases. For the
1-D nonlinear conservation law with compactly supported
(or periodic) solution data $u(x, t)$, integrating along the
constant time spatial coordinate at times $t_1$ and $t_2$ yields


\[
\int_{-\infty}^{\infty} |du(x, t_2)| \le \int_{-\infty}^{\infty} |du(x, t_1)|,
\quad t_2 \ge t_1 \tag{52}
\]


This motivated Harten (1983) to consider the discrete total 
variation 


\[
TV(u_h) \equiv \sum_j |\Delta_{j+1/2} u_h|, \quad
\Delta_{j+1/2} u_h \equiv u_{j+1} - u_j
\]


and the discrete total variation nonincreasing (TVNI) bound
counterpart to (52)


\[
TV(u_h^{n+1}) \le TV(u_h^n) \tag{53}
\]


in the design of numerical discretizations for nonlinear 
conservation laws. A number of simple results relating 
TVNI schemes and monotone schemes follow from simple 
analysis. 
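
As an illustration of the TVNI bound (53), the following minimal sketch evaluates the discrete total variation of a square pulse advected by a first-order upwind scheme. The function names, the periodic mesh, the pulse data, and the CFL number 0.8 are illustrative assumptions, not taken from the text.

```python
import numpy as np

def total_variation(u, periodic=True):
    """Discrete total variation: sum over j of |u[j+1] - u[j]|."""
    u = np.asarray(u, dtype=float)
    jumps = np.roll(u, -1) - u if periodic else np.diff(u)
    return float(np.sum(np.abs(jumps)))

def upwind_step(u, cfl):
    """One explicit upwind step for u_t + a u_x = 0 with a > 0 on a
    periodic mesh; cfl = a * dt / dx is assumed to lie in [0, 1]."""
    return u - cfl * (u - np.roll(u, 1))

# Square pulse on a periodic mesh (illustrative data).
x = np.linspace(0.0, 1.0, 100, endpoint=False)
u = np.where((x > 0.25) & (x < 0.5), 1.0, 0.0)

tv_history = [total_variation(u)]
for _ in range(50):
    u = upwind_step(u, cfl=0.8)
    tv_history.append(total_variation(u))
```

Each upwind update is a convex combination of two neighboring values, so the recorded total variation should never grow from one time level to the next.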


Theorem 18 (TVNI and monotone scheme properties, 
Harten, 1983) (i) Monotone schemes are TVNI. (ii) TVNI 
schemes are monotonicity preserving, that is, the number of 
solution extrema is preserved in time. 


Property (i) follows from the $L_1$-contraction property of
monotone schemes. Property (ii) is readily shown using a 
proof by contradiction, by assuming a TVNI scheme with 
monotone initial data that produces new solution data at 
a later time with interior solution extrema present. Using 
the notion of discrete total variation, Harten (1983) then 
constructed sufficient algebraic conditions for achieving the 
TVNI inequality (53). 


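Theorem 19 (Harten’s explicit TVD criteria) The fully
discrete explicit 1-D scheme


\[
u_j^{n+1} = u_j^n + \Delta t \left( C_{j+1/2}(u_h^n)\, \Delta_{j+1/2} u_h^n
  + D_{j-1/2}(u_h^n)\, \Delta_{j-1/2} u_h^n \right),
\quad j = 1, \ldots, N \tag{54}
\]


is total variation nonincreasing if for each $j$


\[ C_{j+1/2} \ge 0 \tag{55a} \]
\[ D_{j+1/2} \le 0 \tag{55b} \]
\[ 1 - \Delta t \left( C_{j+1/2} - D_{j+1/2} \right) \ge 0 \tag{55c} \]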


Note that although the inequality constraints (55a-55c) in
Theorem 19 ensure that the total variation is nonincreasing,
these conditions are often referred to as total variation
diminishing (TVD) conditions. Also note that inequality
(55c) implies a CFL-like time step restriction that may be
more restrictive than the time step required for stability
of the numerical method. The TVD conditions are easily
generalized to wider support stencils written in incremental
form; see, for example, Jameson and Lax (1986) and their
corrected result in Jameson and Lax (1987).
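
The criteria of Theorem 19 can be checked mechanically once a scheme is written in incremental form. The sketch below assumes linear advection $u_t + a u_x = 0$ with $a > 0$ on a periodic uniform mesh, for which first-order upwinding corresponds to $C_{j+1/2} = 0$, $D_{j+1/2} = -a/\Delta x$ and central differencing to $C_{j+1/2} = D_{j+1/2} = -a/(2\Delta x)$. The function names and the storage of coefficients by interface index are illustrative choices.

```python
import numpy as np

def harten_conditions_hold(C_half, D_half, dt, tol=1e-14):
    """Check C_{j+1/2} >= 0, D_{j+1/2} <= 0 and
    1 - dt * (C_{j+1/2} - D_{j+1/2}) >= 0 at every interface."""
    C_half = np.asarray(C_half, dtype=float)
    D_half = np.asarray(D_half, dtype=float)
    return bool(np.all(C_half >= -tol)
                and np.all(D_half <= tol)
                and np.all(1.0 - dt * (C_half - D_half) >= -tol))

def incremental_step(u, C_half, D_half, dt):
    """u_j + dt * (C_{j+1/2} * Delta_{j+1/2} u + D_{j-1/2} * Delta_{j-1/2} u)
    on a periodic mesh; entry j of each array refers to interface j+1/2."""
    du = np.roll(u, -1) - u                       # Delta_{j+1/2} u
    return u + dt * (C_half * du + np.roll(D_half * du, 1))

# Illustrative setup: unit advection speed, 100 cells of width 0.01.
a, dx, n = 1.0, 0.01, 100
C_upwind, D_upwind = np.zeros(n), np.full(n, -a / dx)
C_central = D_central = np.full(n, -a / (2.0 * dx))

upwind_ok = harten_conditions_hold(C_upwind, D_upwind, dt=0.9 * dx)
upwind_too_large_dt = harten_conditions_hold(C_upwind, D_upwind, dt=1.1 * dx)
central_ok = harten_conditions_hold(C_central, D_central, dt=0.5 * dx)
```

In this setting condition (55c) reduces to the familiar restriction $a \Delta t / \Delta x \le 1$ for upwinding, while the central scheme fails (55a) for every positive time step.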

While this simple Euler explicit time integration scheme 
may seem too crude for applications requiring true high- 
order space-time accuracy, special attention and analysis 
is given to this fully discrete form because it serves as a 
fundamental building block for an important class of high- 
order accurate Runge-Kutta time integration techniques 
discussed in Section 4.1 that, by construction, inherit TVD 
(and later maximum principle) properties of the fully 
discrete scheme (54). 


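Theorem 20 (Generalized explicit TVD criteria) The
fully discrete explicit 1-D scheme


\[
u_j^{n+1} = u_j^n + \Delta t \sum_{l=-k}^{k-1}
  C^{(l)}_{j+1/2}(u_h^n)\, \Delta_{j+l+1/2} u_h^n,
\quad j = 1, \ldots, N \tag{56}
\]

with integer stencil width parameter $k > 0$ is total variation
nonincreasing if for each $j$


\[ C^{(k-1)}_{j+1/2} \ge 0 \tag{57a} \]
\[ C^{(-k)}_{j+1/2} \le 0 \tag{57b} \]
\[ C^{(l-1)}_{j+1/2} - C^{(l)}_{j-1/2} \ge 0, \quad
   -k+1 \le l \le k-1, \; l \ne 0 \tag{57c} \]
\[ 1 - \Delta t \left( C^{(0)}_{j-1/2} - C^{(-1)}_{j+1/2} \right) \ge 0 \tag{57d} \]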


The extension to implicit methods follows immediately
upon rewriting the implicit scheme in terms of the solution
spatial increments $\Delta_{j+l+1/2} u_h$ and imposing sufficient
algebraic conditions such that the implicit matrix acting on
spatial increments has a nonnegative inverse.


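Theorem 21 (Generalized implicit TVD criteria) The
fully discrete implicit 1-D scheme


\[
u_j^{n+1} - \Delta t \sum_{l=-k}^{k-1}
  C^{(l)}_{j+1/2}(u_h^{n+1})\, \Delta_{j+l+1/2} u_h^{n+1} = u_j^n,
\quad j = 1, \ldots, N \tag{58}
\]

with integer stencil width parameter $k > 0$ is total variation
nonincreasing if for each $j$

\[ C^{(k-1)}_{j+1/2} \ge 0 \tag{59a} \]
\[ C^{(-k)}_{j+1/2} \le 0 \tag{59b} \]
\[ C^{(l-1)}_{j+1/2} - C^{(l)}_{j-1/2} \ge 0, \quad
   -k+1 \le l \le k-1, \; l \ne 0 \tag{59c} \]
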
Theorems 20 and 21 provide sufficient conditions for nonincreasing
total variation of explicit (56) or implicit (58) numerical schemes
written in incremental form. These incremental forms do not imply
discrete conservation unless additional constraints are imposed on
the discretizations. A sufficient condition for discrete
conservation of the discretizations (56) and (58) is that these
discretizations can be written in a finite volume flux balance form


\[
g_{j+1/2} - g_{j-1/2} = \sum_{l=-k}^{k-1}
  C^{(l)}_{j+1/2}(u_h)\, \Delta_{j+l+1/2} u_h
\]


where $g_{j\pm1/2}$ are the usual numerical flux functions.
Section 3.1.2 provides an example of how the discrete TVD
conditions and discrete conservation can be simultaneously
achieved. A more comprehensive overview of finite volume
numerical methods based on TVD constructions can be found in
the books by Godlewski and Raviart (1991) and LeVeque (2002).


3.1.2 MUSCL schemes 


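A general family of TVD discretizations with 5-point stencil
is the monotone upstream-centered scheme for conservation
laws (MUSCL) discretization of van Leer (1979) and
van Leer (1985). MUSCL schemes utilize a $\kappa$-parameter
family of interpolation formulas with limiter function
$\Psi(R): \mathbb{R} \to \mathbb{R}$


\[
\begin{aligned}
u^-_{j+1/2} &= u_j + \frac{1+\kappa}{4}\, \Psi(R_j)\, \Delta_{j-1/2} u_h
  + \frac{1-\kappa}{4}\, \Psi\!\left(\frac{1}{R_j}\right) \Delta_{j+1/2} u_h \\
u^+_{j-1/2} &= u_j - \frac{1+\kappa}{4}\, \Psi\!\left(\frac{1}{R_j}\right) \Delta_{j+1/2} u_h
  - \frac{1-\kappa}{4}\, \Psi(R_j)\, \Delta_{j-1/2} u_h
\end{aligned} \tag{60}
\]


where $R_j$ is a ratio of successive solution increments


\[
R_j \equiv \frac{\Delta_{j+1/2} u_h}{\Delta_{j-1/2} u_h} \tag{61}
\]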


The technique of incorporating limiter functions to obtain
nonoscillatory resolution of discontinuities and steep gradients
dates back to Boris and Book (1973). For convenience,
the interpolation formulas (60) have been written for a uniformly
spaced mesh, although the extension to irregular
mesh spacing is straightforward. The unlimited form of
this interpolation is obtained by setting $\Psi(R) = 1$. In this
unlimited case, the truncation error for the conservation law
divergence in (1a) is given by


\[
\text{Truncation Error} = -\frac{\kappa - 1/3}{4}\, (\Delta x)^2\,
  \frac{\partial^3}{\partial x^3} f(u)
\]

This equation reveals that for $\kappa = 1/3$, the 1-D MUSCL
formula yields an overall spatial discretization with $O(\Delta x^3)$
truncation error. Using the MUSCL interpolation formulas
given in (60), sufficient conditions for the discrete TVD
property are easily obtained.


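Theorem 22 (MUSCL TVD scheme) The fully discrete
1-D scheme


\[
u_j^{n+1} = u_j^n - \frac{\Delta t}{\Delta x_j}
  \left( g_{j+1/2} - g_{j-1/2} \right), \quad j = 1, \ldots, N
\]

with monotone Lipschitz continuous numerical flux function


\[
g_{j+1/2} = g(u^-_{j+1/2}, u^+_{j+1/2})
\]


utilizing the $\kappa$-parameter family of MUSCL interpolation
formulas (60) and (61) is total variation nonincreasing if
there exists a $\Psi(R)$ such that $\forall R \in \mathbb{R}$


\[
0 \le \Psi(R) \le \frac{3-\kappa}{1-\kappa} - (1+\alpha)\, \frac{1+\kappa}{1-\kappa} \tag{62a}
\]

\[
0 \le \frac{\Psi(R)}{R} \le 2 + \alpha \tag{62b}
\]


with $\alpha \in [-2, 2(1-\kappa)/(1+\kappa)]$ under the time step
restriction


\[
1 - \frac{\Delta t}{\Delta x_j}\, \frac{2 - (2+\alpha)\kappa}{1-\kappa}\,
  \left| \frac{\partial g}{\partial u} \right|^{\max}_j \ge 0
\]

where $|\partial g/\partial u|^{\max}_j$ bounds from above the
nonnegative quantity $\partial g/\partial u^- - \partial g/\partial u^+$
over the interface states entering the fluxes of cell $T_j$.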
For accuracy considerations away from extrema, it is
desirable that the unlimited form of the discretization
is obtained. Consequently, the constraint $\Psi(1) = 1$ is
also imposed upon the limiter function. This constraint
together with the algebraic conditions (62a, b) are readily
achieved using the well-known MinMod limiter, $\Psi^{MM}$,
with compression parameter $\beta$ determined from the TVD
analysis


\[
\Psi^{MM}(R) = \max(0, \min(R, \beta)), \quad
\beta \in \left[ 1, \frac{3-\kappa}{1-\kappa} \right]
\]


Table 1 summarizes the MUSCL scheme and maximum
compression parameter for a number of familiar discretizations.


Table 1. Members of the MUSCL TVD family of schemes.


$\kappa$   Unlimited scheme       $\beta_{\max}$   Truncation error
1/3        Third-order            4                0
-1         Fully upwind           2                $\frac{1}{3} (\Delta x)^2 \frac{\partial^3}{\partial x^3} f(u)$
0          Fromm’s                3                $\frac{1}{12} (\Delta x)^2 \frac{\partial^3}{\partial x^3} f(u)$
1/2        Low truncation error   5                $-\frac{1}{24} (\Delta x)^2 \frac{\partial^3}{\partial x^3} f(u)$


Another limiter due to van Leer that meets the technical
conditions of Theorem 22 and also satisfies $\Psi(1) = 1$
is given by


\[
\Psi^{VL}(R) = \frac{R + |R|}{1 + |R|}
\]

This limiter exhibits differentiability away from R = 0, 
which improves the iterative convergence to steady state 
for many algorithms. Numerous other limiter functions are 
considered and analyzed in Sweby (1984). 
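
To make the role of the limiter concrete, the sketch below implements the MinMod and van Leer limiters and one explicit MUSCL step for linear advection with unit positive speed on a periodic uniform mesh, using an upwind flux $g(u^-, u^+) = u^-$. It assumes the interpolation formula in the form $u^-_{j+1/2} = u_j + \frac{1+\kappa}{4}\Psi(R_j)\Delta_{j-1/2}u_h + \frac{1-\kappa}{4}\Psi(1/R_j)\Delta_{j+1/2}u_h$ with $R_j = \Delta_{j+1/2}u_h/\Delta_{j-1/2}u_h$. The helper `safe_ratio`, the test data, and the CFL number are illustrative assumptions.

```python
import numpy as np

def minmod_limiter(R, beta=1.0):
    """MinMod limiter max(0, min(R, beta)) with compression parameter beta."""
    return np.maximum(0.0, np.minimum(R, beta))

def van_leer_limiter(R):
    """van Leer limiter (R + |R|) / (1 + |R|)."""
    return (R + np.abs(R)) / (1.0 + np.abs(R))

def safe_ratio(num, den):
    """num / den where den != 0 and 0 elsewhere; the limited increments
    vanish in those cells for the two limiters used here."""
    out = np.zeros_like(num, dtype=float)
    np.divide(num, den, out=out, where=(den != 0.0))
    return out

def muscl_left_states(u, kappa, limiter):
    """Extrapolated states u^-_{j+1/2} on a periodic uniform mesh."""
    dm = u - np.roll(u, 1)        # Delta_{j-1/2} u
    dp = np.roll(u, -1) - u       # Delta_{j+1/2} u
    R = safe_ratio(dp, dm)
    R_inv = safe_ratio(dm, dp)
    return (u + 0.25 * (1.0 + kappa) * limiter(R) * dm
              + 0.25 * (1.0 - kappa) * limiter(R_inv) * dp)

def muscl_upwind_step(u, cfl, kappa=1.0 / 3.0, limiter=minmod_limiter):
    """One forward Euler step of u_t + u_x = 0 with upwind flux g = u^-."""
    u_left = muscl_left_states(u, kappa, limiter)
    return u - cfl * (u_left - np.roll(u_left, 1))

def total_variation(u):
    """Periodic discrete total variation."""
    return float(np.sum(np.abs(np.roll(u, -1) - u)))
```

For this advection problem both limiters keep the coefficient multiplying $\Delta_{j-1/2}u_h$ in the update between 0 and 1 when the CFL number is at most 0.4, so the total variation of a square pulse should not increase from step to step.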

Unfortunately, TVD schemes locally degenerate to piece- 
wise constant approximations at smooth extrema, which 
locally degrades the accuracy. This is an unavoidable con- 
sequence of the strict TVD condition. 


Theorem 23 (TVD critical point accuracy,
Osher, 1984) The TVD discretizations (54), (56) and (58)
all reduce to at most first-order accuracy at nonsonic critical
points, that is, points $u^*$ at which $f'(u^*) \ne 0$ and $u^*_x = 0$.


3.1.3 ENO/WENO schemes 


To circumvent the degradation in accuracy of TVD schemes 
at critical points, weaker constraints on the solution total 
variation were devised. To this end, Harten proposed the 
following abstract framework for generalized Godunov 
schemes in operator composition form (see Harten et al., 
1986, 1987; Harten, 1989) 


\[
u_h^{n+1} = A \cdot E(\tau) \cdot R_p^0(\cdot\,; u_h^n) \tag{63}
\]

In this equation, $u_h^n \in V_h^0$ denotes the global space of
piecewise constant cell averages as defined in (20), $R_p^0(x)$
is a reconstruction operator, which produces a cell-wise
discontinuous $p$th order polynomial approximation from the
given solution cell averages, $E(\tau)$ is the evolution operator
for the PDE (including boundary conditions), and $A$ is the
cell averaging operator. Since $A$ is a nonnegative operator
and $E(\tau)$ represents exact evolution in the small, the
control of solution oscillations and Gibbs-like phenomena is
linked directly to oscillation properties of the reconstruction
operator, $R_p^0(x)$. One has formally in one space dimension


\[
TV(u_h^{n+1}) = TV(A \cdot E(\tau) \cdot R_p^0(\cdot\,; u_h^n))
  \le TV(R_p^0(x; u_h^n))
\]


so that the total variation depends entirely upon properties
of the reconstruction operator $R_p^0(x; u_h^n)$. The requirements
of high-order accuracy for smooth solutions and discrete
conservation give rise to the following additional design
criteria for the reconstruction operator


• $R_p^0(x; u_h) = u(x) + e(x)\, \Delta x^{p+1} + O(\Delta x^{p+2})$
  whenever $u$ is smooth (64a)

• $A|_{T_j} R_p^0(x; u_h) = u_h|_{T_j} = u_j$, $j = 1, \ldots, N$
  to ensure discrete conservation (64b)

• $TV(R_p^0(x; u_h^n)) \le TV(u_h^n) + O(\Delta x^{p+1})$
  an essentially nonoscillatory reconstruction. (64c)


Note that $e(x)$ may not be Lipschitz continuous at certain
points so that the cumulative error in the scheme is $O(\Delta x^p)$
in a maximum norm but remains $O(\Delta x^{p+1})$ in an $L_1$-norm.
To achieve the requirements of (64a-64c), Harten
and coworkers considered breaking the task into two parts


• Polynomial reconstruction from a given stencil of cell
  averages

• Construction of a “smoothest” polynomial approximation
  by a solution adaptive stencil selection algorithm.

In the next section, a commonly used reconstruction technique
from cell averages is considered. This is then followed by a
description of the solution adaptive stencil algorithm proposed
by Harten et al. (1986).


3.1.4 Reconstruction via primitive function 

Given cell averages $u_j$ of a piecewise smooth function
$u(x)$, one can inexpensively evaluate pointwise values of
the primitive function $U(x)$


\[
U(x) = \int_{x_{j_0-1/2}}^{x} u(\xi)\, d\xi
\]


by exploiting the relationship


\[
\sum_{i=j_0}^{j} \Delta x_i\, u_i = U(x_{j+1/2})
\]

Let $H_p(x; u)$ denote a $p$th order piecewise polynomial
interpolant of a function $u$. Since


\[
u(x) = \frac{d}{dx} U(x)
\]


an interpolant of the primitive function given pointwise
samples $U(x_{j+1/2})$ yields a reconstruction operator


\[
R_p^0(x; u_h) = \frac{d}{dx} H_{p+1}(x; U)
\]

As a polynomial approximation problem, whenever $U(x)$
is smooth one obtains

\[
\frac{d^k}{dx^k} H_p(x; U) = \frac{d^k}{dx^k} U(x) + O(\Delta x^{p+1-k}),
\quad 0 \le k \le p
\]

and consequently

\[
\frac{d^l}{dx^l} R_p^0(x; u_h) = \frac{d^l}{dx^l} u(x) + O(\Delta x^{p+1-l})
\]

By virtue of the use of the primitive function $U(x)$, it
follows that

\[
A|_{T_j} R_p^0(x; u_h) = u_j
\]

and from the polynomial interpolation problem for smooth
data

\[
R_p^0(x; u_h) = u(x) + O(\Delta x^{p+1})
\]


as desired.
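
The primitive function construction can be sketched in a few lines. The code below assumes a fixed (non-adaptive) stencil of $p + 2$ interfaces supplied by the caller, builds the primitive function at the interfaces by a cumulative sum, interpolates it with a polynomial of degree $p + 1$, and differentiates the interpolant. The function names, the use of `numpy.polyfit` for the interpolation, and the sine test data are illustrative assumptions.

```python
import numpy as np

def primitive_at_interfaces(cell_avgs, x_edges):
    """U at every interface, with U = 0 at the left-most interface:
    U(x_{j+1/2}) = sum over i <= j of dx_i * u_i."""
    dx = np.diff(x_edges)
    return np.concatenate(([0.0], np.cumsum(np.asarray(cell_avgs, dtype=float) * dx)))

def reconstruct_in_cell(cell_avgs, x_edges, j, left, p):
    """Degree-p reconstruction valid in cell j (0-based), obtained by
    differentiating the degree-(p+1) interpolant of U through the p+2
    interfaces left, ..., left+p+1, which must include both edges of cell j."""
    idx = np.arange(left, left + p + 2)
    if idx[0] < 0 or idx[-1] >= len(x_edges) or not (idx[0] <= j and j + 1 <= idx[-1]):
        raise ValueError("stencil must lie in the mesh and contain both edges of cell j")
    U = primitive_at_interfaces(cell_avgs, x_edges)
    x_mid = 0.5 * (x_edges[j] + x_edges[j + 1])   # shift improves conditioning
    H = np.polyfit(x_edges[idx] - x_mid, U[idx], p + 1)
    R = np.poly1d(np.polyder(H))
    return lambda x: R(np.asarray(x, dtype=float) - x_mid)

# Illustrative data: exact cell averages of sin(2 pi x) on 20 uniform cells.
x_edges = np.linspace(0.0, 1.0, 21)
cell_avgs = (np.cos(2 * np.pi * x_edges[:-1]) - np.cos(2 * np.pi * x_edges[1:])) \
            / (2 * np.pi * np.diff(x_edges))
R5 = reconstruct_in_cell(cell_avgs, x_edges, j=5, left=4, p=2)

# Cell average of the reconstruction via 3-point Gauss-Legendre quadrature.
nodes, weights = np.polynomial.legendre.leggauss(3)
a, b = x_edges[5], x_edges[6]
recon_avg = 0.5 * np.sum(weights * R5(0.5 * (a + b) + 0.5 * (b - a) * nodes))
```

Because the interpolant matches the primitive function at both edges of the cell, the cell average of its derivative reproduces the given cell average, which is the conservation property (64b).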


3.1.5 ENO reconstruction 


The reconstruction technique outlined in Section 3.1.4 does
not satisfy the oscillation requirement given in (64c). This
motivated Harten and coworkers to consider a new algorithm
for essentially nonoscillatory (ENO) piecewise polynomial
interpolation. When combined with the reconstruction
technique of Section 3.1.4, the resulting reconstruction
then satisfies (64a-c). Specifically, a new interpolant
$H_p(x; u)$ is constructed so that when applied to piecewise
smooth data $v(x)$ it gives high-order accuracy


\[
\frac{d^k}{dx^k} H_p(x; v) = \frac{d^k}{dx^k} v(x) + O(\Delta x^{p+1-k}),
\quad 0 \le k \le p
\]


but avoids having Gibbs oscillations at discontinuities in
the sense


\[
TV(H_p(x; v)) \le TV(v) + O(\Delta x^{p+1})
\]


The strategy pursued by Harten and coworkers was to
construct such an ENO polynomial $H_p(x; w)$ using the
following steps. Define

\[
H_p^{ENO}(x; w) = P^{ENO}_{p,j+1/2}(x; w) \quad \text{for } x_j \le x \le x_{j+1},
\quad j = 1, \ldots, N
\]


where $P^{ENO}_{p,j+1/2}$ is the $p$th degree polynomial which
interpolates $w(x)$ at the $p + 1$ successive points $\{x_i\}$,
$i_p(j) \le i \le i_p(j) + p$, that include $x_j$ and $x_{j+1}$, that is,


\[
P^{ENO}_{p,j+1/2}(x_i; w) = w(x_i), \quad i_p(j) \le i \le i_p(j) + p,
\quad 1 - p \le i_p(j) - j \le 0 \tag{65}
\]


Equation (65) describes $p$ possible polynomials depending
on the choice of $i_p(j)$ for an interval $(x_j, x_{j+1})$. The
ENO strategy selects the value $i_p(j)$ for each interval that
produces the ‘smoothest’ polynomial interpolant for a given
input data. More precisely, information about smoothness
of $w(x)$ is extracted from a table of divided differences of
$w(x)$ defined recursively for $i = 1, \ldots, N$ by


\[
\begin{aligned}
w[x_i] &= w(x_i) \\
w[x_i, x_{i+1}] &= \frac{w[x_{i+1}] - w[x_i]}{x_{i+1} - x_i} \\
w[x_i, \ldots, x_{i+k}] &= \frac{w[x_{i+1}, \ldots, x_{i+k}] - w[x_i, \ldots, x_{i+k-1}]}{x_{i+k} - x_i}
\end{aligned}
\]


The stencil producing the smoothest interpolant is then
chosen hierarchically by setting


\[
i_1(j) = j
\]

and for $1 \le k \le p - 1$


\[
i_{k+1}(j) =
\begin{cases}
i_k(j) - 1 & \text{if } |w[x_{i_k(j)-1}, \ldots, x_{i_k(j)+k}]| < |w[x_{i_k(j)}, \ldots, x_{i_k(j)+k+1}]| \\
i_k(j) & \text{otherwise}
\end{cases} \tag{66}
\]

Harten et al. (1986) demonstrate the following properties
of this ENO interpolation strategy


• The accuracy condition
  $P^{ENO}_{p,j+1/2}(x) = w(x) + O(\Delta x^{p+1})$, $x \in (x_j, x_{j+1})$


• $P^{ENO}_{p,j+1/2}(x)$ is monotone in any cell interval containing
  a discontinuity.


• There exists a function $z(x)$ nearby $P^{ENO}_{p,j+1/2}(x)$ in the
  interval $(x_j, x_{j+1})$ in the sense

  $z(x) = P^{ENO}_{p,j+1/2}(x) + O(\Delta x^{p+1})$, $x \in (x_j, x_{j+1})$

  that is total variation bounded, that is, the nearby
  function $z(x)$ satisfies

  $TV(z) \le TV(w)$
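
The hierarchical stencil selection can be sketched as follows. The code uses 0-based indices, builds the divided difference table up to order $p$, and at each level extends the current stencil to the left only when the left candidate has the smaller divided difference in magnitude. The handling of mesh boundaries (forcing the only available direction) and the piecewise linear test data with a single jump are illustrative assumptions that the text does not specify.

```python
import numpy as np

def divided_differences(x, w, max_order):
    """table[k][i] = w[x_i, ..., x_{i+k}] for k = 0, ..., max_order."""
    x = np.asarray(x, dtype=float)
    table = [np.asarray(w, dtype=float)]
    for k in range(1, max_order + 1):
        prev = table[-1]
        table.append((prev[1:] - prev[:-1]) / (x[k:] - x[:-k]))
    return table

def eno_stencil_start(table, j, p):
    """Left-most index of the (p+1)-point ENO stencil for the interval
    (x_j, x_{j+1}), starting from the two-point stencil {x_j, x_{j+1}}."""
    n = len(table[0])
    i = j
    for k in range(1, p):
        can_left = i - 1 >= 0
        can_right = i + k + 1 <= n - 1
        if can_left and can_right:
            # Compare the two candidate divided differences of order k+1.
            if abs(table[k + 1][i - 1]) < abs(table[k + 1][i]):
                i -= 1
        elif can_left:
            i -= 1            # boundary assumption: only a left extension fits
        elif not can_right:
            raise ValueError("not enough points for a stencil of this size")
    return i

# Piecewise linear data with a jump between x_4 and x_5 (illustrative).
x = np.arange(10.0)
w = np.where(x < 4.5, x, x + 10.0)
table = divided_differences(x, w, max_order=3)
start_left_of_jump = eno_stencil_start(table, j=3, p=3)
start_right_of_jump = eno_stencil_start(table, j=5, p=3)
```

For the interval just left of the jump the selected stencil moves away from the discontinuity to the left, and for the interval just right of the jump it extends only to the right, so neither stencil straddles the jump.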


3.1.6 WENO reconstruction 


The solution adaptive nature of the ENO stencil selection
algorithm (66) yields nondifferentiable fluxes that impede
convergence to steady state. In addition, the stencil selection
algorithm chooses only one of $p$ possible stencils and other
slightly less smooth stencils may give similar accuracy.
When $w(x)$ is smooth, using a linear combination of
all $p$ stencils with optimized weights yields a more
accurate $O(\Delta x^{2p-1})$ interpolant. More specifically, let
$P^{(k)}_{p,j+1/2}$ denote the unique polynomial interpolating $p + 1$
points with stencil $\{x_{j+1-p+k}, \ldots, x_{j+1+k}\}$, then


\[
P_{p,j+1/2}(x_{j+1/2}) = \sum_{k=0}^{p-1} \omega_k\, P^{(k)}_{p,j+1/2}(x_{j+1/2})
  + O(\Delta x^{2p-1})
\]


For example, optimized weights for $p = 1, 2, 3$ yielding
$O(\Delta x^{2p-1})$ accuracy are readily computed


\[
\begin{aligned}
p = 1: \quad & \omega_0 = 1 \\
p = 2: \quad & \omega_0 = \tfrac{2}{3}, \; \omega_1 = \tfrac{1}{3} \\
p = 3: \quad & \omega_0 = \tfrac{3}{10}, \; \omega_1 = \tfrac{3}{5}, \; \omega_2 = \tfrac{1}{10}
\end{aligned}
\]


In the weighted essentially nonoscillatory (WENO) schemes
of Jiang and Shu (1996) and Shu (1999), approximate
weights, $\tilde{\omega}_k$, are devised such that for smooth solutions


\[
\tilde{\omega}_k = \omega_k + O(\Delta x^{p-1})
\]

so that the $O(\Delta x^{2p-1})$ accuracy is still retained using these
approximations


\[
P_{p,j+1/2}(x_{j+1/2}) = \sum_{k=0}^{p-1} \tilde{\omega}_k\, P^{(k)}_{p,j+1/2}(x_{j+1/2})
  + O(\Delta x^{2p-1})
\]

The approximate weights are constructed using the ad hoc
formulas


\[
\alpha_k = \frac{\omega_k}{(\epsilon + \beta_k)^2}, \quad
\tilde{\omega}_k = \frac{\alpha_k}{\sum_{m=0}^{p-1} \alpha_m}
\]


where $\epsilon$ is an approximation to the square root of the
machine precision and $\beta_k$ is a smoothness indicator


\[
\beta_k = \sum_{l=1}^{p-1} \int_{x_{j-1/2}}^{x_{j+1/2}} \Delta x^{2l-1}
  \left( \frac{d^l P^{(k)}_{p,j+1/2}(x)}{dx^l} \right)^2 dx
\]


For a sequence of smooth solutions with decreasing smoothness
indicator $\beta_k$, these formulas approach the optimized
weights, $\tilde{\omega}_k \to \omega_k$. These formulas also yield vanishing
weights $\tilde{\omega}_k \to 0$ for stencils with large values of the
smoothness indicator such as those encountered at discontinuities.
In this way, the WENO construction retains
some of the attributes of the original ENO formulation but
with increased accuracy in smooth solution regions and
improved differentiability often yielding superior robustness
for steady state calculations.
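
The behavior of the approximate weights is easy to observe numerically. The sketch below normalizes $\omega_k / (\epsilon + \beta_k)^2$ for given smoothness indicators. The optimized weights $(3/10, 3/5, 1/10)$ and the smoothness indicator values are illustrative inputs, and the function name is hypothetical.

```python
import numpy as np

def weno_weights(optimal_weights, smoothness, eps=None):
    """Approximate WENO weights: normalize omega_k / (eps + beta_k)**2."""
    omega = np.asarray(optimal_weights, dtype=float)
    beta = np.asarray(smoothness, dtype=float)
    if eps is None:
        eps = np.sqrt(np.finfo(float).eps)   # about 1.5e-8 in double precision
    alpha = omega / (eps + beta) ** 2
    return alpha / alpha.sum()

optimal = np.array([0.3, 0.6, 0.1])          # illustrative p = 3 weights

# Equal smoothness indicators recover the optimized weights.
smooth_case = weno_weights(optimal, [1e-8, 1e-8, 1e-8])

# A stencil crossing a discontinuity (large beta) receives a vanishing weight.
shock_case = weno_weights(optimal, [1e-8, 1e-8, 1.0])
```

When all smoothness indicators coincide, the common denominator cancels and the optimized weights are returned; a single large indicator drives the corresponding weight toward zero while the remaining weights are renormalized.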


3.2 Higher-order accurate FV schemes in 
multidimensions 


Although the one-dimensional TVD operators may be 
readily applied in multidimensions on a dimension-by- 
dimension basis, a result of Goodman and LeVeque (1985) 
shows that TVD schemes in two or more space dimensions 
are only first-order accurate. 


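Theorem 24 (Accuracy of TVD schemes in multidimensions)
Any two-dimensional finite volume scheme of the form


\[
u_{i,j}^{n+1} = u_{i,j}^n
  - \frac{\Delta t}{|T_{i,j}|} \left( g^n_{i+1/2,j} - g^n_{i-1/2,j} \right)
  - \frac{\Delta t}{|T_{i,j}|} \left( h^n_{i,j+1/2} - h^n_{i,j-1/2} \right),
\quad 1 \le i \le M, \; 1 \le j \le N
\]


with Lipschitz continuous numerical fluxes for integers
$p, q, r, s$


\[
\begin{aligned}
g_{i+1/2,j} &= g(u_{i-p,j-q}, \ldots, u_{i+r,j+s}) \\
h_{i,j+1/2} &= h(u_{i-p,j-q}, \ldots, u_{i+r,j+s})
\end{aligned}
\]


that is total variation nonincreasing in the sense


\[
TV(u_h^{n+1}) \le TV(u_h^n)
\]

where


\[
TV(u_h) \equiv \sum_{i,j} \left[ \Delta y_{i+1/2,j}\, |u_{i+1,j} - u_{i,j}|
  + \Delta x_{i,j+1/2}\, |u_{i,j+1} - u_{i,j}| \right]
\]

is at most first-order accurate.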


Motivated by the negative results of Goodman and LeV- 
eque, weaker conditions yielding solution monotonicity 
preservation have been developed from discrete maximum 
principle analysis. These alternative constructions have the 
positive attribute that they extend to unstructured meshes 
as well. 


3.2.1 Positive coefficient schemes on structured 
meshes 


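Theorem 5 considers schemes of the form


\[
u_j^{n+1} = u_j^n + \frac{\Delta t}{|T_j|} \sum_{\forall e_{jk} \in \partial T_j}
  C_{jk}(u_h^n)\, (u_k^n - u_j^n), \quad \forall T_j \in \mathcal{T}
\]


and provides a local space-time discrete maximum principle

\[
\min_{\forall e_{jk} \in \partial T_j} (u_k^n, u_j^n) \le u_j^{n+1}
  \le \max_{\forall e_{jk} \in \partial T_j} (u_k^n, u_j^n)
\]

$\forall T_j \in \mathcal{T}$ under a CFL-like condition on the time step
parameter if all coefficients $C_{jk}$ are nonnegative. Schemes
of this type are often called positive coefficient schemes or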
more simply positive schemes. To circumvent the negative 
result of Theorem 24, Spekreijse (1987) developed a family 
of high-order accurate positive coefficient schemes on two- 
dimensional structured M x N meshes. For purposes of 
positivity analysis, these schemes are written in incremental 
form on a M x N logically rectangular 2-D mesh 


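\[
\begin{aligned}
u_{i,j}^{n+1} = u_{i,j}^n + \Delta t \big(
  & A^n_{i+1/2,j} (u^n_{i+1,j} - u^n_{i,j})
  + B^n_{i,j+1/2} (u^n_{i,j+1} - u^n_{i,j}) \\
  & + C^n_{i-1/2,j} (u^n_{i-1,j} - u^n_{i,j})
  + D^n_{i,j-1/2} (u^n_{i,j-1} - u^n_{i,j}) \big),
\quad 1 \le i \le M, \; 1 \le j \le N
\end{aligned} \tag{67}
\]

where the coefficients are nonlinear functions of the solution


\[
\begin{aligned}
A^n_{i+1/2,j} &= A(\ldots, u^n_{i-1,j}, u^n_{i,j}, u^n_{i+1,j}, \ldots) \\
B^n_{i,j+1/2} &= B(\ldots, u^n_{i,j-1}, u^n_{i,j}, u^n_{i,j+1}, \ldots) \\
C^n_{i-1/2,j} &= C(\ldots, u^n_{i-1,j}, u^n_{i,j}, u^n_{i+1,j}, \ldots) \\
D^n_{i,j-1/2} &= D(\ldots, u^n_{i,j-1}, u^n_{i,j}, u^n_{i,j+1}, \ldots)
\end{aligned}
\]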

Once written in incremental form, the following theorem 
follows from standard positive coefficient maximum prin- 
ciple analysis. 


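Theorem 25 (Positive coefficient schemes in multidimensions)
The discretization (67) is a positive coefficient
scheme if for each $1 \le i \le M$, $1 \le j \le N$ and time slab
increment $[t^n, t^{n+1}]$


\[
A^n_{i+1/2,j} \ge 0, \quad B^n_{i,j+1/2} \ge 0, \quad
C^n_{i-1/2,j} \ge 0, \quad D^n_{i,j-1/2} \ge 0 \tag{68}
\]


and

\[
1 - \Delta t \left( A^n_{i+1/2,j} + B^n_{i,j+1/2} + C^n_{i-1/2,j} + D^n_{i,j-1/2} \right) \ge 0 \tag{69}
\]


with discrete space-time maximum principle


\[
\min(u^n_{i,j}, u^n_{i-1,j}, u^n_{i+1,j}, u^n_{i,j-1}, u^n_{i,j+1})
  \le u^{n+1}_{i,j} \le
\max(u^n_{i,j}, u^n_{i-1,j}, u^n_{i+1,j}, u^n_{i,j-1}, u^n_{i,j+1})
\]

and discrete maximum principle at steady state


\[
\min(u^*_{i-1,j}, u^*_{i+1,j}, u^*_{i,j-1}, u^*_{i,j+1})
  \le u^*_{i,j} \le
\max(u^*_{i-1,j}, u^*_{i+1,j}, u^*_{i,j-1}, u^*_{i,j+1})
\]

where $u^*$ denotes the numerical steady state.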
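
A minimal sketch of a positive coefficient scheme is first-order upwinding for $u_t + a u_x + b u_y = 0$ with $a, b > 0$ on a periodic uniform mesh, for which the west and south coefficients are $a/\Delta x$ and $b/\Delta y$ and the east and north coefficients vanish. The function names, the periodic boundary treatment, the random initial data, and the mesh parameters below are illustrative assumptions.

```python
import numpy as np

def positive_scheme_step(u, A, B, C, D, dt):
    """Incremental update with coefficients A (east), B (north),
    C (west), D (south) on a periodic mesh; u is indexed u[i, j]."""
    east, west = np.roll(u, -1, axis=0), np.roll(u, 1, axis=0)
    north, south = np.roll(u, -1, axis=1), np.roll(u, 1, axis=1)
    return u + dt * (A * (east - u) + B * (north - u)
                     + C * (west - u) + D * (south - u))

def is_positive_scheme(A, B, C, D, dt):
    """Nonnegative coefficients and 1 - dt * (A + B + C + D) >= 0."""
    coeffs = [np.asarray(c, dtype=float) for c in (A, B, C, D)]
    nonnegative = all(np.all(c >= 0.0) for c in coeffs)
    return bool(nonnegative and np.all(1.0 - dt * sum(coeffs) >= 0.0))

def local_bounds(u):
    """Min and max over each cell and its four edge neighbors."""
    stack = np.stack([u, np.roll(u, -1, 0), np.roll(u, 1, 0),
                      np.roll(u, -1, 1), np.roll(u, 1, 1)])
    return stack.min(axis=0), stack.max(axis=0)

# Upwind advection with a = 1, b = 0.5 on a mesh with dx = dy = 0.1.
a, b, dx, dy = 1.0, 0.5, 0.1, 0.1
A, B, C, D = 0.0, 0.0, a / dx, b / dy
dt = 0.9 / (C + D)

rng = np.random.default_rng(0)
u_old = rng.random((16, 12))
u_new = positive_scheme_step(u_old, A, B, C, D, dt)
lower, upper = local_bounds(u_old)
```

With these coefficients each updated value is a convex combination of the cell value and its west and south neighbors, so it stays between the local minimum and maximum at the previous time level.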


Using a procedure similar to that used in the development of 
MUSCL TVD schemes in 1-D, Spekreijse (1987) developed 
a family of monotonicity preserving MUSCL approxima- 
tions in multidimensions from the positivity conditions of 
Theorem 25. 


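Theorem 26 (MUSCL positive coefficient scheme)
Assume a fully discrete 2-D finite volume scheme


\[
u_{i,j}^{n+1} = u_{i,j}^n
  - \frac{\Delta t}{|T_{i,j}|} \left( g^n_{i+1/2,j} - g^n_{i-1/2,j} \right)
  - \frac{\Delta t}{|T_{i,j}|} \left( h^n_{i,j+1/2} - h^n_{i,j-1/2} \right),
\quad 1 \le i \le M, \; 1 \le j \le N
\]


utilizing monotone Lipschitz continuous numerical flux functions


\[
\begin{aligned}
g_{i+1/2,j} &= g(u^-_{i+1/2,j}, u^+_{i+1/2,j}) \\
h_{i,j+1/2} &= h(u^-_{i,j+1/2}, u^+_{i,j+1/2})
\end{aligned}
\]


and MUSCL extrapolation formulas


\[
\begin{aligned}
u^-_{i+1/2,j} &= u_{i,j} + \tfrac{1}{2}\, \Psi(R_{i,j})\, (u_{i,j} - u_{i-1,j}) \\
u^+_{i-1/2,j} &= u_{i,j} - \tfrac{1}{2}\, \Psi\!\left(\frac{1}{R_{i,j}}\right) (u_{i+1,j} - u_{i,j}) \\
u^-_{i,j+1/2} &= u_{i,j} + \tfrac{1}{2}\, \Psi(S_{i,j})\, (u_{i,j} - u_{i,j-1}) \\
u^+_{i,j-1/2} &= u_{i,j} - \tfrac{1}{2}\, \Psi\!\left(\frac{1}{S_{i,j}}\right) (u_{i,j+1} - u_{i,j})
\end{aligned}
\]

with


\[
R_{i,j} \equiv \frac{u_{i+1,j} - u_{i,j}}{u_{i,j} - u_{i-1,j}}, \quad
S_{i,j} \equiv \frac{u_{i,j+1} - u_{i,j}}{u_{i,j} - u_{i,j-1}}
\]


This scheme satisfies the local maximum principle properties
of Theorem 25 and is second-order accurate if the
limiter $\Psi = \Psi(R)$ has the properties that there exist
constants $\beta \in (0, \infty)$, $\alpha \in [-2, 0]$ such that
$\forall R \in \mathbb{R}$


\[
\alpha \le \Psi(R) \le \beta, \quad -\beta \le \frac{\Psi(R)}{R} \le 2 + \alpha \tag{70}
\]


with the constraint $\Psi(1) = 1$ and the smoothness condition
$\Psi(R) \in C^2$ near $R = 1$ together with a time step restriction
for stability


\[
1 - (1 + \beta)\, \frac{\Delta t}{|T_{i,j}|}
  \left( \left| \frac{\partial g}{\partial u} \right|^{n,\max}_{i,j}
       + \left| \frac{\partial h}{\partial u} \right|^{n,\max}_{i,j} \right) \ge 0
\]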
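where $|\partial g/\partial u|^{n,\max}_{i,j}$ and
$|\partial h/\partial u|^{n,\max}_{i,j}$ bound from above, over the
extrapolated interface states of cell $T_{i,j}$ at time level $n$,
the nonnegative quantities
$\partial g/\partial u^- - \partial g/\partial u^+$ and
$\partial h/\partial u^- - \partial h/\partial u^+$ respectively.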


Many limiter functions satisfy the technical conditions (70)
of Theorem 26. Some examples include


• the van Leer limiter


\[
\Psi^{VL}(R) = \frac{R + |R|}{1 + |R|}
\]

• the van Albada limiter


\[
\Psi^{VA}(R) = \frac{R + R^2}{1 + R^2}
\]


In addition, Koren (1988) has constructed the limiter


\[
\Psi^{K}(R) = \frac{R + 2R^2}{2 - R + 2R^2}
\]


which also satisfies the technical conditions (70) and
corresponds for smooth solutions in 1-D to the most
accurate $\kappa = 1/3$ MUSCL scheme of van Leer.


3.2.2 FV schemes on unstructured meshes utilizing 
linear reconstruction 


Higher-order finite volume extensions of Godunov dis- 
cretization to unstructured meshes are of the general form 


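\[
\frac{du_j}{dt} + \frac{1}{|T_j|} \sum_{\forall e_{jk} \in \partial T_j}
  g_{jk}(u^-_{jk}, u^+_{jk}) = 0, \quad \forall T_j \in \mathcal{T} \tag{71}
\]


with the numerical flux $g_{jk}(u, v)$ given by the quadrature
rule


\[
g_{jk}(u^-_{jk}, u^+_{jk}) \equiv \sum_{q=1}^{Q} \omega_q\,
  g(\nu_{jk}(x_q); u^-_{jk}(x_q), u^+_{jk}(x_q)) \tag{72}
\]


where $\omega_q \in \mathbb{R}$ and $x_q \in e_{jk}$ represent quadrature
weights and locations, $q = 1, \ldots, Q$. Given the global space of
piecewise constant cell averages, $u_h \in V_h^0$, the extrapolated
states $u^-_{jk}(x)$ and $u^+_{jk}(x)$ are evaluated using a $p$th order
polynomial reconstruction operator, $R_p^0: V_h^0 \to V_h^p$


\[
\begin{aligned}
u^-_{jk}(x) &\equiv \lim_{\epsilon \downarrow 0} R_p^0(x - \epsilon\, \nu_{jk}(x); u_h) \\
u^+_{jk}(x) &\equiv \lim_{\epsilon \downarrow 0} R_p^0(x + \epsilon\, \nu_{jk}(x); u_h)
\end{aligned}
\]


for any $x \in e_{jk}$. In addition, it is assumed that the
reconstruction satisfies the property
$\frac{1}{|T_j|} \int_{T_j} R_p^0(x; u_h)\, dx = u_j$
stated previously in (64b). In the general finite volume
formulation, the control volume shapes need not be convex;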
see, for example, Figure 4. Even so, the solution accuracy 
and maximum stable time step for explicit schemes may 
depend strongly on the shape of individual control volumes. 
In the special case of linear reconstruction, $R_1^0(x; u_h)$, the
impact of control volume shape on stability of the scheme
can be quantified more precisely. Specifically, the maximum
principle analysis presented later for the scheme (71)
reveals an explicit dependence on the geometrical shape
parameter


\[
\Gamma^{geom} = \sup_{0 \le \theta \le 2\pi} \alpha^{-1}(\theta) \tag{73}
\]


where $0 < \alpha(\theta) < 1$ represents the smallest fractional
perpendicular distance from the gravity center to one of two
minimally separated parallel hyperplanes with orientation $\theta$
and hyperplane location such that all quadrature points in
the control volume lie between or on the hyperplanes as
shown in Figure 5. Table 2 lists $\Gamma^{geom}$ values for various
control volume shapes in $\mathbb{R}^1$, $\mathbb{R}^2$, $\mathbb{R}^3$, and $\mathbb{R}^d$. As might
be expected, those geometries that have exact quadrature
point symmetry with respect to the control volume gravity
center have geometric shape parameters $\Gamma^{geom}$ equal to 2
regardless of the number of space dimensions involved.
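
The shape parameter can be estimated numerically in two space dimensions by sweeping the hyperplane orientation. The sketch below is one reading of definition (73): for each direction it takes the signed distances of the quadrature points from the gravity center, places the two bounding lines at the extreme distances, and forms the smaller distance as a fraction of the separation. The function name, the sampling of $\theta$ at 720 equally spaced angles, and the two test shapes are illustrative assumptions.

```python
import numpy as np

def shape_parameter(quad_points, gravity_center, n_theta=720):
    """Estimate Gamma^geom = sup over theta of 1 / alpha(theta) in 2-D."""
    pts = np.asarray(quad_points, dtype=float) - np.asarray(gravity_center, dtype=float)
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    normals = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    proj = normals @ pts.T                 # signed distances, one row per angle
    d_plus = proj.max(axis=1)              # distance to one bounding line
    d_minus = -proj.min(axis=1)            # distance to the opposite line
    alpha = np.minimum(d_plus, d_minus) / (d_plus + d_minus)
    return float(np.max(1.0 / alpha))

# Triangle (0,0), (1,0), (0,1) with midpoint quadrature on each edge.
tri_midpoints = [(0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
gamma_triangle = shape_parameter(tri_midpoints, (1.0 / 3.0, 1.0 / 3.0))

# Unit square with midpoint quadrature on each edge.
sq_midpoints = [(0.5, 0.0), (1.0, 0.5), (0.5, 1.0), (0.0, 0.5)]
gamma_square = shape_parameter(sq_midpoints, (0.5, 0.5))
```

Under these assumptions the estimate is close to 3 for the triangle and 2 for the square, in agreement with the triangle and parallelogram entries of Table 2.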


Figure 4. Polygonal control volume cell $T_j$ and perimeter
quadrature points (solid circles).


Figure 5. Minimally separated hyperplanes $h_L(\theta)$ and $h_U(\theta)$
and the fractional distance ratio $\alpha(\theta)$ for use in the
calculation of $\Gamma^{geom}$. The figure legend marks the gravity
center and the quadrature points.


Table 2. Reconstruction geometry factors for various control 
volume shapes utilizing midpoint quadrature rule. 


Control volume shape Space dimension $\Gamma^{geom}$
Segment 1 2 
Triangle 2 3 
Parallelogram 2 2 

-1 
Regular n-gon 2 n si | z ] 
Tetrahedron 3 4 
Parallelepiped 3 2
Simplex d d+1 
Hyper-parallelepiped d 2 
Polytope d Equation (73) 


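Lemma 2 (Finite volume interval bounds on unstructured
meshes, $R_1^0(x; u_h)$) The fully discrete finite volume
scheme

\[
u_j^{n+1} = u_j^n - \frac{\Delta t}{|T_j|} \sum_{\forall e_{jk} \in \partial T_j}
  g_{jk}(u_{jk}^{-,n}, u_{jk}^{+,n}), \quad \forall T_j \in \mathcal{T} \tag{74}
\]


with monotone Lipschitz continuous numerical flux function,
nonnegative quadrature weights, and linear reconstructions

\[
\begin{aligned}
u^-_{jk}(x) &\equiv \lim_{\epsilon \downarrow 0} R_1^0(x - \epsilon\, \nu_{jk}(x); u_h),
  \quad x \in e_{jk}, \; u_h \in V_h^0 \\
u^+_{jk}(x) &\equiv \lim_{\epsilon \downarrow 0} R_1^0(x + \epsilon\, \nu_{jk}(x); u_h),
  \quad x \in e_{jk}, \; u_h \in V_h^0
\end{aligned}
\]


with extremal trace values at control volume quadrature
points


\[
U_j^{\min} \equiv \min_{\substack{\forall e_{jk} \in \partial T_j \\ 1 \le q \le Q}} u^{\pm}_{jk}(x_q),
\quad
U_j^{\max} \equiv \max_{\substack{\forall e_{jk} \in \partial T_j \\ 1 \le q \le Q}} u^{\pm}_{jk}(x_q),
\quad x_q \in e_{jk}
\]


exhibits the local interpolated interval bound


\[
\sigma_j U_j^{\min,n} + (1 - \sigma_j)\, u_j^n \le u_j^{n+1}
  \le (1 - \sigma_j)\, u_j^n + \sigma_j U_j^{\max,n} \tag{75}
\]

with the time step proportional interpolation parameter $\sigma_j$
defined by


\[
\sigma_j \equiv \frac{\Delta t}{|T_j|}\, \Gamma^{geom}
  \sum_{\forall e_{jk} \in \partial T_j} \sum_{1 \le q \le Q}
  \sup_{\tilde{u}, \tilde{\tilde{u}} \in [U_j^{\min,n}, U_j^{\max,n}]}
  -\frac{\partial g_{jk}}{\partial u^+}(\nu_{jk}(x_q); \tilde{u}, \tilde{\tilde{u}}) \tag{76}
\]

that depends on the shape parameter $\Gamma^{geom}$ defined in (73).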


Given the two-sided bound of Lemma 2, a discrete maximum
principle is obtained under a CFL-like time step
restriction if the limits $U_j^{\max}$ and $U_j^{\min}$ can be bounded from
above and below respectively by the neighboring cell averages.
This idea is given more precisely in the following
theorem.


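Theorem 27 (Finite volume maximum principle on
unstructured meshes, $R_1^0$) Let $u_j^{\min}$ and $u_j^{\max}$ denote the
minimum and maximum value of solution cell averages for
a given cell $T_j$ and corresponding adjacent cell neighbors,
that is,


\[
u_j^{\min} \equiv \min_{\forall e_{jk} \in \partial T_j} (u_j, u_k)
\quad \text{and} \quad
u_j^{\max} \equiv \max_{\forall e_{jk} \in \partial T_j} (u_j, u_k) \tag{77}
\]


The fully discrete finite volume scheme


\[
u_j^{n+1} = u_j^n - \frac{\Delta t}{|T_j|} \sum_{\forall e_{jk} \in \partial T_j}
  g_{jk}(u_{jk}^{-,n}, u_{jk}^{+,n}), \quad \forall T_j \in \mathcal{T} \tag{78}
\]


with monotone Lipschitz continuous numerical flux function,
nonnegative quadrature weights, and linear reconstructions


\[
\begin{aligned}
u^-_{jk}(x) &\equiv \lim_{\epsilon \downarrow 0} R_1^0(x - \epsilon\, \nu_{jk}(x); u_h),
  \quad x \in e_{jk}, \; u_h \in V_h^0 \\
u^+_{jk}(x) &\equiv \lim_{\epsilon \downarrow 0} R_1^0(x + \epsilon\, \nu_{jk}(x); u_h),
  \quad x \in e_{jk}, \; u_h \in V_h^0
\end{aligned} \tag{79}
\]

exhibits the local space-time maximum principle for each
$T_j \in \mathcal{T}$


\[
\min_{\forall e_{jk} \in \partial T_j} (u_j^n, u_k^n) \le u_j^{n+1}
  \le \max_{\forall e_{jk} \in \partial T_j} (u_j^n, u_k^n)
\]

as well as the local spatial maximum principle at steady
state ($u^{n+1} = u^n = u^*$)


\[
\min_{\forall e_{jk} \in \partial T_j} u_k^* \le u_j^*
  \le \max_{\forall e_{jk} \in \partial T_j} u_k^*
\]


if the linear reconstruction satisfies $\forall e_{jk} \in \partial T_j$ and
$x_q \in e_{jk}$, $q = 1, \ldots, Q$


\[
\max(u_j^{\min,n}, u_k^{\min,n}) \le u_{jk}^{-,n}(x_q)
  \le \min(u_j^{\max,n}, u_k^{\max,n}) \tag{80}
\]


under the time step restriction


\[
1 - \frac{\Delta t}{|T_j|}\, \Gamma^{geom}
  \sum_{\forall e_{jk} \in \partial T_j} \sum_{1 \le q \le Q}
  \sup_{\tilde{u}, \tilde{\tilde{u}} \in [u_j^{\min,n}, u_j^{\max,n}]}
  -\frac{\partial g_{jk}}{\partial u^+}(\nu_{jk}(x_q); \tilde{u}, \tilde{\tilde{u}}) \ge 0
\]


with $\Gamma^{geom}$ defined in (73).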


Note that a variant of this theorem also holds if the definitions
of $u_j^{\max}$ and $u_j^{\min}$ are expanded to include more control
volume neighbors. Two alternative definitions frequently
used when the control volume shape is a simplex are given by


\[
u_j^{\min} \equiv \min_{\substack{T_k \in \mathcal{T} \\ T_j \cap T_k \ne \emptyset}} u_k
\quad \text{and} \quad
u_j^{\max} \equiv \max_{\substack{T_k \in \mathcal{T} \\ T_j \cap T_k \ne \emptyset}} u_k \tag{81}
\]


These expanded definitions include adjacent cells whose
intersection with $T_j$ in $\mathbb{R}^d$ need only be a set of measure
zero or greater.


Slope limiters for linear reconstruction. 

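Given a linear reconstruction $R_1^0(x; u_h)$ that does not
necessarily satisfy the requirements of Theorem 27, it is
straightforward to modify the reconstruction so that the
new modified reconstruction does satisfy the requirements
of Theorem 27. For each control volume $T_j \in \mathcal{T}$, a modified
reconstruction operator $\tilde{R}_1^0(x; u_h)$ of the form


\[
\tilde{R}_1^0(x; u_h)|_{T_j} = u_j + \alpha_{T_j} \left( R_1^0(x; u_h)|_{T_j} - u_j \right)
\]


is assumed for $\alpha_{T_j} \in [0, 1]$. By construction, this modified
reconstruction correctly reproduces the control volume cell
average for all values of $\alpha_{T_j}$, that is,


\[
\frac{1}{|T_j|} \int_{T_j} \tilde{R}_1^0(x; u_h)\, dx = u_j \tag{82}
\]


The most restrictive value of $\alpha_{T_j}$ for each control volume $T_j$
is then computed on the basis of the Theorem 27 constraint
(80), that is,


\[
\alpha_{T_j} = \min_{\substack{\forall e_{jk} \in \partial T_j \\ 1 \le q \le Q}}
\begin{cases}
\dfrac{\min(u_j^{\max}, u_k^{\max}) - u_j}{R_1^0(x_q; u_h)|_{T_j} - u_j}
  & \text{if } R_1^0(x_q; u_h)|_{T_j} > \min(u_j^{\max}, u_k^{\max}) \\[2ex]
\dfrac{\max(u_j^{\min}, u_k^{\min}) - u_j}{R_1^0(x_q; u_h)|_{T_j} - u_j}
  & \text{if } R_1^0(x_q; u_h)|_{T_j} < \max(u_j^{\min}, u_k^{\min}) \\[2ex]
1 & \text{otherwise}
\end{cases} \tag{83}
\]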


where $u^{\max}$ and $u^{\min}$ are defined in (77). When the resulting
modified reconstruction operator is used in the extrapolation
formulas (79), the discrete maximum principle of Theorem 27
is attained under a CFL-like time step restriction.
By utilizing the inequalities


\[
\max(u_j, u_k) \le \min(u_j^{\max}, u_k^{\max})
\]

and


\[
\min(u_j, u_k) \ge \max(u_j^{\min}, u_k^{\min})
\]

it is straightforward to construct a simpler but more
restrictive limiter function


\[
\alpha^{LM}_{T_j} = \min_{\substack{\forall e_{jk} \in \partial T_j \\ 1 \le q \le Q}}
\begin{cases}
\dfrac{\max(u_j, u_k) - u_j}{R_1^0(x_q; u_h)|_{T_j} - u_j}
  & \text{if } R_1^0(x_q; u_h)|_{T_j} > \max(u_j, u_k) \\[2ex]
\dfrac{\min(u_j, u_k) - u_j}{R_1^0(x_q; u_h)|_{T_j} - u_j}
  & \text{if } R_1^0(x_q; u_h)|_{T_j} < \min(u_j, u_k) \\[2ex]
1 & \text{otherwise}
\end{cases} \tag{84}
\]

that yields modified reconstructions satisfying the technical 
conditions of Theorem 27. This simplified limiter (84) 
introduces additional slope reduction when compared to 
(83). This can be detrimental to the overall accuracy of the 
discretization. The limiter strategy (84) and other variants 
for simplicial control volumes are discussed further in Liu 
(1993), Wierse (1994), and Batten, Lambert and Causon 
(1996). 

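In Barth and Jespersen (1989), a variant of (83) was
proposed

$$\alpha_{T_j}^{BJ} = \min_{\substack{\forall e_{jk} \in \partial T_j \\ 1 \le q \le Q}}
\begin{cases}
\dfrac{u_j^{\max} - u_j}{R_1^0(x_q; u_h)|_{T_j} - u_j} & \text{if } R_1^0(x_q; u_h)|_{T_j} > u_j^{\max} \\[2ex]
\dfrac{u_j^{\min} - u_j}{R_1^0(x_q; u_h)|_{T_j} - u_j} & \text{if } R_1^0(x_q; u_h)|_{T_j} < u_j^{\min} \\[2ex]
1 & \text{otherwise}
\end{cases} \qquad (85)$$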
Although this limiter function does not produce modified 
reconstructions satisfying the requirements of Theorem 27, 
using Lemma 2, it can be shown that the Barth and Jes- 
persen limiter yields finite volume schemes (74) possessing 
a global extremum diminishing property, that is, that the 
solution maximum is nonincreasing and the solution mini- 
mum is nondecreasing between successive time levels. This 
limiter function produces the least amount of slope reduc- 
tion when compared to the limiter functions (83) and (84). 
Note that in practical implementation, all three limiters (83), 
(84), and (85) require some modification to prevent near 
zero division for nearly constant solution data. 
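The limiting step and the safeguard against near zero division can be sketched as follows. This is a minimal Python illustration in the spirit of the Barth and Jespersen limiter (85), not the authors' implementation: the function names, the argument layout, and the tolerance `eps` are assumptions made for this example, and $u_j^{\min}$, $u_j^{\max}$ are taken here as the extrema over the cell and its adjacent neighbors, which is assumed to match (77).

```python
from typing import Sequence


def barth_jespersen_alpha(u_j: float,
                          u_neighbors: Sequence[float],
                          recon_values: Sequence[float],
                          eps: float = 1e-12) -> float:
    """Limiter coefficient alpha in [0, 1] for one control volume.

    u_j          : cell average in the control volume
    u_neighbors  : cell averages of the adjacent control volumes
    recon_values : unlimited reconstruction evaluated at the quadrature
                   points on the control volume boundary
    eps          : hypothetical tolerance guarding the division
    """
    u_max = max([u_j, *u_neighbors])
    u_min = min([u_j, *u_neighbors])
    alpha = 1.0
    for r in recon_values:
        delta = r - u_j
        if abs(delta) <= eps:
            # Nearly constant data: no limiting needed, skip the division.
            continue
        if r > u_max:
            alpha = min(alpha, (u_max - u_j) / delta)
        elif r < u_min:
            alpha = min(alpha, (u_min - u_j) / delta)
    return alpha


def limited_value(u_j: float, alpha: float, r: float) -> float:
    """Modified reconstruction value u_j + alpha * (r - u_j)."""
    return u_j + alpha * (r - u_j)
```

With `u_j = 1.0`, neighbors `[0.0, 2.0]` and unlimited values `[2.5, 0.5]`, only the first value overshoots, and the limited value at that point is pulled back to the local maximum.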


Figure 6. Triangle control volume $\triangle_{123}$ (shaded) with three
adjacent cell neighbors.


3.2.3 Linear reconstruction operators on simplicial 
control volumes 


Linear reconstruction operators on simplicial control vol- 
umes that satisfy the cell averaging requirement (64b) often 
exploit the fact that the cell average is also a pointwise 
value of any valid linear reconstruction evaluated at the 
gravity center of a simplex. This reduces the reconstruc- 
tion problem to that of gradient estimation given pointwise 
samples at the gravity centers. In this case, it is convenient 
to express the reconstruction in the form 


$$R_1^0(x; u_h)|_{T_j} = u_j + (\nabla u_h)_{T_j} \cdot (x - x_j) \qquad (86)$$


where $x_j$ denotes the gravity center for the simplex $T_j$
and $(\nabla u_h)_{T_j}$ is the gradient to be determined. Figure 6
depicts a 2-D simplex $\triangle_{123}$ and three adjacent neighboring
simplices. Also shown are the corresponding four pointwise
solution values {A, B, C, O} located at gravity centers
of each simplex. By selecting any three of the four
pointwise solution values, a set of four possible gradients
is uniquely determined, that is, $\{\nabla(ABC), \nabla(ABO),
\nabla(BCO), \nabla(CAO)\}$. Using the example of Figure 6,
a number of slope limited reconstruction techniques are
possible for use in the finite volume scheme (78) that meet
the technical conditions of Theorem 27.


1. Choose $(\nabla u_h)_{T_{123}} = \nabla(ABC)$ and limit the resulting
reconstruction using (83) or (84). This technique is
pursued in Barth and Jespersen (1989) but using the
limiter (85) instead.

2. Limit the reconstructions corresponding to gradients
$\nabla(ABC)$, $\nabla(ABO)$, $\nabla(BCO)$, and $\nabla(CAO)$ using
(83) or (84) and choose the limited reconstruction
with largest gradient magnitude. This technique is a
generalization of that described in Batten, Lambert and
Causon (1996) wherein limiter (84) is used.

3. Choose the unlimited reconstruction among $\nabla(ABC)$,
$\nabla(ABO)$, $\nabla(BCO)$, and $\nabla(CAO)$ with largest gradient
magnitude that satisfies the maximum principle
reconstruction bound inequality (80). If all reconstructions
fail the bound inequality, the reconstruction gradient
is set equal to zero; see Liu (1993).
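The four candidate gradients of the Figure 6 example can be computed with a few lines of code. The following Python sketch assumes 2-D gravity center coordinates and pointwise values stored in dictionaries keyed by the labels A, B, C, O; the function names and this data layout are illustrative assumptions, not part of the original presentation.

```python
def gradient_from_three(p_a, u_a, p_b, u_b, p_c, u_c):
    """Gradient of the unique linear function through three 2-D samples."""
    ax, ay = p_b[0] - p_a[0], p_b[1] - p_a[1]
    bx, by = p_c[0] - p_a[0], p_c[1] - p_a[1]
    det = ax * by - ay * bx
    if det == 0.0:
        raise ValueError("sample points are collinear")
    du1, du2 = u_b - u_a, u_c - u_a
    # Cramer's rule for [ax ay; bx by] (gx, gy)^T = (du1, du2)^T
    gx = (du1 * by - du2 * ay) / det
    gy = (ax * du2 - bx * du1) / det
    return gx, gy


def candidate_gradients(points, values):
    """Gradients for the triples ABC, ABO, BCO and CAO of Figure 6."""
    grads = {}
    for name in ("ABC", "ABO", "BCO", "CAO"):
        a, b, c = name
        grads[name] = gradient_from_three(points[a], values[a],
                                          points[b], values[b],
                                          points[c], values[c])
    return grads
```

For data sampled from a linear function all four candidates coincide; for general data they differ, and the three strategies listed above select or limit among them.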


3.2.4 Linear reconstruction operators on general 
control volumes shapes 


In the case of linear reconstruction on general volume 
shapes, significant simplification is possible when compared 
with the general p-exact reconstruction formulation given 
in Section 3.2.5. It is again convenient to express the 
reconstruction in the form 


$$R_1^0(x; u_h)|_{T_j} = u_j + (\nabla u_h)_{T_j} \cdot (x - x_j) \qquad (87)$$


where $x_j$ denotes the gravity center for the control volume
$T_j$ and $(\nabla u_h)_{T_j}$ is the gradient to be determined. Two common
techniques for simplified linear reconstruction include
a simplified least squares technique and a Green-Gauss
integration technique.


Figure 7. Triangulation of gravity center locations showing a
typical control volume $T_0$ associated with the triangulation vertex
$v_0$ with cyclically indexed graph neighbors $T_k$, $k = 1, \ldots, N_0$.


Simplified least squares linear reconstruction. 

As was exploited in the linear reconstruction techniques 
for simplicial control volumes, linear reconstructions satis- 
fying (64b) on general control volume shapes are greatly 
simplified by exploiting the fact that the cell average value 
is also a pointwise value of all valid linear reconstructions 
evaluated at the gravity center of a general control volume 
shape. This again reduces the linear reconstruction problem 
to that of gradient estimation given pointwise values. In the 
simplified least squares reconstruction technique, a triangu- 
lation (2-D) or tetrahedralization (3-D) of gravity centers 
is first constructed as shown in Figure 7. Referring to this 
figure, for each edge of the simplex mesh incident to the
vertex $v_0$, an edge projected gradient constraint equation is
constructed subject to a prespecified nonzero scaling $w_k$

$$w_k (\nabla u_h)_{T_0} \cdot (x_k - x_0) = w_k (u_k - u_0)$$

The number of edges incident to a simplex mesh vertex
in $\mathbb{R}^d$ is greater than or equal to $d$, thereby producing
the following generally nonsquare matrix of constraint
equations

$$\begin{bmatrix} w_1 \Delta x_1 & w_1 \Delta y_1 \\ \vdots & \vdots \\ w_{N_0} \Delta x_{N_0} & w_{N_0} \Delta y_{N_0} \end{bmatrix} (\nabla u_h)_{T_0} = \begin{pmatrix} w_1 (u_1 - u_0) \\ \vdots \\ w_{N_0} (u_{N_0} - u_0) \end{pmatrix}$$

or in abstract form

$$[L_1 \; L_2]\, \nabla u = f$$

This abstract form can be symbolically solved in a least
squares sense using an orthogonalization technique yielding
the closed form solution

$$\nabla u = \frac{1}{l_{11} l_{22} - l_{12}^2} \begin{pmatrix} l_{22} (L_1 \cdot f) - l_{12} (L_2 \cdot f) \\ l_{11} (L_2 \cdot f) - l_{12} (L_1 \cdot f) \end{pmatrix} \qquad (88)$$

with $l_{ij} = L_i \cdot L_j$. The form of this solution in terms
of scalar dot products over incident edges suggests that
the least squares linear reconstruction can be efficiently
computed via an edge data structure without the need for
storing a nonsquare matrix.
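The edge-based evaluation can be sketched as follows for the 2-D case. The Python function below accumulates the dot products $l_{11}$, $l_{12}$, $l_{22}$, $L_1 \cdot f$ and $L_2 \cdot f$ in a single loop over incident edges and then applies the closed form solution of the 2 x 2 normal equations; the function name, the neighbor data layout and the optional `weights` argument are assumptions made for illustration.

```python
def least_squares_gradient(x0, u0, neighbors, weights=None):
    """Least squares gradient at a vertex from edge-connected neighbors.

    x0        : (x, y) location of the central gravity center
    u0        : value at x0
    neighbors : sequence of ((x, y), u) pairs for the incident edges
    weights   : optional nonzero edge scalings w_k (default: all 1)
    """
    l11 = l12 = l22 = 0.0   # dot products L_i . L_j
    b1 = b2 = 0.0           # dot products L_1 . f and L_2 . f
    for idx, ((xk, yk), uk) in enumerate(neighbors):
        w = 1.0 if weights is None else weights[idx]
        dx = w * (xk - x0[0])
        dy = w * (yk - x0[1])
        du = w * (uk - u0)
        l11 += dx * dx
        l12 += dx * dy
        l22 += dy * dy
        b1 += dx * du
        b2 += dy * du
    det = l11 * l22 - l12 * l12
    if det == 0.0:
        raise ValueError("incident edges do not span the plane")
    return ((l22 * b1 - l12 * b2) / det,
            (l11 * b2 - l12 * b1) / det)
```

Only five running sums are stored per vertex, so the nonsquare constraint matrix never needs to be formed. For data sampled from a linear function the result is exact for any nonzero choice of weights.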


Green—Gauss linear reconstruction. 

This reconstruction technique specific to simplicial meshes
assumes nodal solution values at vertices of the mesh, which
uniquely describe a $C^0$ linear interpolant, $u_h$. Gradients are
then computed from the mean value approximation

$$|\Omega_0|\, (\nabla u_h)_{T_0} \approx \int_{\Omega_0} \nabla u_h \, dx = \int_{\partial \Omega_0} u_h \, d\nu \qquad (89)$$

For linear interpolants, the right-hand side term can be written
in the following equivalent form using the configuration
depicted in Figure 8

$$\int_{\Omega_0} \nabla u_h \, dx = \sum_{k=1}^{N_0} \frac{3}{2} (u_0 + u_k)\, \nu_{0k}$$

where $\nu_{0k}$ represents any path integrated normal connecting
pairwise adjacent simplex gravity centers, that is,

$$\nu_{0k} = \int_{x_{k-1/2}}^{x_{k+1/2}} d\nu \qquad (90)$$


Figure 8. Median dual control volume $T_0$ demarcated by median
segments of triangles incident to the vertex $v_0$ with cyclically
indexed adjacent vertices $v_k$, $k = 1, \ldots, N_0$.

A particularly convenient path is one that traces out portions
of median segments as shown in Figure 8. These segments
demarcate the so-called median dual control volume. By
construction, the median dual volume $|T_0|$ is equal to
$|\Omega_0|/(d + 1)$ in $\mathbb{R}^d$. Consequently, a linear reconstruction
operator on nonoverlapping median dual control volumes
is given by

$$|T_0|\, (\nabla u_h)_{T_0} = \sum_{k=1}^{N_0} \frac{1}{2} (u_0 + u_k)\, \nu_{0k} \qquad (91)$$
k=l 


The gradient calculation is exact whenever the numerical 
solution varies linearly over the support of the reconstruc- 
tion. Since mesh vertices are not located at the gravity 
centers of median dual control volumes, the cell averag- 
ing property (64b) and the bounds of Theorem 27 are only 
approximately satisfied using the Green—Gauss technique. 
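A small 2-D sketch of the Green-Gauss gradient on a median dual control volume is given below. It assumes an interior vertex whose adjacent vertices are listed in counterclockwise order, and it uses the fact that the path integrated normal between the gravity centers of two neighboring triangles depends only on the path endpoints, whose difference is $(x_{k+1} - x_{k-1})/3$. The function name and data layout are illustrative assumptions.

```python
def green_gauss_gradient(x0, u0, ring_points, ring_values):
    """Green-Gauss gradient on the median dual volume of vertex x0 (2-D).

    ring_points : adjacent vertices (x, y), ordered counterclockwise
    ring_values : nodal solution values at those vertices
    """
    n = len(ring_points)
    patch_area = 0.0        # area of all triangles incident to x0
    gx = gy = 0.0
    for k in range(n):
        xk, yk = ring_points[k]
        xp, yp = ring_points[(k + 1) % n]   # next neighbor
        xm, ym = ring_points[k - 1]         # previous neighbor
        # area of triangle (x0, x_k, x_{k+1})
        patch_area += 0.5 * ((xk - x0[0]) * (yp - x0[1])
                             - (yk - x0[1]) * (xp - x0[0]))
        # integrated outward normal between adjacent gravity centers:
        # rotate the endpoint difference (x_{k+1} - x_{k-1}) / 3 clockwise
        nx = (yp - ym) / 3.0
        ny = -(xp - xm) / 3.0
        avg = 0.5 * (u0 + ring_values[k])
        gx += avg * nx
        gy += avg * ny
    dual_area = patch_area / 3.0            # |T_0| = |Omega_0| / (d + 1), d = 2
    return gx / dual_area, gy / dual_area
```

For nodal values sampled from a linear function the returned gradient reproduces the exact gradient, in line with the exactness statement above.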

A number of slope limited linear reconstruction strategies
for general control volume shapes are possible for use in
the finite volume scheme (78) that satisfy the technical
conditions of Theorem 27. Using the example depicted in
Figure 7, let $\nabla_{k+1/2} u_h$ denote the unique linear gradient
calculated from the cell average set $\{u_0, u_k, u_{k+1}\}$. Three
slope limiting strategies that are direct counterparts of the
simplex control volume case are


1. Compute $(\nabla u_h)_{T_0}$ using the least squares linear
reconstruction or any other valid linear reconstruction
technique and limit the resulting reconstruction using (83)
or (84).

2. Limit the reconstructions corresponding to the gradients
$\nabla_{k+1/2} u_h$, $k = 1, \ldots, N_0$ and $(\nabla u_h)_{T_0}$ using (83)
or (84) and choose the limited reconstruction with
largest gradient magnitude.

3. Choose the unlimited reconstruction from $\nabla_{k+1/2} u_h$,
$k = 1, \ldots, N_0$ and $(\nabla u_h)_{T_0}$ with largest gradient
magnitude that satisfies the maximum principle reconstruction
bound inequality (80). If all reconstructions fail
the bound inequality, the reconstruction gradient is set
equal to zero.


3.2.5 General p-exact reconstruction operators on 
unstructured meshes 


Abstractly, the reconstruction operator serves as a finite-
dimensional pseudo inverse of the cell averaging operator
$A$ whose jth component $A_j$ computes the cell average of
the solution in $T_j$

$$A_j u = \frac{1}{|T_j|} \int_{T_j} u \, dx$$

The development of a general polynomial reconstruction
operator, $R_p^0$, that reconstructs p-degree polynomials from
cell averages on unstructured meshes follows from the
application of a small number of simple properties.


1. (Conservation of the mean) Given solution cell averages
$u_h$, the reconstruction $R_p^0 u_h$ is required to have
the correct cell average, that is,

$$\text{if } v = R_p^0 u_h \text{ then } u_h = A v$$

More concisely,

$$A R_p^0 = I$$

so that $R_p^0$ is a right inverse of the averaging operator $A$.

2. (p-exactness) A reconstruction operator $R_p^0$ is p-exact
if $R_p^0 A$ reconstructs polynomials of degree p or less
exactly. Denoting by $\mathcal{P}_p$ the space of all polynomials
of degree p,

$$\text{if } u \in \mathcal{P}_p \text{ and } v = A u \text{ then } R_p^0 v = u$$

This can be written succinctly as

$$R_p^0 A|_{\mathcal{P}_p} = I$$

so that $R_p^0$ is a left inverse of the averaging operator
$A$ restricted to the space of polynomials of degree at
most p.

3. (Compact support) The reconstruction in a control
volume $T_j$ should only depend on cell averages in a
relatively small neighborhood surrounding $T_j$. Recall
that a polynomial of degree p in $\mathbb{R}^d$ contains $\binom{p+d}{d}$
degrees of freedom. The support set for $T_j$ is required
to contain at least this number of neighbors. As the
support set becomes even larger for fixed p, not only
does the computational cost increase, but eventually,
the accuracy decreases as less valid data from further
away is brought into the calculation.
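The minimum support size implied by the compact support property is easy to tabulate. The short Python sketch below evaluates the binomial coefficient $\binom{p+d}{d}$; the function name is an assumption made for this example.

```python
from math import comb


def num_polynomial_dofs(p: int, d: int) -> int:
    """Number of coefficients of a degree-p polynomial in d space dimensions."""
    return comb(p + d, d)
```

For instance, a linear reconstruction in 2-D has 3 degrees of freedom, a quadratic one has 6, and a cubic reconstruction in 3-D already has 20, which indicates how quickly the required support set grows with p.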


Practical implementations of polynomial reconstruction 
operators fall into two classes: 


• Fixed support stencil reconstructions. These methods
choose a fixed support set as a preprocessing step.
Various limiting strategies are then employed to obtain
nonoscillatory approximation; see, for example, Barth
and Frederickson (1990) and Delanaye (1996) for
further details.

• Adaptive support stencil reconstructions. These ENO-like
methods dynamically choose reconstruction stencils
based on solution smoothness criteria; see, for
example, Harten and Chakravarthy (1991), Vankeirsbilck
(1993), Abgrall (1994), Sonar (1997), and Sonar
(1998) for further details.


3.2.6 Positive coefficient schemes on unstructured 
meshes 


Several related positive coefficient schemes have been 
proposed on multidimensional simplicial meshes based 
on one-dimensional interpolation. The simplest example 
is the upwind triangle scheme as introduced by Billey 
et al. (1987), Desideri and Dervieux (1988) and Rostand 
and Stoufflet (1988) with later improved variants given 
by Jameson (1993) and Cournéde, Debiez and Dervieux 
(1998). These schemes are not Godunov methods in the 
sense that a single multidimensional gradient is not obtained 
in each control volume. The basis for these methods origi- 
nates from the gradient estimation formula (91) generalized 
to the calculation of flux divergence on a median dual 
tessellation. In deriving this flux divergence formula, the 
assumption has been made that flux components vary lin- 
early within a simplex yielding the discretization formula 


$$\int_{T_j} \operatorname{div}(f)\, dx = \int_{\partial T_j} f \cdot d\nu = \sum_{\forall e_{jk} \in \partial T_j} \frac{1}{2} (f(u_j) + f(u_k)) \cdot \nu_{jk}$$

where $\nu_{jk}$ is computed from a median dual tessellation
using (90). This discretization is the unstructured mesh
counterpart of central differencing on a structured mesh.
Schemes using this discretization of flux divergence lack
sufficient stability properties for computing solutions of
general nonlinear conservation laws. This lack of stability
can be overcome by adding suitable diffusion terms. One of
the simplest modifications is motivated by upwind domain
of dependence arguments yielding the numerical flux

$$g_{jk}(u_j, u_k) = \frac{1}{2} (f(u_j) + f(u_k)) \cdot \nu_{jk} - \frac{1}{2} |a_{jk}|\, \Delta_{jk} u \qquad (92)$$

with $a_{jk}$ a mean value (a.k.a. Murman-Cole) linearization
satisfying

$$\nu_{jk} \cdot \Delta_{jk} f = a_{jk}\, \Delta_{jk} u$$

Away from sonic points where $f'(u^*) = 0$ for $u^* \in
[u_j, u_{j+1}]$, this numerical flux is formally an E-flux
satisfying (28). With suitable modifications of $a_{jk}$ near sonic
points, it is then possible to produce a modified numerical
flux that is an E-flux for all data; see Osher (1984).
Theorems 5, 6, and 7 show that schemes such as (22) using
E-fluxes exhibit local discrete maximum principles and $L_\infty$
stability.
Unfortunately, schemes based on (92) are too dissipative
for most practical calculations. The main idea in the 
upwind triangle scheme is to add antidiffusion terms to the 
numerical flux function (92) such that the sum total of added 
diffusion and antidiffusion terms in the numerical flux 
function vanish entirely whenever the numerical solution 
varies linearly over the support of the flux function. In all 
remaining situations, the precise amount of antidiffusion is 
determined from maximum principle analysis. 


Theorem 28 (Maximum principles for the upwind triangle
scheme) Let $\mathcal{T}$ denote the median dual tessellation of
an underlying simplicial mesh. Also, let $u_j$ denote the nodal
solution value at a simplex vertex in one-to-one dual
correspondence with the control volume $T_j \in \mathcal{T}$ such that a $C^0$
linear solution interpolant is uniquely specified on the
simplicial mesh. Let $g_{jk}(u_{j'}, u_j, u_k, u_{k'})$ denote the numerical
flux function with limiter function $\Psi(\cdot): \mathbb{R} \to \mathbb{R}$

$$g_{jk}(u_{j'}, u_j, u_k, u_{k'}) = \frac{1}{2} (f(u_j) + f(u_k)) \cdot \nu_{jk} + \cdots$$

utilizing the mean value speed $a_{jk}$ satisfying

$$\nu_{jk} \cdot \Delta_{jk} f = a_{jk}\, \Delta_{jk} u$$

and variable spacing parameter $h_{jk} = |\Delta_{jk} x|$. The fully
discrete finite volume scheme

$$u_j^{n+1} = u_j^n - \frac{\Delta t}{|T_j|} \sum_{\forall e_{jk} \in \partial T_j} g_{jk}(u_{j'}^n, u_j^n, u_k^n, u_{k'}^n), \quad \forall T_j \in \mathcal{T}$$

with linearly interpolated values $u_{j'}$ and $u_{k'}$ as depicted in
Figure 9 exhibits the local space-time maximum principle

$$\min_{\forall e_{jk} \in \partial T_j} (u_j^n, u_k^n) \le u_j^{n+1} \le \max_{\forall e_{jk} \in \partial T_j} (u_j^n, u_k^n)$$

and the local spatial maximum principle at steady state
$(u^{n+1} = u^n = u^*)$

$$\min_{\forall e_{jk} \in \partial T_j} u_k^* \le u_j^* \le \max_{\forall e_{jk} \in \partial T_j} u_k^*$$

if the limiter $\Psi(R)$ satisfies $\forall R \in \mathbb{R}$

$$0 \le \frac{\Psi(R)}{R}, \qquad 0 \le \Psi(R) \le 2$$


Figure 9. Triangle complex used in the upwind triangle schemes
showing the linear extension of $e_{jk}$ into neighboring triangles for
the determination of points $x_{j'}$ and $x_{k'}$.


Some standard limiter functions that satisfy the require- 
ments of Theorem 28 include 


• the MinMod limiter with maximum compression
parameter equal to 2

$$\Psi^{MM}(R) = \max(0, \min(R, 2))$$

• the van Leer limiter

$$\Psi^{VL}(R) = \frac{R + |R|}{1 + |R|}$$


Other limiter formulations involving three successive one- 
dimensional slopes are given in Jameson (1993) and 
Cournéde, Debiez and Dervieux (1998). 
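The two limiter functions and the conditions of Theorem 28 can be checked numerically with a short Python sketch. The function names and the sampling-based check are assumptions made for illustration; the check only tests the bounds $0 \le \Psi(R)/R$ and $0 \le \Psi(R) \le 2$ on a finite set of sample ratios and is not a proof.

```python
def minmod_limiter(r: float) -> float:
    """MinMod limiter with maximum compression parameter equal to 2."""
    return max(0.0, min(r, 2.0))


def van_leer_limiter(r: float) -> float:
    """van Leer limiter (R + |R|) / (1 + |R|)."""
    return (r + abs(r)) / (1.0 + abs(r))


def satisfies_limiter_bounds(psi, samples) -> bool:
    """Check 0 <= psi(R)/R and 0 <= psi(R) <= 2 on the given sample ratios."""
    for r in samples:
        value = psi(r)
        if not 0.0 <= value <= 2.0:
            return False
        if r != 0.0 and value / r < 0.0:
            return False
    return True
```

Both limiters vanish for negative slope ratios, so no antidiffusion is added at local extrema, and both remain bounded by 2 for large positive ratios.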


4 FURTHER ADVANCED TOPICS 


The remainder of this chapter will consider several exten- 
sions of the finite volume method. Section 4.1 considers a 
class of higher-order accurate discretizations in time that 
still preserve the stability properties of the fully discrete 
schemes using Euler time integration. This is followed by 
a discussion of generalizations of the finite volume method 
for problems including second-order diffusion terms and 
the extension to systems of nonlinear conservation laws. 


4.1 Higher-order time integration schemes 


The derivation of finite volume schemes in Section 2 began 
with a semidiscrete formulation (21) that was later extended 
to a fully discrete formulation (22) by the introduction of 
first-order accurate forward Euler time integration. These 
latter schemes were then subsequently extended to higher- 
order accuracy in space using a variety of techniques. For 


many computing problems of interest, first-order accuracy 
in time is then no longer enough. To overcome this 
low-order accuracy in time, a general class of higher- 
order accurate time integration methods was developed that 
preserve stability properties of the fully discrete scheme 
with forward Euler time integration. Following Gottlieb, 
Shu and Tadmor (2001) and Shu (2002), these methods 
will be referred to as strong stability preserving (SSP) time 
integration methods. 

Explicit SSP Runge-Kutta methods were originally 
developed by Shu (1988), Shu and Osher (1988) and 
Gottlieb and Shu (1998) and called TVD Runge-Kutta 
time discretizations. In a slightly more general approach, 
total variation bounded (TVB) Runge-Kutta methods were 
considered by Cockburn and Shu (1989), Cockburn, Lin and 
Shu (1989), Cockburn, Hou and Shu (1990) and Cockburn 
and Shu (1998) in combination with the discontinuous 
Galerkin discretization in space. Küther (2000) later gave
error estimates for second-order TVD Runge-Kutta finite 
volume approximations of hyperbolic conservation laws. 

To present the general framework of SSP Runge-Kutta 
methods, consider writing the semidiscrete finite volume 
method in the following form 


$$\frac{d}{dt} U(t) = -L(U(t)) \qquad (93)$$

where $U = U(t)$ denotes the solution vector of the semidiscrete
finite volume method. Using this notation together
with forward Euler time integration yields the fully discrete
form

$$U^{n+1} = U^n - \Delta t\, L(U^n) \qquad (94)$$

where $U^n$ is now an approximation of $U(t^n)$. As demonstrated
in Section 2.2, the forward Euler time discretization
is stable with respect to the $L^\infty$-norm, that is,

$$\|U^{n+1}\|_\infty \le \|U^n\|_\infty \qquad (95)$$

subject to a CFL-like time step restriction

$$\Delta t \le \Delta t_0 \qquad (96)$$

With this assumption, a time integration method is said to
be SSP (see Gottlieb, Shu and Tadmor, 2001) if it preserves
the stability property (95), albeit with perhaps a slightly
different restriction on the time step

$$\Delta t \le c\, \Delta t_0 \qquad (97)$$

where $c$ is called the CFL coefficient of the SSP method. In
this framework, a general objective is to find SSP methods
that are higher-order accurate, have low computational cost
and storage requirements, and have preferably a large CFL
coefficient. Note that the TVB Runge-Kutta methods can
be embedded into this class if the following relaxed notion
of stability is assumed

$$\|U^{n+1}\|_\infty \le (1 + O(\Delta t))\, \|U^n\|_\infty \qquad (98)$$


4.1.1 Explicit SSP Runge-Kutta methods


Following Shu and Osher (1988) and the review articles by 
Gottlieb, Shu and Tadmor (2001) and Shu (2002), a general 
m stage Runge-Kutta method for integrating (93) in time 
can be algorithmically represented as 


$$\tilde{U}^0 := U^n$$
$$\tilde{U}^l := \sum_{k=0}^{l-1} \left( \alpha_{lk} \tilde{U}^k + \beta_{lk}\, \Delta t\, L(\tilde{U}^k) \right), \quad \alpha_{lk} \ge 0, \quad l = 1, \ldots, m \qquad (99)$$
$$U^{n+1} := \tilde{U}^m$$

To ensure consistency, the additional constraint $\sum_{k=0}^{l-1} \alpha_{lk} =
1$ is imposed. If, in addition, all $\beta_{lk}$ are assumed to be
nonnegative, it is straightforward to see that the method can be
written as a convex (positive weighted) combination of simple
forward Euler steps with $\Delta t$ replaced by $(\beta_{lk}/\alpha_{lk}) \Delta t$.
From this property, Shu and Osher (1988) concluded the
following lemma:


Lemma 3. If the forward Euler method (94) is $L^\infty$-stable
subject to the CFL condition (96), then the Runge-Kutta
method (99) with $\beta_{lk} \ge 0$ is SSP, that is, the method is
$L^\infty$-stable under the time step restriction (97) with CFL
coefficient

$$c = \min_{l,k} \frac{\alpha_{lk}}{\beta_{lk}} \qquad (100)$$

In the case of negative $\beta_{lk}$, a similar result can be proven;
see Shu and Osher (1988).
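The CFL coefficient of Lemma 3 can be evaluated directly from the stage coefficients. The Python sketch below assumes the coefficients $\alpha_{lk}$ and $\beta_{lk}$ are stored as lists of rows, one row per stage $l = 1, \ldots, m$, and that entries with $\beta_{lk} = 0$ impose no restriction; the function name and storage layout are assumptions made for this example.

```python
import math


def ssp_cfl_coefficient(alpha, beta):
    """CFL coefficient c = min alpha_lk / beta_lk over entries with beta_lk > 0.

    alpha, beta : lists of rows; row l-1 holds the coefficients of stage l
    """
    c = math.inf
    for a_row, b_row in zip(alpha, beta):
        if abs(sum(a_row) - 1.0) > 1e-12:
            raise ValueError("consistency requires each row of alpha to sum to 1")
        for a, b in zip(a_row, b_row):
            if a < 0.0 or b < 0.0:
                raise ValueError("this sketch covers nonnegative coefficients only")
            if b > 0.0:
                c = min(c, a / b)
    return c
```

For example, the coefficient rows `alpha = [[1.0], [0.5, 0.5]]` and `beta = [[1.0], [0.0, 0.5]]` of a two-stage method give `c = 1`.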


4.1.2 Optimal second- and third-order nonlinear
SSP Runge-Kutta methods 


Gottlieb, Shu and Tadmor (2001) (Proposition 3.1) show
that the maximal CFL coefficient for any m-stage, mth order
accurate SSP Runge-Kutta method is c = 1. Therefore,
SSP Runge-Kutta methods that achieve c = 1 are termed
'optimal'. Note that this restriction does not apply if the number
of stages is higher than the order of accuracy; see Shu
(1988).

Optimal second- and third-order nonlinear SSP Runge-Kutta
methods are given in Shu and Osher (1988). The
optimal second-order, two-stage nonlinear SSP Runge-Kutta
method is given by

$$\tilde{U}^0 := U^n$$
$$\tilde{U}^1 := \tilde{U}^0 + \Delta t\, L(\tilde{U}^0)$$
$$U^{n+1} := \tfrac{1}{2} \tilde{U}^0 + \tfrac{1}{2} \tilde{U}^1 + \tfrac{1}{2} \Delta t\, L(\tilde{U}^1)$$

This method corresponds to the well-known method of
Heun. Similarly, the optimal third-order, three-stage nonlinear
SSP Runge-Kutta method is given by

$$\tilde{U}^0 := U^n$$
$$\tilde{U}^1 := \tilde{U}^0 + \Delta t\, L(\tilde{U}^0)$$
$$\tilde{U}^2 := \tfrac{3}{4} \tilde{U}^0 + \tfrac{1}{4} \tilde{U}^1 + \tfrac{1}{4} \Delta t\, L(\tilde{U}^1)$$
$$U^{n+1} := \tfrac{1}{3} \tilde{U}^0 + \tfrac{2}{3} \tilde{U}^2 + \tfrac{2}{3} \Delta t\, L(\tilde{U}^2)$$

Further methods addressing even higher-order accuracy or
lower storage requirements are given in the review articles
of Gottlieb, Shu and Tadmor (2001) and Shu (2002) where
SSP multistep methods are also discussed.
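Both optimal methods can be written as convex combinations of forward Euler steps, which the following Python sketch makes explicit. It assumes the solution is stored as a list of floats and that the callable `L` returns the increment direction so that a forward Euler step reads `u + dt * L(u)`, as in the stage formulas above; with a scheme written with the opposite sign convention, pass the negated spatial operator. All names are assumptions made for illustration.

```python
def forward_euler(u, dt, L):
    """One forward Euler step u + dt * L(u) for a list-valued state."""
    return [ui + dt * li for ui, li in zip(u, L(u))]


def ssp_rk2_step(u, dt, L):
    """Optimal second-order, two-stage SSP Runge-Kutta step (Heun)."""
    u1 = forward_euler(u, dt, L)
    u2 = forward_euler(u1, dt, L)
    return [0.5 * a + 0.5 * b for a, b in zip(u, u2)]


def ssp_rk3_step(u, dt, L):
    """Optimal third-order, three-stage SSP Runge-Kutta step."""
    u1 = forward_euler(u, dt, L)
    fe1 = forward_euler(u1, dt, L)
    u2 = [0.75 * a + 0.25 * b for a, b in zip(u, fe1)]
    fe2 = forward_euler(u2, dt, L)
    return [a / 3.0 + 2.0 * b / 3.0 for a, b in zip(u, fe2)]
```

Because every stage is a convex combination of forward Euler updates, any norm bound satisfied by a single Euler step under the time step restriction carries over to the full step. For the linear test operator `L(u) = -u` the two steps reproduce the Taylor polynomials of $e^{-\Delta t}$ of degree two and three, respectively.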


4.2 Discretization of elliptic problems 


Finite volume methods for elliptic boundary value problems
have been proposed and analyzed under a variety of
names: box methods, covolume methods, diamond cell
methods, integral finite difference methods, and finite volume
element (FVE) methods; see Bank and Rose (1987),
Cai (1991), Süli (1991), Lazarov, Michev and Vassilevsky
(1996), Viozat et al. (1998), Chatzipantelidis (1999), Chou
and Li (2000), Hermeline (2000), Eymard, Gallouët and
Herbin (2000), and Ewing, Lin and Lin (2002). These methods
address the discretization of the following standard
elliptic problem in a convex polygonal domain $\Omega \subset \mathbb{R}^2$

$$-\nabla \cdot A \nabla u = f \ \text{ in } \Omega, \qquad u(x) = 0 \ \text{ on } \partial\Omega \qquad (101)$$

for $A \in \mathbb{R}^{2 \times 2}$, a symmetric positive definite matrix
(assumed constant). Provided $f \in H^{\beta}(\Omega)$, then a solution $u$
exists such that $u \in H^{\beta+2}(\Omega)$, $-1 \le \beta \le 1$, $\beta \ne \pm 1/2$,
where $H^s(\Omega)$ denotes the Sobolev space of order $s$ in $\Omega$.
Nearly all the above mentioned methods can be recast in
Petrov-Galerkin form using a piecewise constant test space
together with a conforming trial space. A notable exception
is given in Chatzipantelidis (1999) wherein nonconforming
Crouzeix-Raviart elements are utilized and analyzed. To
formulate and analyze the Petrov-Galerkin representation,
two tessellations of $\Omega$ are considered: a triangulation $\mathcal{T}$
with simplicial elements $K \in \mathcal{T}$ and a dual tessellation $\mathcal{T}^*$
with control volumes $T \in \mathcal{T}^*$.


Figure 10. Two control volume variants used in the finite volume
discretization of second-order derivative terms: (a) Voronoi dual
tessellation, where edges of the Voronoi dual are perpendicular to
edges of the triangulation, and (b) median dual tessellation formed
from median dual segments in each triangle.


In the class of conforming
trial space methods such as the FVE method, a globally
continuous, piecewise pth order polynomial trial space
with zero trace value on the physical domain boundary is
constructed

$$X_h = \{ v \in C^0(\Omega) \;|\; v|_K \in \mathcal{P}_p(K),\ \forall K \in \mathcal{T} \text{ and } v|_{\partial\Omega} = 0 \}$$


using nodal Lagrange elements on the simplicial mesh.
A dual tessellation $\mathcal{T}^*$ of the Lagrange element is then
constructed; see, for example, Figure 10, which shows
a linear Lagrange element with two dual tessellation
possibilities. These dual tessellated regions form control
volumes for the finite volume method. The tessellation
technique extends to higher-order Lagrange elements in a
straightforward way. A piecewise constant test space is then
constructed using $\mathcal{T}^*$

$$Y_h = \{ v \;|\; v|_T \in \chi(T),\ \forall T \in \mathcal{T}^* \}$$

where $\chi(T)$ is a characteristic function in the control
volume $T$. The finite volume element discretization of (101)
then yields the following Petrov-Galerkin formulation:
Find $u_h \in X_h$ such that

$$\sum_{\forall T \in \mathcal{T}^*} \left( \int_{\partial T} w_h\, A \nabla u_h \cdot d\nu + \int_T w_h f \, dx \right) = 0, \quad \forall w_h \in Y_h \qquad (102)$$
The analysis of (102) by Ewing, Lin and Lin (2002) using
linear elements gives an a priori estimate in an $L^2$ norm
that requires the least amount of solution regularity when
compared to previous methods of analysis.


Theorem 29 (FVE a priori error estimate, Ewing, Lin
and Lin, 2002) Assume a 2-D quasi-uniform triangulation
$\mathcal{T}$ with dual tessellation $\mathcal{T}^*$ such that $\exists C > 0$ satisfying

$$C^{-1} h^2 \le |T| \le C h^2, \quad \forall T \in \mathcal{T}^*$$

Assume that $u$ and $u_h$ are solutions of (101) and (102)
respectively with $u \in H^2(\Omega)$, $f \in H^{\beta}(\Omega)$, $(0 \le \beta \le 1)$. Then
$\exists C' > 0$ such that the a priori estimate holds

$$\|u - u_h\|_{L^2(\Omega)} \le C' \left( h^2 \|u\|_{H^2(\Omega)} + h^{1+\beta} \|f\|_{H^{\beta}(\Omega)} \right) \qquad (103)$$

Unlike the finite element method, the error estimate (103)
reveals that optimal order convergence is obtained only
if $f \in H^{\beta}$ with $\beta \ge 1$. Moreover, numerical results show
that the source term regularity cannot be reduced without
deteriorating the measured convergence rate. Optimal
convergence rates are also shown for the nonconforming
Crouzeix-Raviart element based finite volume method
analyzed by Chatzipantelidis (1999) for $u \in H^2(\Omega)$ and
$f \in H^1(\Omega)$.

An extensive presentation and analysis of finite volume
methods for elliptic equations without utilizing a
Petrov-Galerkin formulation is given in Eymard, Gallouët
and Herbin (2000). In this work, general boundary conditions
that include nonhomogeneous Dirichlet, Neumann, and
Robin conditions are discussed. In addition, the analysis is
extended to general elliptic problems in divergence form,
including convection, reaction, and singular source terms.


4.3 Conservation laws including diffusion terms 


As demonstrated in Section 1, hyperbolic conservation
laws are often approximations to physical problems with
small or nearly vanishing viscosity. In other problems, the
quantitative solution effects of these small viscosity terms
are actually sought. Consequently, it is necessary in these
problems to include viscosity terms in the conservation
law formulation. As a model for these latter problems, a
second-order Laplacian term with small diffusion parameter
is added to the first-order Cauchy problem, that is,

$$\partial_t u + \nabla \cdot f(u) - \varepsilon \Delta u = 0 \ \text{ in } \mathbb{R}^d \times \mathbb{R}^+ \qquad (104a)$$
$$u(x, 0) = u_0 \ \text{ in } \mathbb{R}^d \qquad (104b)$$

Here, $u(x, t): \mathbb{R}^d \times \mathbb{R}^+ \to \mathbb{R}$ denotes the dependent solution
variable, $f \in C(\mathbb{R})$ the hyperbolic flux, and $\varepsilon \ge 0$ a
small diffusion coefficient. Application of the divergence
and Gauss theorems to (104a) integrated in a region $T$
yields the following integral conservation law form

$$\frac{\partial}{\partial t} \int_T u \, dx + \int_{\partial T} f(u) \cdot d\nu - \int_{\partial T} \varepsilon \nabla u \cdot d\nu = 0 \qquad (105)$$


A first goal is to extend the fully discrete form (22) of
Section 2 to the integral conservation law (105) by the
introduction of a numerical diffusion flux function $d_{jk}(u_h)$
for a control volume $T_j \in \mathcal{T}$ such that

$$\int_{\partial T_j} \varepsilon \nabla u \cdot d\nu \approx \sum_{\forall e_{jk} \in \partial T_j} d_{jk}(u_h)$$

When combined with the general finite volume formulation
(22) for hyperbolic conservation laws, the following fully
discrete scheme is produced

$$u_j^{n+1} = u_j^n - \frac{\Delta t^n}{|T_j|} \sum_{\forall e_{jk} \in \partial T_j} \left( g_{jk}(u_j^m, u_k^m) - d_{jk}(u_h^m) \right), \quad \forall T_j \in \mathcal{T} \qquad (106)$$

In this equation, the index $m$ may be chosen either
as $n$ or $n + 1$, corresponding to an explicit or implicit
discretization.


4.3.1 Choices of the numerical diffusion flux $d_{jk}$


The particular choice of the numerical diffusion flux
function $d_{jk}$ depends on the type of control volume that is
used. Since the approximate solution $u_h$ is assumed to be a
piecewise constant function, the definition of $d_{jk}$ involves
a gradient reconstruction of $u_h$ in the normal direction to
each cell interface $e_{jk}$. The reconstruction using piecewise
constant gradients is relatively straightforward if the control
volumes are vertex-centered, or if the cell interfaces are
perpendicular to the straight lines connecting the storage
locations (see Figure 10).


Vertex-centered finite volume schemes. 

In the case of vertex-centered control volumes such as the
median dual control volume, a globally continuous, piecewise
linear approximate solution $\tilde{u}_h$ is first reconstructed
on the primal mesh. $\nabla \tilde{u}_h$ is then continuous on the control
volume interfaces and the numerical diffusion flux is
straightforwardly computed as

$$d_{jk}(u_h^m) \equiv \int_{e_{jk}} \varepsilon \nabla \tilde{u}_h^m \cdot d\nu_{jk} \qquad (107)$$


Cell-centered finite volume schemes. 

In the case of cell-centered finite volume schemes, where
an underlying primal-dual mesh relationship may not exist,
a simple numerical diffusion flux can be constructed whenever
cell interfaces are exactly or approximately perpendicular
to the straight lines connecting the storage locations,
for example, Voronoi meshes, quadrilateral meshes, and so
on. In these cases, the reconstructed gradient of $u_h$ projected
normal to the cell interface $e_{jk}$ can be represented
by

$$\nabla u_h^m \cdot \nu_{jk} = \frac{u_k^m - u_j^m}{|x_k - x_j|}$$

where $x_j$ denotes the storage location of cell $T_j$. The
numerical diffusion flux for this case is then given by

$$d_{jk}(u_h^m) = \varepsilon \frac{|e_{jk}|}{|x_k - x_j|} (u_k^m - u_j^m) \qquad (108)$$


Further possible constructions and generalizations are given 
in Eymard, Gallouët and Herbin (2001), Gallouët, Herbin 
and Vignal (2000), and Herbin and Ohlberger (2002). 
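The two-point diffusion flux for cell-centered schemes can be sketched in a few lines of Python. The sketch assumes the flux has the form $\varepsilon\, |e_{jk}|\, (u_k - u_j) / |x_k - x_j|$, with storage locations given as coordinate tuples; the function and argument names are assumptions made for this example.

```python
import math


def diffusion_flux(eps, face_measure, x_j, x_k, u_j, u_k):
    """Two-point numerical diffusion flux across the interface e_jk.

    eps          : diffusion coefficient
    face_measure : length (2-D) or area (3-D) of the interface e_jk
    x_j, x_k     : storage locations of the two cells (coordinate tuples)
    u_j, u_k     : cell values on either side of the interface
    """
    dist = math.dist(x_j, x_k)
    if dist == 0.0:
        raise ValueError("storage locations must be distinct")
    return eps * face_measure / dist * (u_k - u_j)
```

The flux is antisymmetric under exchange of the two cells, so whatever leaves one control volume through the interface enters its neighbor, which keeps the scheme conservative.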


4.3.2 Note on stability, convergence, and error 
estimates 


Stability analysis reveals a CFL-like stability condition for
the explicit scheme choice (m = n) in (106)

$$\Delta t^n \le \frac{\alpha^3 h_{\min}^2}{\alpha L_g h_{\min} + \varepsilon}$$

where $L_g$ denotes the Lipschitz constant of the hyperbolic
numerical flux, $\alpha$ is a positive mesh dependent parameter,
and $\varepsilon$ is the diffusion coefficient. In constructing this bound,
a certain form of shape regularity is assumed such that there
exists an $\alpha > 0$ such that for all $j, k$ with $h_k = \operatorname{diam}(T_k)$

$$\alpha h_k^2 \le |T_k|, \quad \alpha |\partial T_k| \le h_k, \quad \alpha h_k \le |x_k - x_j| \qquad (109)$$

Thus, $\Delta t^n$ is of the order $h^2$ for large $\varepsilon$ and of the order
$h$ for $\varepsilon \le h$. In cases where the diffusion coefficient is
larger than the mesh size, it is advisable to use an implicit
scheme (m = n + 1). In this latter situation, no time step
restriction has to be imposed; see Eymard et al. (2002) and
Ohlberger (2001b).
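The two regimes of the explicit time step restriction can be illustrated numerically. The Python sketch below assumes the bound takes the form $\Delta t \le \alpha^3 h_{\min}^2 / (\alpha L_g h_{\min} + \varepsilon)$; this form, the function name and the argument names are assumptions made for the example.

```python
def explicit_time_step_bound(alpha, lipschitz_g, h_min, eps):
    """Assumed CFL-like bound alpha^3 h^2 / (alpha L_g h + eps)."""
    return alpha ** 3 * h_min ** 2 / (alpha * lipschitz_g * h_min + eps)
```

With vanishing diffusion the bound scales linearly with the mesh size, as for a purely hyperbolic problem, whereas for a large diffusion coefficient halving the mesh size reduces the admissible time step by roughly a factor of four, which is the motivation for switching to the implicit variant.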

In order to demonstrate the main difficulties when analyz- 
ing convection-dominated problems, consider the following 
result from Feistauer et al. (1999) for a homogeneous dif- 
fusive boundary value problem. In this work, a mixed finite 
volume, finite element method sharing similarities with the 
methods described above is used to obtain a numerical 
approximation u,, of the exact solution u. Using typical 
energy-based techniques, they prove the following a priori 
error bound. 


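Theorem 30. For initial data $u_0 \in L^\infty(\mathbb{R}^2) \cap W^{1,2}(\mathbb{R}^2)$
and $t > 0$ there exist constants $c_1, c_2 > 0$ independent of
$\varepsilon$ such that

$$\|u(\cdot, t) - u_h(\cdot, t)\|_{L^2} \le c_1 h\, e^{c_2 t / \varepsilon} \qquad (110)$$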
This estimate is fundamentally different from estimates
for the purely hyperbolic problems of Sections 2 and 3.
Specifically, this result shows how the estimate strongly
depends on the small parameter $\varepsilon$, ultimately becoming
unbounded as $\varepsilon$ tends to zero.

In the context of convection dominated or degenerate
parabolic equations, Kruzkov-techniques have been
recently used by Carrillo (1999) and Karlsen and Risebro
(2000) in proving uniqueness and stability of solutions.
Utilizing these techniques, convergence of finite volume
schemes (uniform with respect to $\varepsilon \to 0$) was proven in
Eymard et al. (2002) and a priori error estimates were
obtained for viscous approximations in Evje and Karlsen
(2002) and Eymard, Gallouët and Herbin (2002). Finally,
in Ohlberger (2001a, 2001b) uniform a posteriori error
estimates suitable for adaptive meshing are given. Using
the theory of nonlinear semigroups, continuous dependence
results were also obtained in Cockburn and Gripenberg
(1999) (see also Cockburn (2003) for a review).


4.4 Extension to systems of nonlinear
conservation laws 


A positive attribute of finite volume methods is the relative 
ease with which the numerical discretization schemes 
of Sections 2 and 3 can be algorithmically extended to 
systems of nonlinear conservation laws of the form 


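$$\partial_t u + \nabla \cdot f(u) = 0 \quad \text{in } \mathbb{R}^d \times \mathbb{R}^+ \qquad (111a)$$
$$u(x, 0) = u_0(x) \quad \text{in } \mathbb{R}^d \qquad (111b)$$


where $u(x, t): \mathbb{R}^d \times \mathbb{R}^+ \to \mathbb{R}^m$ denotes the vector of dependent 
solution variables, $f(u): \mathbb{R}^m \to \mathbb{R}^{m \times d}$ denotes the flux 
vector, and $u_0(x): \mathbb{R}^d \to \mathbb{R}^m$ denotes the initial data vector 
at time $t = 0$. It is assumed that this system is strictly 
hyperbolic, that is, the eigenvalues of the flux jacobian 
$A(\nu; u) \equiv \partial f / \partial u \cdot \nu$ are real and distinct for all bounded 
$\nu \in \mathbb{R}^d$. 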

The main task in extending finite volume methods to 
systems of nonlinear conservation laws is the construction 
of a suitable numerical flux function. To gain insight 
into this task, consider the one-dimensional linear Cauchy 
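problem for $u(x, t): \mathbb{R} \times \mathbb{R}^+ \to \mathbb{R}^m$ and $u_0(x): \mathbb{R} \to \mathbb{R}^m$ 


$$\partial_t u + \partial_x (A u) = 0 \quad \text{in } \mathbb{R} \times \mathbb{R}^+$$
$$u(x, 0) = u_0(x) \quad \text{in } \mathbb{R} \qquad (112)$$


where $A \in \mathbb{R}^{m \times m}$ is a constant matrix. Assume the matrix 
$A$ has $m$ real and distinct eigenvalues, $\lambda_1 < \lambda_2 < \cdots < \lambda_m$, 
with corresponding right and left eigenvectors denoted 
by $r_k \in \mathbb{R}^m$ and $l_k \in \mathbb{R}^m$ respectively for $k = 1, \ldots, m$. 
Furthermore, let $X \in \mathbb{R}^{m \times m}$ denote the matrix of right 
eigenvectors, $X = [r_1, \ldots, r_m]$, and $\Lambda \in \mathbb{R}^{m \times m}$ the diagonal 
matrix of eigenvalues, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$, so that 
$A = X \Lambda X^{-1}$. The one-dimensional system (112) is readily 
decoupled into scalar equations via the transformation into 
characteristic variables $\alpha = X^{-1} u$ for $\alpha \in \mathbb{R}^m$ 


$$\partial_t \alpha + \partial_x (\Lambda \alpha) = 0 \quad \text{in } \mathbb{R} \times \mathbb{R}^+$$
$$\alpha(x, 0) = \alpha_0(x) \quad \text{in } \mathbb{R} \qquad (113)$$


and component-wise solved exactly 

$$\alpha^{(k)}(x, t) = \alpha_0^{(k)}(x - \lambda_k t), \quad k = 1, \ldots, m$$


or recombined in terms of the original variables (with the 
left eigenvectors taken as the rows of $X^{-1}$) 


$$u(x, t) = \sum_{k=1}^{m} \left( l_k \cdot u_0(x - \lambda_k t) \right) r_k$$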

Using this solution, it is straightforward to solve exactly 
the associated Riemann problem for $w(\xi, t) \in \mathbb{R}^m$ 


$$\partial_t w + \partial_\xi (A w) = 0 \quad \text{in } \mathbb{R} \times \mathbb{R}^+$$

with initial data 


$$w(\xi, 0) = \begin{cases} u & \text{if } \xi < 0 \\ v & \text{if } \xi > 0 \end{cases}$$

thereby producing the following Godunov-like numerical 
flux function 


$$g(u, v) = A\, w(0, t > 0) = \frac{1}{2}(A u + A v) - \frac{1}{2} |A| (v - u) \qquad (114)$$


with $|A| \equiv X |\Lambda| X^{-1}$. When used in one-dimensional 
discretization together with piecewise constant solution 
representation, the linear numerical flux (114) produces the 
well-known Courant—Isaacson—Rees (CIR) upwind scheme 
for linear systems of hyperbolic equations 


$$u_j^{n+1} = u_j^n - \frac{\Delta t}{\Delta x} \left( A^+ (u_j^n - u_{j-1}^n) + A^- (u_{j+1}^n - u_j^n) \right)$$


where $A^\pm = X \Lambda^\pm X^{-1}$. Note that higher-order accurate 
finite volume methods with slope limiting procedures 
formally extend to this linear system via component-wise 
slope limiting of the characteristic components $\alpha^{(k)}$, 
$k = 1, \ldots, m$, for use in the numerical flux (114). 
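
The equivalence between the central form of the flux (114) and the upwind split form used in the CIR scheme can be checked numerically. The following is a minimal sketch, not taken from the original text: it assumes a hypothetical 2x2 example whose eigenvectors and eigenvalues are supplied by hand, and the helper names (`matmul`, `from_eigen`, and so on) are illustrative only.

```python
# Sketch of the linear upwind flux (114) for a 2x2 system.
# Assumption: the eigen-decomposition A = X diag(lam) X^{-1} is given by hand.

def matmul(P, Q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(P, x):
    """Product of a 2x2 matrix with a 2-vector."""
    return [sum(P[i][k] * x[k] for k in range(2)) for i in range(2)]

def inverse(P):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]

def from_eigen(X, diag):
    """Return X diag(diag) X^{-1}."""
    D = [[diag[0], 0.0], [0.0, diag[1]]]
    return matmul(matmul(X, D), inverse(X))

# Hypothetical example: columns of X are right eigenvectors, speeds -1 and +1.
X = [[1.0, 1.0], [-1.0, 1.0]]
lam = [-1.0, 1.0]

A = from_eigen(X, lam)
A_abs = from_eigen(X, [abs(l) for l in lam])          # |A| = X |Lambda| X^{-1}
A_plus = from_eigen(X, [max(l, 0.0) for l in lam])    # A^+ = X Lambda^+ X^{-1}
A_minus = from_eigen(X, [min(l, 0.0) for l in lam])   # A^- = X Lambda^- X^{-1}

def flux_central_form(u, v):
    """g(u, v) = (Au + Av)/2 - |A|(v - u)/2, as in (114)."""
    Au, Av = matvec(A, u), matvec(A, v)
    d = matvec(A_abs, [v[i] - u[i] for i in range(2)])
    return [0.5 * (Au[i] + Av[i]) - 0.5 * d[i] for i in range(2)]

def flux_split_form(u, v):
    """Equivalent upwind form g(u, v) = A^+ u + A^- v."""
    p, q = matvec(A_plus, u), matvec(A_minus, v)
    return [p[i] + q[i] for i in range(2)]
```

Both forms agree because $A = A^+ + A^-$ and $|A| = A^+ - A^-$; inserting the split form into a conservative update gives the CIR scheme quoted above.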


4.4.1 Numerical flux functions for systems of 
conservation laws 


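In Godunov’s original work (see Godunov, 1959), exact 
solutions of the one-dimensional nonlinear Riemann problem 
of gas dynamics were used in the construction of a 
similar numerical flux function 


$$g^{G}(u, v) = f(w(0, t > 0)) \cdot \nu \qquad (115)$$


where $w(\xi, t) \in \mathbb{R}^m$ is now a solution of a nonlinear 
Riemann problem 


$$\partial_t w + \partial_\xi f(w) = 0 \quad \text{in } \mathbb{R} \times \mathbb{R}^+$$

with initial data 


$$w(\xi, 0) = \begin{cases} u & \text{if } \xi < 0 \\ v & \text{if } \xi > 0 \end{cases}$$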


Recall that solutions of the Riemann problem for gas 
dynamic systems are a composition of shock, contact, and 
rarefaction wave family solutions. For the gas dynamic 
equations considered by Godunov, a unique solution of 
the Riemann problem exists for general states u and v 
except those states producing a vacuum. Even so, the 
solution of the Riemann problem is both mathematically 
and computationally nontrivial. Consequently, a number of 
alternative numerical fluxes have been proposed that are 
more computationally efficient. These alternative numerical 
fluxes can sometimes be interpreted as approximate Riemann 
solvers. A partial list of alternative numerical fluxes 
is given here. A more detailed treatment of this subject is 
given in Godlewski and Raviart (1991), Kröner (1997), and 
LeVeque (2002). 


• Osher–Solomon flux (Osher and Solomon, 1982). This 
  numerical flux is a system generalization of the 
  Engquist–Osher flux of Section 2. All wave families 
  are approximated in state space as rarefaction or 
  inverted rarefaction waves with Lipschitz continuous 
  partial derivatives. The Osher–Solomon numerical 
  flux is of the form 

  $$g^{OS}(u, v) = \frac{1}{2}(f(u) + f(v)) \cdot \nu - \frac{1}{2} \int_u^v |A(\nu; w)| \, dw$$

  where $|A|$ denotes the usual matrix absolute value. By 
  integrating on m rarefaction wave integral subpaths 
  that are each parallel to a right eigenvector, a system 
  decoupling occurs on each subpath integration. Furthermore, 
  for the gas dynamic equations with ideal 
  gas law, it is straightforward to construct m − 1 Riemann 
  invariants on each subpath, thereby eliminating 
  the need for path integration altogether. This reduces 
  the numerical flux calculation to purely algebraic 
  computations with special care taken at sonic points; see 
  Osher and Solomon (1982). 


• Roe flux (Roe, 1981). Roe’s numerical flux can be 
  interpreted as approximating all wave families as 
  discontinuities. The numerical flux is of the form 

  $$g^{Roe}(u, v) = \frac{1}{2}(f(u) + f(v)) \cdot \nu - \frac{1}{2} |A(\nu; u, v)| (v - u)$$

  where $A(\nu; u, v)$ is the ‘Roe matrix’ satisfying the 
  matrix mean value identity 

  $$(f(v) - f(u)) \cdot \nu = A(\nu; u, v)(v - u)$$

  with $A(\nu; u, u) = A(\nu; u)$. For the equations of gas 
  dynamics with ideal gas law, the Roe matrix takes a 
  particularly simple form. Steady discrete mesh-aligned 
  shock profiles are resolved with one intermediate 
  point. The Roe flux does not preclude the formation of 
  entropy violating expansion shocks unless additional 
  steps are taken near sonic points. 


• Steger–Warming flux vector splitting (Steger and 
  Warming, 1981). Steger and Warming considered a 
  splitting of the flux vector for the gas dynamic 
  equations with ideal gas law that exploited the fact 
  that the flux vector is homogeneous of degree one 
  in the conserved variables. From this homogeneity 
  property, Euler’s identity then yields that 
  $f(u) \cdot \nu = A(\nu; u)\, u$. Steger and Warming then considered the 
  matrix splitting 

  $$A = A^+ + A^-, \qquad A^\pm \equiv X \Lambda^\pm X^{-1}$$

  where $\Lambda^\pm$ is computed component-wise. From this 
  matrix splitting, the final upwind numerical flux 
  function was constructed as 

  $$g^{SW}(u, v) = A^+(\nu; u)\, u + A^-(\nu; v)\, v$$

  Although not part of their explicit construction, for 
  the gas dynamic equations with ideal gas law, the 
  jacobian matrix $\partial g^{SW} / \partial u$ has eigenvalues that are all 
  nonnegative and the jacobian matrix $\partial g^{SW} / \partial v$ has 
  eigenvalues that are all nonpositive whenever the ratio 
  of specific heats γ lies in the interval [1, 5/3]. The 
  matrix splitting leads to numerical fluxes that do not 
  vary smoothly near sonic and stagnation points. Use 
  of the Steger–Warming flux splitting in the schemes 
  of Sections 2 and 3 results in rather poor resolution 
  of linearly degenerate contact waves and velocity slip 
  surfaces due to the introduction of excessive artificial 
  diffusion for these wave families. 


• Van Leer flux vector splitting. Van Leer (1982) provided 
  an alternative flux splitting for the gas dynamic 
  equations that produces a numerical flux of the 
  form 

  $$g^{VL}(u, v) = f^+(u) + f^-(v)$$

  using special Mach number polynomials to construct 
  fluxes that remain smooth near sonic and 
  stagnation points. As part of the splitting construction, 
  the jacobian matrix $\partial g^{VL} / \partial u$ has eigenvalues 
  that are all nonnegative and the matrix $\partial g^{VL} / \partial v$ 
  has eigenvalues that are all nonpositive. The resulting 
  expressions for the flux splitting are somewhat 
  simpler when compared to the Steger–Warming 
  splitting. The van Leer splitting also introduces 
  excessive diffusion in the resolution of linearly 
  degenerate contact waves and velocity slip 
  surfaces. 


• System Lax–Friedrichs flux. This numerical flux is 
  the system equation counterpart of the scalar local 
  Lax–Friedrichs flux (27). For systems of conservation 
  laws, the Lax–Friedrichs flux is given by 

  $$g^{LF}(u, v) = \frac{1}{2}(f(u) + f(v)) \cdot \nu - \frac{1}{2} \alpha(\nu)(v - u)$$

  where $\alpha(\nu)$ is given through the eigenvalues $\lambda_k(\nu; w)$ 
  of $A(\nu; w)$ 

  $$\alpha(\nu) = \max_{1 \le k \le m} \sup_{w \in [u, v]} |\lambda_k(\nu; w)|$$

  The system Lax–Friedrichs flux is usually not applied 
  on the boundary of domains since it generally 
  requires an overspecification of boundary data. The 
  system Lax–Friedrichs flux introduces a relatively 
  large amount of artificial diffusion when used in the 
  schemes of Section 2. Consequently, this numerical 
  flux is typically only used together with relatively 
  high-order reconstruction schemes where the detrimental 
  effects of excessive artificial diffusion are 
  mitigated. 


• Harten–Lax–van Leer flux (Harten, Lax and van 
  Leer, 1983). The Harten–Lax–van Leer numerical 
  flux originates from a simplified two-wave model 
  of more general m wave systems such that waves 
  associated with the smallest and largest characteristic 
  speeds of the m wave system are always accurately 
  represented in the two-wave model. The following 
  numerical flux results from this simplified two-wave 
  model 

  $$g^{HLL}(u, v) = \frac{1}{2}(f(u) + f(v)) \cdot \nu - \frac{1}{2} \frac{\alpha_{max} + \alpha_{min}}{\alpha_{max} - \alpha_{min}} (f(v) - f(u)) \cdot \nu + \frac{\alpha_{max}\, \alpha_{min}}{\alpha_{max} - \alpha_{min}} (v - u)$$

  where 

  $$\alpha_{max}(\nu) = \max_{1 \le k \le m} \left( 0, \sup_{w \in [u, v]} \lambda_k(\nu; w) \right)$$
  $$\alpha_{min}(\nu) = \min_{1 \le k \le m} \left( 0, \inf_{w \in [u, v]} \lambda_k(\nu; w) \right)$$

  Using this flux, full upwinding is obtained for supersonic 
  flow. Modifications of this flux are suggested 
  in Einfeldt et al. (1998) to improve the resolution of 
  intermediate waves as well. 
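
To make the structure of the last two fluxes concrete, the following minimal sketch evaluates a Lax–Friedrichs and a Harten–Lax–van Leer flux for a scalar stand-in problem. It is an illustration under stated assumptions, not the formulation used for gas dynamics: the Burgers flux $f(u) = u^2/2$ replaces the system flux (so $m = 1$, $\nu = 1$ and the single eigenvalue is $\lambda(w) = w$), and the function names are hypothetical.

```python
# Scalar illustration of the Lax-Friedrichs and HLL fluxes.
# Assumption: Burgers flux f(u) = u^2/2, whose characteristic speed is f'(w) = w.

def f(u):
    """Burgers flux, used only as a simple scalar stand-in."""
    return 0.5 * u * u

def lax_friedrichs(u, v):
    """g(u, v) = (f(u) + f(v))/2 - alpha (v - u)/2 with alpha = sup |f'(w)|."""
    alpha = max(abs(u), abs(v))      # sup of |w| over the interval between u and v
    return 0.5 * (f(u) + f(v)) - 0.5 * alpha * (v - u)

def hll(u, v):
    """Two-wave flux with a_max = max(0, sup f') and a_min = min(0, inf f')."""
    a_max = max(0.0, u, v)
    a_min = min(0.0, u, v)
    if a_max == a_min:               # both speeds vanish: fall back to the average
        return 0.5 * (f(u) + f(v))
    return (0.5 * (f(u) + f(v))
            - 0.5 * (a_max + a_min) / (a_max - a_min) * (f(v) - f(u))
            + a_max * a_min / (a_max - a_min) * (v - u))
```

When both speeds are positive the HLL expression collapses to `f(u)`, and when both are negative to `f(v)`, which is the full upwinding property mentioned above; both fluxes are consistent in the sense that `g(u, u) = f(u)`.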


Further examples of numerical fluxes (among others) 
include the kinetic flux vector splitting due to Deshpande 
(1986), the advection upstream splitting method (AUSM) 
flux of Liou and Steffen (1993), and the convective upwind 
and split pressure (CUSP) flux of Jameson (1993) and Tat- 
sumi, Martinelli and Jameson (1994). 


5 CONCLUDING REMARKS 


The literature associated with the foundation and analysis 
of the finite volume methods is extensive. This article 
gives a very brief overview of finite volume methods 
with particular emphasis on theoretical results that have 
significantly impacted the design of finite volume methods 
in everyday use at the time of this writing. More extensive 
presentations and references on various topics in this article 
can be found in the books by Godlewski and Raviart (1991), 
Kröner (1997), Eymard, Gallouët and Herbin (2000) and 
LeVeque (2002). 


6 RELATED CHAPTERS 


(See also Chapter 4, this Volume, Chapter 4, Chapter 11 
of Volume 3) 

1 ARCHITECTURE OF MODELING 
SYSTEMS 


It is not easy to define a modeling system. A modeling 
system can be every system useful to model a 2-D or 
3-D object. Still many designers model with clay, hence 
from their point of view a modeling system would be 
pencil, paper, clay, and the designer himself. In the area 
of computer graphics, one is mainly interested in a virtual 
model of the object, which can be viewed from different 
perspectives, modified, and processed further, to simulate 
the behavior of the object in reality. Here a modeling system 
consists of the computer hard- and software and of the user. 


Encyclopedia of Computational Mechanics, Edited by Erwin 
Stein, René de Borst and Thomas J.R. Hughes. Volume 1: Funda- 
mentals. © 2004 John Wiley & Sons, Ltd. ISBN: 0-470-84699-2. 


Before one can choose or build a modeling system, one 
has to choose an appropriate model for the problem given. 
We will discuss different types of models in the following 
chapters but we will not go into detail on how to map a 
given real-world problem onto one of these models. Please 
refer to Koenderink (1990) for some insights on how to 
accomplish this. 

Today a strict separation of physical modeling, for exam- 
ple, with clay and virtual modeling with a computer, cannot 
be sustained, since many mixtures are used in practice. Clay 
modelers, for example, often use a 3-D scanner to create 
a virtual model and on the other hand virtual models can 
easily be printed with a 3-D printer to create 3-D proto- 
types. Recently, even stronger connections are made using 
haptic devices and 3-D glasses to enable the user to feel 
and see the object in 3-D space. 

Since the user is still the most important part of a 
modeling system, the interaction between the human and 
the computer plays a crucial role. Therefore, different 
hardware tools like scanners, printers, viewing, and input 
devices have been developed to interact with the user. The 
software is then needed to ensure the smooth interaction of 
all components. 

The software of a modeling system can be divided into 
four abstraction layers (see Figure 1): 


1. The user interface (UI) is the part of the software that 
   interacts with the user directly. The UI is mostly graphical 
   and presents the user with many options to create, 
   modify, analyze, and view the object. Constructing a 
   graphical UI is a complex venture in which not only 
   the wishes of the user but also the possibilities of the 
   hardware have to be taken into consideration. It 
   is important that operations that are repeated very often 
   do not consume too much time and that the user is 
   constantly informed about the status of any operation. 
   An intuitive layout (of buttons, menus, ...) should also 
   be kept in mind. 

2. The high-level operation layer hosts mainly complex 
   operations like intersecting, cutting, modifying, analyzing, 
   and postprocessing of objects. These operations 
   can be accessed through the user interface and can be 
   understood to be the main modeling tools. They should 
   be robust and efficient, supplying the user with powerful 
   tools that enable him to realize whatever he has in 
   mind. 

3. On the low-level operation layer, the data structure 
   is located together with its low-level operators. These 
   operators provide the next higher level with controlled 
   options for accessing and modifying the data structure. 
   They keep the data in an organized state. Since the 
   data structure and its operators are strongly connected, 
   an object-oriented programming language like C++ is 
   well suited to the implementation. 

4. The hardware-related level is the lowest layer. Here the 
   interaction with the input- and output-hardware devices 
   is implemented. Sometimes it is necessary to directly 
   program the hardware (driver programming, assembly 
   language, etc.) to elicit the needed features, but most 
   of the time it is sufficient to use existing drivers and 
   interfaces (e.g. OpenGL or DirectX). 
   Another important aspect that needs to be dealt with on 
   the lowest layer is the precision of operations. Since 
   floating-point arithmetic is only approximate and not 
   exact, small errors may accumulate and possibly lead 
   to catastrophic failure. Therefore either provisions have to be 
   made to prevent this failure or an exact arithmetic has to 
   be implemented, unfortunately leading to a slowdown 
   of the entire system. 


Figure 1. Software levels of a modeling system (from top to 
bottom: user interface; high-level operation layer; low-level 
operation layer; hardware-level interfaces (OpenGL, DirectX); 
hardware-related level). A color version of this image is 
available at http://www.mrw.interscience.wiley.com/ecm 
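
The remark on floating-point precision can be illustrated with a minimal sketch. It assumes nothing beyond the Python standard library; the `fractions` module stands in here for an "exact arithmetic" and is only one possible choice.

```python
from fractions import Fraction

# Accumulate the step 0.1 ten times in binary floating point.
float_total = 0.0
for _ in range(10):
    float_total += 0.1

# The same accumulation with exact rational arithmetic.
exact_total = Fraction(0)
for _ in range(10):
    exact_total += Fraction(1, 10)

print(float_total == 1.0)   # the rounded sum misses 1.0
print(exact_total == 1)     # the exact sum hits 1
```

A geometric predicate that compares such an accumulated coordinate against 1.0 would take the wrong branch in the floating-point version. The exact version avoids this, at the price of slower arithmetic.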


We have seen that the data structure and its opera- 
tors form the heart of the modeling system (level 3). 


Therefore the data structure determines the feasibility and 
performance of the high-level operations. Many different 
types of data structures exist, each with its own advantages 
(and disadvantages). 

More on modeling systems (with an approach slightly 
different from the one presented here) can be found in 
Hoffmann (1989). 


2 VOXEL REPRESENTATION 


A typical volume-based approach in modeling is the voxel 
representation. Koenderink (1990) refers to these kinds of 
models as “sugar cube blobs” since these models can be 
thought of as a set of sugar cubes glued together appropriately. 
This concept is a straightforward generalization of 
pixel graphics as known from computer graphics. Whereas 
in pixel representations a 2-D image is discretized into a 
set of squares with integer coordinates (the pixels), in voxel 
representations 3-D space is split into a regular cubic grid 
consisting of voxels. The easiest way of representing an 
object like this is to assign to each voxel a Boolean value, 
deciding if the volume described by the voxel is part of the 
object or not. Figure 2 shows a typical 12 x 12 x 12 rep- 
resentation of a full sphere. A problem of the voxel-based 
approach is that the approximation of the object's volume 
at low resolutions is usually relatively poor, while higher 
resolutions increase memory consumption at a cubic rate. 
As a compromise for voxels intersecting the boundary of 
the object, the Boolean values can be changed to fuzzy 
numbers depending on the volume of the intersecting part 
of voxel and object. Additional attributes as described later 
can also be assigned to voxels. 
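
A Boolean voxel model of the kind just described can be sketched in a few lines. This is an illustrative sketch only: it assumes that a voxel counts as occupied when its centre lies inside the sphere (one of several possible rules), and the function names are hypothetical.

```python
def voxelize_sphere(n=12, radius=None):
    """Boolean n x n x n grid: True where the voxel centre lies inside the sphere.

    Assumption: the sphere is centred in the grid and, by default,
    has radius n/2 measured in voxel edge lengths.
    """
    r = n / 2.0 if radius is None else radius
    c = n / 2.0
    return [[[(i + 0.5 - c) ** 2 + (j + 0.5 - c) ** 2 + (k + 0.5 - c) ** 2 <= r * r
              for k in range(n)] for j in range(n)] for i in range(n)]

def occupied_count(grid):
    """Number of occupied voxels, a crude estimate of the object's volume."""
    return sum(cell for plane in grid for row in plane for cell in row)

grid = voxelize_sphere(12)
print(occupied_count(grid), "of", 12 ** 3, "voxels are occupied")
```

Doubling the resolution multiplies the number of stored values by eight, which is the cubic growth in memory mentioned above.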


Figure 2. Voxel representation of a full sphere. A color version 
of this image is available at http://www.mrw.interscience.wiley.com/ecm 


Voxel representations make Boolean operations like intersection 
or union of two objects extremely easy. Only the 
corresponding Boolean operations for their assigned voxel 
values need to be carried out, for example the logical AND 
for the intersection. Again this is relatively costly at a higher 
resolution due to the enormous number of voxels involved. 
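
As a small illustration of these voxel-wise Boolean operations, the sketch below stores each object as a flat list of Boolean occupancy values over the same grid. The helper `box` and the grid size are assumptions made for the example.

```python
def box(n, lo, hi):
    """Flat Boolean occupancy list for the voxel block lo <= i, j, k < hi."""
    return [all(lo <= c < hi for c in (i, j, k))
            for i in range(n) for j in range(n) for k in range(n)]

def intersection(a, b):
    """Voxel-wise logical AND."""
    return [p and q for p, q in zip(a, b)]

def union(a, b):
    """Voxel-wise logical OR."""
    return [p or q for p, q in zip(a, b)]

def difference(a, b):
    """Voxel-wise AND NOT."""
    return [p and not q for p, q in zip(a, b)]

a = box(8, 0, 5)     # 5 x 5 x 5 block
b = box(8, 3, 8)     # 5 x 5 x 5 block overlapping a in a 2 x 2 x 2 corner
print(sum(intersection(a, b)), sum(union(a, b)), sum(difference(a, b)))
```

Every operation touches all `n**3` values once, which is why the cost grows quickly with the resolution.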

The voxel-representation method can be viewed as a 
special case of the constructive solid geometry (CSG) 
technique discussed in Section 5 with only one primitive — 
the cube at integer coordinates - and one operator — the 
union. 

Voxel representation is also known as spatial-occupancy 
enumeration (cf. Foley et al., 1996). 


2.1 Octrees 


Voxel representations can become memory consuming if a 
greater level of detail is desired. Thus sometimes voxels 
are organized into octrees. These are trees where each 
node is of degree eight or zero. Octrees are obtained 
by starting with one voxel large enough to enclose the 
whole object, representing the root node of the octree. In 
general this voxel is a poor approximation of the object, 
therefore this voxel is divided into eight equal-sized smaller 
voxels, representing the child nodes of the root node. For 
each voxel, an approximation criterion is checked, for 
example if the voxel intersects the boundary of the object. 
If this criterion is met, it is subdivided further, otherwise 
subdivision is omitted. This process is repeated for those 
voxels that require further subdivision until the desired level 
of approximation is reached. 
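
The subdivision process can be sketched for the 2-D analogue, a quadtree. The sketch below makes several assumptions that are not part of the text: the object is a disc (so that the inside/outside classification of a square is easy), the approximation criterion is "the square is neither fully inside nor fully outside", and leaves are stored as strings while inner nodes are lists of four children.

```python
def classify(x, y, size, cx, cy, r):
    """Return 'in', 'out' or 'mixed' for the square [x, x+size]^2 against a disc."""
    # Closest point of the square to the disc centre.
    nx = min(max(cx, x), x + size)
    ny = min(max(cy, y), y + size)
    if (nx - cx) ** 2 + (ny - cy) ** 2 > r * r:
        return 'out'
    corners = [(x, y), (x + size, y), (x, y + size), (x + size, y + size)]
    if all((px - cx) ** 2 + (py - cy) ** 2 <= r * r for px, py in corners):
        return 'in'      # a disc is convex: all corners inside means square inside
    return 'mixed'

def build(x, y, size, depth, disc):
    """Subdivide only those squares that still intersect the boundary."""
    state = classify(x, y, size, *disc)
    if state != 'mixed' or depth == 0:
        return state                     # leaf node
    h = size / 2.0
    return [build(x,     y,     h, depth - 1, disc),
            build(x + h, y,     h, depth - 1, disc),
            build(x,     y + h, h, depth - 1, disc),
            build(x + h, y + h, h, depth - 1, disc)]

def count_leaves(node):
    return 1 if isinstance(node, str) else sum(count_leaves(c) for c in node)

tree = build(0.0, 0.0, 1.0, 4, (0.5, 0.5, 0.4))
print(count_leaves(tree), "leaves instead of", 4 ** 4, "uniform cells")
```

An octree follows the same pattern with eight children per node and a cube-against-object classification.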

To understand the way an octree is obtained refer to 
Figure 3. Here the octree’s 2-D analogue, a quadtree, for 
a triangle is constructed. The resulting quadtree is shown 
in Figure 4. Note that only squares (and thus nodes) con- 
tributing to a greater level of detail are to be refined in 
a following step. Hierarchical representation schemes like 
octrees make tasks like collision detection particularly easy: 
First the two root nodes need to be checked for intersec- 
tion. If and only if an intersection is found, the child nodes 
belonging to the respective objects are checked and so on. 
This is almost as easy as in the voxel case while being 
far more efficient. For an overview on how to implement 
Boolean operations for octrees, see Foley et al. (1996). For 
a comprehensive survey of octree-related techniques, see 
Samet (1984). 


Figure 3. A quadtree for a triangle. A color version of this image 
is available at http://www.mrw.interscience.wiley.com/ecm 


Figure 4. Resulting quadtree for the triangle. A color version 
of this image is available at http://www.mrw.interscience.wiley.com/ecm 


Figure 5. Result of a marching cubes conversion. A color version 
of this image is available at http://www.mrw.interscience.wiley.com/ecm 

Both voxel and octree representations may require con- 
version to a boundary-representation before FEM com- 
putations can be carried out. This conversion can be 
accomplished using the famous marching cubes algorithm 
(Lorensen and Cline, 1987). Figure 5 depicts the result of 
such a conversion. 


3 SURFACE PATCHES 


Surface patches form the base for boundary-representation 
schemes. Therefore before we can discuss the foundations 
of boundary representations in Section 4, we need to know 
how to model a surface — the boundary of our object. 

We define a surface patch to be a connected 2-D manifold 
in 3-D, that is, a set of points in 3-D, where each inner point 
has a small surrounding neighborhood homeomorphic to the 
2-D open disc. We define the boundary of this set to be part 
of the surface patch. 

There exists a wide variety of surface patches 
matching this definition and most of them are not easily 
represented in a data structure. Furthermore, we should 
keep in mind that surface patches are usually meant to 
be ‘sewn’ together (cf. Section 4) in order to form more 
complex surfaces, therefore they are mainly rather simple 
bounded and bordered manifolds. Nevertheless, we shall 
give no formal definition of simplicity but rather present a 
selection of commonly used techniques for implementing 
special classes of surface patches. 

For a detailed discussion of many of the subjects men- 
tioned in this section refer to Hoschek and Lasser (1993). 
Also refer to Koenderink (1990) for some deeper insights 
about patches. 


3.1 Polygonal patches 


Given a sequence of coplanar points $p_0, \ldots, p_n$ in 3-D, we 
define the sequence of edges joining two consecutive points $p_i$, $p_{i+1}$ 
plus the edge $p_n$, $p_0$ to be the closed polygon of the points. 
We define the geometric interior of the polygon to be the 
set of all points in the same plane that cannot be reached 
by an arbitrary path from a point far away in the same 
plane (e.g. from a point outside the convex hull of the 
polygon) without crossing the polygon. We will consider 
every geometric interior point plus the boundary of these 
points to be part of the polygonal patch. 

Of course this definition gives no efficient algorithm for 
testing whether a point belongs to the polygonal patch. There 
exists a variety of methods to do this (cf. Foley et al., 
1996), each meeting our definition in special cases (and not 
in others). For example, some efficient algorithms fail if the 
polygon is not simple, that is, possesses self intersections. 
Figure 6 shows different polygons with their geometric 
interior painted red. It is also often desirable to allow 
polygons to have inner-boundary components, that is, inner 
parts of the boundary that are not directly connected to 
the outer boundary. We will refer to these as inner loops 
(cf. Section 4). Figure 7 shows a rectangular polygon with 
two inner loops. Note that these polygons also match our 
definition of a surface patch. 
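
One of the common inside tests is the even-odd (ray crossing) rule, sketched below. The sketch assumes that the coplanar points have already been expressed in 2-D coordinates of their plane, and it agrees with the geometric interior defined above only for simple polygons; for self-intersecting polygons the two notions can differ. Inner loops can be handled by passing their edges through the same crossing count.

```python
def point_in_polygon(pt, poly):
    """Even-odd test: count crossings of a horizontal ray starting at pt.

    poly is a list of (x, y) vertices of a closed polygon; the closing
    edge from the last vertex back to the first is implied.
    """
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):                        # edge straddles the ray's line
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > x:                             # crossing lies to the right
                inside = not inside
    return inside

# Hypothetical L-shaped (non-convex) polygon.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(point_in_polygon((1, 3), l_shape), point_in_polygon((3, 3), l_shape))
```

Points lying exactly on an edge are not treated specially here; a robust implementation needs an explicit rule for them.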


Figure 6. Geometric interior of polygons. A color version of this 
image is available at http://www.mrw.interscience.wiley.com/ecm 


Figure 7. Polygon with inner loops. A color version of this image 
is available at http://www.mrw.interscience.wiley.com/ecm 


Because every polygonal patch can be decomposed into 
a set of triangular patches (a triangulation of the patch), 
it is sometimes sufficient to consider only triangular 
patches. These can be handled very efficiently; in particular, 
inside–outside testing and various other calculations are 
easily carried out for triangles. Nevertheless, since a 
triangulation is not unique for a patch, a chosen triangulation 
sometimes introduces biases into these calculations. 
Often elaborate meshing techniques yielding almost equi- 
lateral triangles need to be applied. Therefore more general 
schemes allowing also nontriangular patches should be 
carefully considered as well. Triangulation is a special case 
of meshing techniques. Figure 8 shows an example of an 
object composed of polygonal patches (here only triangles). 
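
That a triangulation is not unique can be seen already for a convex quadrilateral. The following sketch uses a simple fan triangulation, which is valid only for convex polygons and is chosen here purely for brevity; the function names are illustrative.

```python
def fan_triangulation(poly, start=0):
    """Triangulate a convex polygon by a fan around the vertex with index start.

    Returns triangles as triples of vertex indices into poly.
    """
    n = len(poly)
    idx = [(start + k) % n for k in range(n)]
    return [(idx[0], idx[k], idx[k + 1]) for k in range(1, n - 1)]

def triangle_area(a, b, c):
    """Area of a 2-D triangle from the cross product of two edge vectors."""
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

quad = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
first = fan_triangulation(quad, start=0)     # splits along the diagonal 0-2
second = fan_triangulation(quad, start=1)    # splits along the diagonal 1-3
for tris in (first, second):
    print(tris, sum(triangle_area(*(quad[i] for i in t)) for t in tris))
```

Both triangulations cover the same area, but quantities that depend on the individual triangles, such as interpolated values along the chosen diagonal, may differ between them.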


Figure 8. Triangulated object. A color version of this image is 
available at http://www.mrw.interscience.wiley.com/ecm 


3.2 Parametric surfaces 


Often we want the surface to be really curved instead of just 
(piecewise) planar like in the polygonal case. This can be 
achieved via parametric surfaces. Let D be a subdomain 
of 2-D space and let $f : D \to \mathbb{R}^3$ be a continuous map. 
Often we require f to be differentiable; mostly f will be 
a homeomorphism onto its image set f[D], and generally 
we assume that the differential of f has maximal rank. 
We will call the pair (D, f) a parametric surface with 
parameterization f; the surface patch is represented by the 
image of f [1]. 

Note that we assume no further restrictions for D, allow- 
ing explicitly every planar polygon (in 2-D), even with 
inner loops. This is because it can be sometimes intricate 
to find parameterizations for special surfaces. For example, 
a ring-like structure as depicted in Figure 9 can easily be 
represented by the domain 


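$$D = [-1, 1] \times [-1, 1] \setminus \left[-\tfrac{1}{2}, \tfrac{1}{2}\right] \times \left[-\tfrac{1}{2}, \tfrac{1}{2}\right]$$


with parameterization 


$$f(x, y) = \frac{\max\{|x|, |y|\}}{\sqrt{x^2 + y^2}} \, (x, y, 0)^T$$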


Nevertheless, most of the time D will be polygonal or, 
for convenience, the unit square. The spline surfaces discussed 
in Section 3.3 are a popular example of parametric 
surface patches. An alternative method is to model using 
partial differential equations. This elaborate technique was 
developed by M. Bloor and M. Wilson at the University of 
Leeds and is described in Nowacki, Bloor and Oleksiewicz 
(1995). 
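
A parameterization of a planar ring over a square-ring domain can be sketched as follows. The sketch assumes a map that rescales each point so that its Euclidean norm equals its maximum norm, embedded in the plane z = 0; the function name `ring` is hypothetical.

```python
import math

def ring(x, y):
    """Map a point of the square ring onto the circular ring in the plane z = 0.

    Assumption: (x, y) lies in [-1, 1]^2 outside the open inner square,
    so that x and y are never both zero.
    """
    s = max(abs(x), abs(y)) / math.hypot(x, y)
    return (s * x, s * y, 0.0)

# The outer square boundary is mapped to the unit circle,
# the inner square boundary to the circle of radius 1/2.
for point in [(1.0, 1.0), (1.0, 0.3), (0.5, 0.5), (0.2, 0.5)]:
    image = ring(*point)
    print(point, "->", image, "radius", math.hypot(image[0], image[1]))
```

Each concentric square of the domain is mapped to a circle of the same "radius", so the hole of the domain becomes the hole of the ring.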


3.3 Spline surfaces 


Figure 9. Parameterization of a planar ring. A color version 
of this image is available at http://www.mrw.interscience.wiley.com/ecm 


It is well known that for a given set of 3-D points 
$p_0, \ldots, p_n$ there is a unique polynomial curve 

$$\alpha(t) = \sum_{i=0}^{n} c_i t^i$$

with $c_0, \ldots, c_n \in \mathbb{R}^3$ such that $\alpha$ interpolates every point 
$p_i$. We refer to the points $c_i$ as control points of $\alpha$. 
Polynomial interpolants suffer from three major drawbacks: 


• They tend to form unexpected “swinging” curves that 
  can move far away from the interpolation points. 

• Construction and evaluation of these curves are numerically 
  unstable. 

• There is no intuitive interrelation between coefficients 
  of a polynomial and the shape of the resulting curve 
  or surface. 
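
The first drawback can be reproduced with a classical test case. The sketch below is an illustration under assumptions not made in the text: it interpolates the scalar function $1/(1 + 25x^2)$ at eleven equally spaced parameter values on $[-1, 1]$ and evaluates the interpolant in Lagrange form.

```python
def lagrange_eval(xs, ys, t):
    """Evaluate the interpolating polynomial through (xs[i], ys[i]) at t."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        w = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                w *= (t - xj) / (xi - xj)
        total += yi * w
    return total

def runge(x):
    """Smooth bell-shaped test function with values between 0 and 1."""
    return 1.0 / (1.0 + 25.0 * x * x)

nodes = [-1.0 + 2.0 * i / 10 for i in range(11)]
values = [runge(x) for x in nodes]
samples = [-1.0 + 2.0 * k / 2000 for k in range(2001)]
max_error = max(abs(lagrange_eval(nodes, values, t) - runge(t)) for t in samples)
print("largest deviation from the data function:", max_error)
```

Although the data stay between 0 and 1, the degree-10 interpolant swings away from them by more than 1 near the ends of the interval.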


Splines try to overcome all three problems by two basic 
techniques: 


• Use a type of curve that is numerically and visually 
  more “tame”, that is, closer to its interpolation 
  points. 

• Compose the curve piecewise from subcurves. 


All different types of splines are obtained from piecewise 
subcurves; they differ only in the base type chosen 
for these curves. Best known and widely used are 
Hermite-splines, Bézier-splines, B-splines and NURBS, and 
of course monomial-splines (where each subcurve is an 
ordinary polynomial curve). These are all curves of piecewise 
polynomial type (and in the case of NURBS piecewise 
rational polynomial). Furthermore there are nonpolynomial 
types like trigonometric splines, exponential splines, or 
splines based on subcurves obtained from other subdivision 
processes (which are not necessarily polynomial), although 
these are more rarely used as they may be computationally 
costly. 
Formally we will call a curve a spline (of degree n) if 
it is 

1. piecewise composed of the same type of subcurve 
   belonging to the same finite dimensional vector space 
   of functions (note that the subcurve is in most cases 
   $C^\infty$), 
2. at least n − 1 times continuously differentiable. 


Note that depending on the type of spline chosen we often 
need additional control points besides the interpolation 
points to characterize the curve completely. 

Using the techniques from Section 3.2 one can easily 
obtain spline surface patches from spline curves. Given 
an array of control points $(c_{ij})$ with $i \in \{1, \ldots, n\}$, 
$j \in \{1, \ldots, m\}$, each sequence $c_{1j}, \ldots, c_{nj}$ defines a spline 
curve $\alpha_j$, which evaluated at a certain point $x$ yields a 
further sequence of control points $\alpha_1(x), \ldots, \alpha_m(x)$. These 
form a spline $\beta_x$, which can then be evaluated at a point $y$, 
thus giving a resulting range point. This process describes 
a map 


$$f(x, y) := \beta_x(y)$$


where $f$ in fact is a parameterization of a surface patch. 
Figure 10 shows a control array and the underlying spline 
surface. 


Figure 10. Control array of a spline surface. 
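
The two-stage evaluation described above can be sketched for one concrete spline type. The sketch assumes Bézier curves evaluated by de Casteljau's algorithm as the curve type; other spline bases would follow the same two-stage pattern with a different curve evaluator. The names `de_casteljau` and `surface_point` are illustrative.

```python
def de_casteljau(points, t):
    """Evaluate the Bezier curve with the given control points at t in [0, 1]."""
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

def surface_point(control, x, y):
    """Tensor-product evaluation f(x, y) = beta_x(y).

    control[j] holds the control points c_1j, ..., c_nj of the curve alpha_j.
    """
    beta_controls = [de_casteljau(row, x) for row in control]   # alpha_1(x), ..., alpha_m(x)
    return de_casteljau(beta_controls, y)                       # beta_x evaluated at y

# Hypothetical 2 x 2 control array: a bilinear patch with one raised corner.
control = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]]
print(surface_point(control, 0.5, 0.5))
```

Evaluating first in y and then in x gives the same point for this kind of tensor-product patch, so the order of the two stages is a matter of convenience.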

For a detailed overview on spline techniques see, for 
example, Farin (1993), Hoschek and Lasser (1993), and 
Yamaguchi (1988). A recent reference can be found in 
Patrikalakis and Maekawa (2002); this book also deals 
with problems of spline surface intersections, which are 
important when splines are combined with CSG represen- 
tations (cf. Section 5). Certainly the most elaborate spline 
technique is the usage of NURBS (nonuniform rational B- 
splines); see Piegl and Tiller (1995) for a comprehensive 
overview. 


Figure 11 shows a wrench modeled with piecewise B- 
spline patches. 


3.4 Trimmed surfaces 


We have already seen in Section 3.2 that the domain of a 
parametric surface is not necessarily the unit square. We 
can generalize this principle by trimming polygonal and 
even nonpolygonal (e.g. bounded by splines) subdomains 
from parameter space and thus trimming the surface patch 
itself. Depicted in Figure 12 you find a sequence in which 
a user selects a closed curve in parameter space (green), 
which is then trimmed out of the surface patch (black). 
Note that trimming the surface patch directly (instead of 
the parameter domain) is an even more complex task 
since it involves the computation of the inverse $f^{-1}$ of 
the parameterization f (cf. Patrikalakis and Maekawa, 
2002). 


3.5 Multiresolutional approaches 


As we have seen in Section 3.3, splines bring great improve- 
ments in curve and surface design. Nevertheless, modern 
design and modeling applications may demand further features 
of a surface representation, namely (cf. 
Stollnitz, DeRose and Salesin, 1996): 


• easy approximation and smoothing of a representation 
  gained from external sources (e.g. scan points from a 
  digitizer) 

• changing the macro-scale appearance without changing 
  the micro-scale appearance (the “character”) and 
  vice versa 

• editing a representation on various, preferably continuous, 
  levels of detail 


These issues can be addressed by a relatively new 
idea: using so-called B-spline wavelets instead of ordinary 
B-splines for curve and surface modeling. 


Figure 11. Wrench composed of B-spline patches. A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm 


Figure 12. Trimming of a surface patch. A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm 


Figure 14. Some B-spline wavelets. (Reprinted from Wavelets 
for Computer Graphics: Theory and Applications, Stollnitz E.J., 
DeRose T.D. and Salesin D.H., The Theory of Multiresolutional 
Analysis: Biorthogonal Wavelets, (1996), p. 96, with permission 
from Elsevier.) 


The idea of a wavelet representation is to represent a 
curve using linear combinations of functions given in dif- 
ferent levels of detail. The Haar base functions shown in 
Figure 13 are the best known examples. Thus we get a rep- 
resentation in which a manipulation of a single coefficient 
has only relatively local impact, depending on the level. 
Since B-splines also form a vector space, wavelets can be 


built from them. Figure 14 depicts B-spline wavelets for 
degree 0 up to degree 3 for the third level of detail (thus 
giving $2^3 = 8$ functions per degree). For a comprehensive 
overview on wavelets from an analytical point of view, 
refer to the book by Mallat (1998). 
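
The idea of representing a curve on several levels of detail can be sketched with the Haar basis, the simplest case. The sketch below is a minimal illustration under its own conventions (unnormalized averages and differences, input length a power of two); it is not the B-spline wavelet construction itself.

```python
def haar_decompose(signal):
    """Full Haar decomposition of a list whose length is a power of two.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    coeffs = []
    current = list(signal)
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        averages = [(a + b) / 2.0 for a, b in pairs]
        details = [(a - b) / 2.0 for a, b in pairs]
        coeffs = details + coeffs        # finer details go to the end
        current = averages
    return current + coeffs

def haar_reconstruct(coeffs):
    """Invert haar_decompose level by level, from coarse to fine."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        current = [v for a, d in zip(current, details) for v in (a + d, a - d)]
        pos += len(details)
    return current

coeffs = haar_decompose([9.0, 7.0, 3.0, 5.0])
print(coeffs)                            # average first, then details
edited = list(coeffs)
edited[3] += 1.0                         # change one finest-level coefficient
print(haar_reconstruct(edited))          # only the last two samples move
```

Changing a coefficient on a coarse level alters a long stretch of the curve, while a fine-level coefficient has only local influence, which is the editing behavior asked for above.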


4 BOUNDARY REPRESENTATION 


A boundary representation (B-rep) of a solid describes the 
solid only by its oriented boundary surface. The orientation 
is needed to decide easily which side of the boundary is the 
top side and which is the bottom side (even if the object 
is not closed). Since a normal vector is known everywhere,
B-rep solids can be visualized very easily.

Generally it is possible to use a variety of different
surface patches to model the boundary. These patches (e.g. 
NURBS-patches, parameterized surfaces, or simply planar 
polygons, see Section 3) have to be connected with each 
other at their boundaries. The orientation must not be 
destroyed during this step. 

In most applications, planar polygons are used as patches 
(very often only triangles are permitted). These patches are 
called faces. Their borders consist of vertices and edges. 
Different data structures have been developed to hold the 
information necessary to create and work with a B-rep 
solid. The location and orientation (normal vector) of the
plane containing each face have to be known, and the
correspondence of the vertices and the adjacencies of the
edges and faces need to be maintained.

The boundary representation of a solid therefore has two 
parts: 


• the topological description of the connectivity and
orientation and

• the geometric description to place all elements in
space.


To understand how the topological data are maintained
and by what means topological integrity can be ensured,
it is useful to understand planar models and Euler
operators first. Later on, a half-edge data structure is
introduced as an example of how to implement a B-rep
model (see Chapter 7, this Volume).

Figure 15. Planar model of a tetrahedron.


4.1 Planar models 


Planar models are useful to represent the topology of a
solid. A planar model is a planar oriented graph {F, E, V}
consisting of a set of faces $F = \{f_1, f_2, \ldots\}$, edges
$E = \{e_1, e_2, \ldots\}$, and vertices $V = \{v_1, v_2, \ldots\}$.
Every edge has its orientation. If different faces share the
same edges or vertices, they have to be identified with each
other. Identified edges must point in the same direction. In
other words, the directions of the edges imply the way they
have to be identified. Note that with the half-edge data
structure described in Section 4.2, one edge always consists of
two half-edges pointing in opposite directions. Here we
only have a single edge, which can appear several times in
a planar model. Figure 15 shows an example of the planar
model of a tetrahedron. Figure 16 shows a model of the
torus and the Klein bottle. The only difference between the
two models in Figure 16 is that one of the edges $e_i$ points
in the opposite direction. This results in a different
identification. The two models therefore describe different objects.

A solid can have different planar models. An example can 
be found in Figure 17, where two different planar models 
of the sphere are presented. 

From a planar model, the Euler characteristic can be
calculated quickly: $\chi = |V| - |E| + |F|$. Here it does not
matter which particular planar model of a solid is used. The
Euler characteristic of the tetrahedron is $\chi = 4 - 6 + 4 = 2$,
the characteristic of the torus and Klein bottle (Figure 16)
is $\chi = 1 - 2 + 1 = 0$, and that of the sphere (Figure 17) is
$\chi = 1 - 1 + 2 = 2 - 2 + 2 = 2$.
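
As a small illustration of the formula, the following sketch counts
vertices, edges, and faces of polyhedral models given as lists of
cyclic vertex tuples. This face-list format and the helper names are
assumptions made for the example; unlike a general planar model, the
format cannot express several distinct edges between the same pair of
vertices.

```python
def euler_characteristic(faces):
    """Return |V| - |E| + |F| for faces given as cyclic tuples of vertex ids."""
    vertices = {v for face in faces for v in face}
    edges = set()
    for face in faces:
        for a, b in zip(face, face[1:] + face[:1]):
            edges.add(frozenset((a, b)))   # both directions identify one edge
    return len(vertices) - len(edges) + len(faces)


def torus_grid(n=3):
    """Quad faces of an n-by-n grid with opposite sides identified (n >= 3)."""
    def vid(i, j):
        return (i % n) * n + (j % n)
    return [(vid(i, j), vid(i + 1, j), vid(i + 1, j + 1), vid(i, j + 1))
            for i in range(n) for j in range(n)]


tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
cube = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
        (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]
```

The tetrahedron and the cube are two different models of the sphere
and give the same value, while the grid with identified sides is a
model of the torus.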

Based on the planar models, a set of topological operators
was developed to manipulate models in a way that leads
to all models of physical significance while making sure
that only feasible models can be created. For instance,
nonorientable models cannot be created. These topological
operators are split into two classes, the local and global
operators.

Figure 16. Planar model of the (a) torus (b) Klein bottle.

Figure 17. Planar models of the sphere.

Local operators work on planar models without modify- 
ing the Euler characteristic while global operators can create 
objects of higher genus (e.g. double torus), thus changing 
the Euler characteristic. A detailed description and proofs 
of the properties of these operators can be found in Mäntylä 
(1988). 


4.2 Half-edge data structure


In order to work with a B-rep model, one needs to construct
a data structure combining geometric and topological data.
Probably the oldest formalized structure, the so-called
winged-edge data structure, was introduced by Baumgart
(1975). The half-edge data structure is a variation by
Mäntylä (1988) that permits multiply connected faces and
maintains a close relationship to the planar models.

The half-edge data structure (as depicted in Figure 18) 
utilizes the fact that each edge of the boundary surface of 
a closed solid belongs to exactly two faces. So every edge 
is split into two half-edges that are oriented in opposite 
directions. Every face has exactly one outer boundary (outer 
loop) consisting of counterclockwise oriented half-edges 
(if viewed from above) and possibly further inner loops 
consisting of half-edges that are oriented clockwise. The 
orientation of the loops makes it possible to determine the 
top and the bottom side of each face. All vertices of a 
face have to lie on the same plane and are saved in 3-D 
homogeneous coordinates. Every vertex has to be unique 
and can be referenced by many half-edges (depending on 
how many faces share that vertex). Since all half-edges
know their neighbors and their parent loop, which in turn
knows its parent face, finding neighboring faces and iterating
through the data structure is quite easy.
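
A minimal sketch of such a structure is given below. It is a strong
simplification and not the structure of Mäntylä (1988): each face has a
single outer loop, inner loops and geometry are omitted, and the names
`HalfEdge`, `build_half_edges`, and `neighbor_faces` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False, repr=False)
class HalfEdge:
    origin: int                          # index of the start vertex
    face: int                            # index of the parent face (outer loop only)
    next: Optional["HalfEdge"] = None    # following half-edge in the same loop
    twin: Optional["HalfEdge"] = None    # oppositely oriented half of the same edge


def build_half_edges(faces):
    """Create linked half-edges for faces given as counterclockwise vertex tuples."""
    by_endpoints = {}
    for f, loop in enumerate(faces):
        edges = [HalfEdge(origin=v, face=f) for v in loop]
        for i, he in enumerate(edges):
            he.next = edges[(i + 1) % len(edges)]
            by_endpoints[(loop[i], loop[(i + 1) % len(loop)])] = he
    for (a, b), he in by_endpoints.items():
        he.twin = by_endpoints[(b, a)]   # KeyError if the surface is not closed
    return by_endpoints


def neighbor_faces(half_edges, face):
    """Walk once around the loop of `face` and collect the faces across each edge."""
    start = next(he for he in half_edges.values() if he.face == face)
    result, he = [], start
    while True:
        result.append(he.twin.face)
        he = he.next
        if he is start:
            break
    return result


# Consistently oriented tetrahedron: every edge occurs once in each direction.
tetra = build_half_edges([(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)])
```

Because every edge of a closed, consistently oriented surface occurs
exactly once in each direction, pairing the half-edges by their
endpoints is sufficient here; for the tetrahedron this gives twelve
half-edges for six edges.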

A set of low- and high-level operators (the so-called Euler
operators) can be derived from the topological operators of
the planar model (see the previous section). This permits
operations on the data structure in an ordered manner. Any
further operators can be implemented using the Euler
operators, thus guaranteeing the technical feasibility of the
modeled object (see Chapter 17 and Chapter 18 of this Volume).


5 CONSTRUCTIVE SOLID GEOMETRY 


One of the best known volume-based approaches to model- 
ing is the CSG approach. Again refer to Foley et al. (1996) 
or Hoffmann (1989) for an overview on the subject. 

In CSG, every object is either one of a set of simple
objects, the primitives, or it is derived from these by a
sequence of operations. Various CSG schemes exist. They
differ with respect to their sets of primitives and
operations. In 3-D modeling, the most commonly used
primitives are:

• ball,
• cylinder,
• box,
• cone.


Further possibilities include surfaces of revolution, implicit
bodies, and boundary-representation objects. This shows
that CSG can be combined with other methods (discussed
above) to gain a greater variety of primitives.

Figure 18. Half-edge data structure.

A suitable set of operations must include:

• Euclidean motions (translation, rotation),
• union,
• intersection,
• difference.


The latter three are called regularized Boolean operations 
because they are analogous to the well-known Boolean set 
operations with a slight difference we will discuss later. 
Let us first consider the example shown in Figure 19. The 
object on the left side is composed of the primitives on 
the right side via the union operation. Note that parts of the 
objects located inside other objects are ‘swallowed’, so that 
there are no more overlaps or double points. 

Internally, composite objects are kept as binary operator
trees. Figure 20 shows one such tree; ∪ denotes the
union operator. Obviously, neither the sequence of operators
nor the resulting tree is unique for a given result.
Nevertheless, in this way CSG keeps a kind of history of
the construction steps, hence every complex object can be
decomposed into primitive parts again. This is not possible
with most other methods.

Figure 19. An object composed of CSG primitives. A color version
of this image is available at http://www.mrw.interscience.wiley.com/ecm

Figure 20. CSG tree for the bird object. A color version of this
image is available at http://www.mrw.interscience.wiley.com/ecm
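
A binary operator tree of this kind can be sketched as follows. The
class and function names are hypothetical, only balls and axis-aligned
boxes are provided as primitives, and the point test uses the ordinary
(not the regularized) set operations, so points lying exactly on a
boundary may be classified differently than in a full CSG system.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Primitive:
    name: str
    contains: Callable[[Point], bool]


@dataclass(frozen=True)
class Operation:
    op: str                 # "union", "intersection", or "difference"
    left: "Tree"
    right: "Tree"


Tree = Union[Primitive, Operation]


def ball(name, center, radius):
    return Primitive(name, lambda p: sum((a - c) ** 2 for a, c in zip(p, center)) <= radius ** 2)


def box(name, lo, hi):
    return Primitive(name, lambda p: all(l <= a <= h for a, l, h in zip(p, lo, hi)))


def inside(tree, p):
    """Inside-outside test evaluated recursively on the operator tree."""
    if isinstance(tree, Primitive):
        return tree.contains(p)
    a, b = inside(tree.left, p), inside(tree.right, p)
    if tree.op == "union":
        return a or b
    if tree.op == "intersection":
        return a and b
    if tree.op == "difference":
        return a and not b
    raise ValueError(f"unknown operation: {tree.op}")


def leaf_names(tree):
    """Decompose a composite object into its primitives (the construction history)."""
    if isinstance(tree, Primitive):
        return [tree.name]
    return leaf_names(tree.left) + leaf_names(tree.right)


body = box("body", (0.0, 0.0, 0.0), (2.0, 1.0, 1.0))
head = ball("head", (2.5, 0.5, 0.5), 0.75)
hole = ball("hole", (2.5, 0.5, 0.5), 0.25)
model = Operation("difference", Operation("union", body, head), hole)
```

The tree keeps the primitives and the order of operations, so
`leaf_names(model)` recovers the parts from which the object was built.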

Figure 21 shows another example, a wrench composed of
the primitives shown on the right. Formally, the regularized
Boolean operators are defined as follows: Let two objects
A and B and a Boolean set operator $\circ$ be given. The result
of the corresponding regularized Boolean operator $\circledcirc$ is
defined to be $A \circledcirc B := \overline{\mathring{A} \circ \mathring{B}}$,
where $\mathring{A}$ denotes the interior of A and $\overline{A}$
denotes the closure. This definition avoids problems of
Boolean operators giving results that do not represent 3-D
objects. For example, the intersection of two adjacent boxes
sharing one side would yield just that side as a result,
which is not a 3-D object.

Sometimes further operations are included, such as
non-Euclidean matrix operations (scaling, skew transforms) or
surface sweep operations. One has to take care that these
additional operations are applicable to the given primitives;
for example, it is mostly impossible to apply sweep
operations to objects given in implicit representations while
keeping their implicit representation.

CSG is best suited for ray tracing and other applications
that require only inside-outside testing, as these can be
easily carried out in a CSG representation. Voxel
representations can also be obtained easily from a CSG
representation. On the other hand, CSG is not easy to deploy
in situations that require a meshing of the given object, for
example, for FEM computations, since this first demands the
elimination of unnecessary (e.g. ‘swallowed’) object parts,
which can be costly. A method that avoids these computations
is to apply the marching cubes algorithm (Lorensen and
Cline, 1987). This only requires that one can check for
an arbitrary point whether it lies in the interior of the
object, which is easily possible for a CSG representation.
One major drawback here is that we might lose important
details of the model. Another method would be to mesh
each primitive separately (most of the time, it is relatively
easy to give a mesh for each primitive) and to join these
meshes (this is the difficult part) to form the mesh of the
whole object.

Figure 21. A CSG wrench model. A color version of this image
is available at http://www.mrw.interscience.wiley.com/ecm
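
How a voxel representation follows from inside-outside testing alone
can be sketched as below. The object, a union of two overlapping unit
balls, the grid resolution, and the function names are assumptions
chosen for illustration. The overlapping (‘swallowed’) part needs no
special treatment, because only the point test is evaluated.

```python
import math


def in_ball(p, center, radius):
    return math.dist(p, center) <= radius


def inside(p):
    """Union of two overlapping unit balls (hypothetical CSG object)."""
    return in_ball(p, (0.0, 0.0, 0.0), 1.0) or in_ball(p, (1.0, 0.0, 0.0), 1.0)


def voxel_volume(contains, lo, hi, h):
    """Sum the volumes of all cubic voxels of edge h whose center is inside."""
    counts = [round((b - a) / h) for a, b in zip(lo, hi)]
    filled = 0
    for i in range(counts[0]):
        for j in range(counts[1]):
            for k in range(counts[2]):
                center = (lo[0] + (i + 0.5) * h,
                          lo[1] + (j + 0.5) * h,
                          lo[2] + (k + 0.5) * h)
                filled += contains(center)
    return filled * h ** 3


approx = voxel_volume(inside, (-1.0, -1.0, -1.0), (2.0, 1.0, 1.0), 1.0 / 16.0)
# Two unit balls minus their lens-shaped overlap (center distance 1).
exact = 2.0 * (4.0 / 3.0) * math.pi - 5.0 * math.pi / 12.0
```

The voxel estimate approaches the analytic volume as the grid is
refined, but details smaller than the voxel size are lost, which is
the drawback mentioned above for sampling-based methods.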


6 MEDIAL MODELING 


Medial modeling is the newest among the modeling
concepts presented in this survey paper. Medial modeling
essentially uses past and ongoing research of the Welfenlab,
the authors’ research lab at the University of Hannover
(cf. Wolter and Friese (2000) for a brief overview
of results). The suggestion to use the medial axis concept
as a tool for shape interrogation appears to have been
discussed first quite extensively by Blum (1973). For a detailed
mathematical analysis of the underlying concepts (in the
generalized context of cut loci), refer to Wolter
(1979, 1985, 1992). The latter paper, which presents an extended
analysis of the mathematical foundations of the medial axis,
also contains early results (e.g. the one stated below in
equation 1) that already indicate the possibility of employing
the medial axis as a geometric modeling tool. This aspect
is the subject of this section.

One could perhaps summarize the most relevant points
of medial modeling as follows: In medial modeling, a solid 
is described by its medial axis and an associated radius 
function. The medial axis being a subset of the solid is 
a collection of lower dimensional objects. For a general 
3-D solid, the medial axis mostly consists of a connected 
set built by a collection of surface patches and curves. 
Medial representations often simplify the process of gaining 
volume tessellations for the given object, supporting the 
meshing of solids, cf. Section 6.4. They also offer new 
possibilities for the construction of intuitive haptic user 
interfaces that are useful to mold a solid’s shape. 

The basis of medial modeling can be summarized in a
few geometric observations, ideas, and definitions that are
outlined here. Let K be a solid in 3-D or 2-D Euclidean
space. The medial axis M(K) of K, a subset of the
solid, contains all points in K that are centers of maximal
discs included in K. One usually includes in the medial
axis set M(K) its limit points. Figure 22 shows the medial
axis (green) of a domain (black) with some maximal discs
given (red).

Figure 22. Medial axis of a domain. A color version of this image
is available at http://www.mrw.interscience.wiley.com/ecm

We assign to any point p in the medial axis M(K) the radius
r(p) of the aforementioned maximal disc with center p. This
disc is denoted by $B_{r(p)}(p)$. The pair (M(K), r) described
by the medial axis M(K) of a solid K and the associated
maximal disc radius function


$$r : M(K) \to \mathbb{R}^{+}$$


is called the medial axis transform, where $\mathbb{R}^{+}$ denotes the
nonnegative real numbers. This pair (M(K), r) yields a
new possibility to represent the solid K simply as the union
of the related maximal discs, that is,


$$K = \bigcup_{p \in M(K)} B_{r(p)}(p) \qquad (1)$$


For details see Wolter (1992). Figure 23 shows how the 
union of maximal discs defines the shape of a planar 
domain. The general reconstruction statement expressed in 
equation 1 already holds for solids with merely continuous 
boundary surfaces. However, if the solid has merely
continuous boundary surfaces (or continuous boundary curves for
solids that are closed 2-D domains), then the medial axis may
have a “wild” structure; it may, for example, be a set that
is dense in some open 3-D sets (containing 3-D discs).


Figure 23. Maximal discs defining a shape. A color version
of this image is available at http://www.mrw.interscience.wiley.com/ecm
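
The reconstruction in equation 1 can be tried out on a sampled medial
axis. The sketch below assumes a planar “stadium” whose medial axis is
the segment from (-1, 0) to (1, 0) with constant radius 0.5; the
sampling density and all names are hypothetical, and the finite
sampling only approximates the union over all axis points.

```python
import math


def in_union_of_discs(point, axis_points, radius):
    """True if `point` lies in some closed disc B_{r(p)}(p) of the sampled axis."""
    return any(math.dist(point, p) <= radius(p) for p in axis_points)


def constant_radius(p):
    return 0.5


# Sampled medial axis: 201 points on the segment from (-1, 0) to (1, 0).
axis = [(-1.0 + 2.0 * i / 200, 0.0) for i in range(201)]
```

For example, `in_union_of_discs((1.3, 0.3), axis, constant_radius)`
holds because the point lies in the disc around the end point (1, 0),
whereas the point (1.4, 0.4) lies outside every disc.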

In case one poses some regularity conditions on the
solid’s boundary surface $\partial K$, for example, curvature
continuity, the respective medial axis M(K) will have a
more benign structure. For instance, if $\partial K$ is curvature
continuous, then M(K) will not be dense in any open set in
$\mathbb{R}^3$.

Let us assume that $\partial K$ is built by a finite collection
of B-spline patches. Then M(K) can be constructed as a
union of finitely many medial sets, each of which consists
of points that are equidistant to two appropriately chosen
boundary parts. Hence each medial set can be viewed as a
subset of a zero set defined implicitly by the condition that
the difference of the distances of a point in the medial set
to the respective parts of $\partial K$ is zero.

This insight can be used to develop strategies to compute 
(approximately) the medial axis by assembling it from 
medial sets (see Figure 24). In case the boundary parts are 
given implicitly by solutions to polynomial equations, then 
the medial sets can be described in principle by implicit
polynomial equations as well. 
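
A classical instance of such a zero set, chosen here purely for
illustration, is obtained when the two boundary parts are a point and
a straight line: the set of equidistant points is then a parabola. The
sketch assumes the point (0, 1) and the line y = -1, for which the
zero set is the polynomial curve y = x^2 / 4.

```python
import math


def distance_difference(x, y):
    """d(q, point part) - d(q, line part) for the point (0, 1) and the line y = -1."""
    to_point_part = math.hypot(x, y - 1.0)
    to_line_part = abs(y + 1.0)
    return to_point_part - to_line_part


# Points on the parabola y = x**2 / 4 should give a vanishing difference.
residuals = [distance_difference(x, x * x / 4.0) for x in (-3.0, -1.0, 0.0, 0.5, 2.0)]
```

Points off the parabola give a nonzero difference whose sign tells
which of the two boundary parts is closer.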

The reconstruction result stated above in equation 1 can 
be used also to model shape for families of objects. Clearly, 
in equation 1, the shape of the object depends on the medial 
axis set M(K) and the function r.

Intuitively, a continuous deformation $M(K)_t$, $t \in \mathbb{R}$, of
the medial axis $M(K) = M(K)_0$, combined with a continuous
change of the function r described via a continuous
family of functions $r(t, s) : M(K)_t \to \mathbb{R}^{+}$ (with
$r(0, 0) : M(K) \to \mathbb{R}^{+}$ and $r(0, 0) = r$), should yield a
continuously deformed family of objects


$$K_{t,s} = \bigcup_{p \in M(K)_t} B_{r(t,s)(p)}(p)$$


The two control parameters t, s indicate that the change of
the radius function r(t, s) depends on the chosen respective
domain of definition, controlled by the parameter t, and
that for a fixed domain of definition $M(K)_t$ the radius
function depends on the parameter s. Figure 25 shows such a
deformation. Note that the medial axis and the radius function
are both modified.

Figure 24. Assembled shape. A color version of this image is
available at http://www.mrw.interscience.wiley.com/ecm

Figure 25. Continuous deformation of an object. A color version
of this image is available at http://www.mrw.interscience.wiley.com/ecm

In order to present a well-defined concept for the continuity
of the deformation outlined here, we need some formal
requirements that are caused by complications that may occur
during the deformation process. We observe that a continuous
(differentiable) deformation of the solid’s boundary may
result in a family of medial axes $M(K)_t$ whose homeomorphism
type may change during the deformation process. Such a
metamorphosis will occur when a (new singularity) curvature
center of the boundary meets the family of medial axes
occurring during the deformation process of the solid. See
Figure 26 for an example. Therefore it makes sense to consider
continuously changing families of functions r(t, s) under the
provision that for a varying parameter s the domain of the
function family, here the medial axis set $M(K)_t$, is
fixed. This allows us to consider (for a fixed parameter
$t_0$ and a variable parameter s) the family of continuous
functions $r(t_0, s) : M(K)_{t_0} \to \mathbb{R}$ as a continuous
path in a vector space of real-valued functions defined on the
compact set $M(K)_{t_0}$. That vector space will be endowed with
an appropriate topology or norm. Here, fixing the domain
$M(K)_{t_0}$ makes it easy to define a distance between two
radius functions $r(t_0, s_1)$ and $r(t_0, s_2)$ by


$$d\big(r(t_0, s_1), r(t_0, s_2)\big) = \max_{p \in M(K)_{t_0}} \big\{ |r(t_0, s_1)(p) - r(t_0, s_2)(p)| \big\}$$


Figure 26. Nonhomeomorphic deformation of a medial axis.
A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm

In order to express the continuous deformation of the medial
axis family in a formally precise setting, we need to endow
the collection of all medial axes with a topology as well.
Here it makes sense to use the Hausdorff metric $d_H(\cdot, \cdot)$
defined on all compact subsets of the respective 2-D or
3-D domain. Let $A, B \subset \mathbb{R}^3$; then


$$d_H(A, B) := \inf\{\varepsilon : A \subset N_\varepsilon(B) \text{ and } B \subset N_\varepsilon(A)\}$$
with
$$N_\varepsilon(A) := \{y : |x - y| < \varepsilon \text{ for some } x \in A\}$$


A continuously deformed family of medial axes (depending
on a parameter t) can now be viewed as a continuous path
$\phi$ in the Hausdorff space H of compact sets in $\mathbb{R}^2$ or
$\mathbb{R}^3$. Here we have


$$\phi : \mathbb{R}^{+} \to H = \{A \subset \mathbb{R}^3 : A \text{ compact}\}$$


Examples may be given here by families of spline patches
controlled by continuously moving control points $c_i(t)$; cf.
Section 3.3.
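
For finite point samples, the Hausdorff distance reduces to a max-min
computation, which the following sketch uses to compare members of a
deformed family. The family, a polyline whose middle control point
moves upward with t, and all names are assumptions for illustration;
polylines stand in for the spline patches of the text.

```python
import math


def hausdorff_distance(a_points, b_points):
    """Hausdorff distance of two finite point sets (brute force)."""
    def directed(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a_points, b_points), directed(b_points, a_points))


def polyline(control_points, samples_per_segment=50):
    """Sample the straight segments between consecutive control points."""
    pts = []
    for (x0, y0), (x1, y1) in zip(control_points, control_points[1:]):
        for i in range(samples_per_segment + 1):
            s = i / samples_per_segment
            pts.append((x0 + s * (x1 - x0), y0 + s * (y1 - y0)))
    return pts


def axis_at(t):
    """Hypothetical family: the middle control point c(t) = (1, t) moves with t."""
    return polyline([(0.0, 0.0), (1.0, t), (2.0, 0.0)])
```

In this family the distance between `axis_at(0.0)` and `axis_at(t)`
equals the displacement t of the control point, so the path of sets is
continuous in the Hausdorff metric.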


6.1 A metric structure for medial modeling 


In the previous setting, we compared radius functions
only in the simplified special case in which they were
defined on a common medial axis set. It is desirable to
formulate the continuous change of the medial axis set
together with the change of the radius function in a common
topology. For this purpose, the preceding continuous
deformation concept can be described as a whole within a
general setting that employs the Hausdorff topology and
spaces of functions endowed with appropriate topologies. To
this end we define a metric on the product space $H \times F$.
One of the two factors is the above Hausdorff space


$$H = \{A : A \text{ compact subset of } \mathbb{R}^3 \cap B_R(0)\}$$


$(H, d_H)$, endowed with the Hausdorff metric $d_H$
defined above. The other space F in the product $H \times F$ is
given by the space of all continuous real-valued functions
defined on the compact set $B_R(0)$. On the latter space of
continuous functions we can define a metric


$$d_s(f, g) = \max_{x \in B_R(0)} \{|f(x) - g(x)|\}$$


for any pair of continuous real-valued functions f, g defined
on $B_R(0)$.


In this context, it is quite important to understand that any
continuous function defined on a compact subset $A \subset B_R(0)$
can be viewed as a restriction of an appropriately chosen
function that is continuous on all of $B_R(0)$. This holds
here since the space $B_R(0) \subset \mathbb{R}^3$ fulfills appropriate
separation properties; see also the $T_4$ axiom of Hocking and
Young (1988). Clearly the metric on the product space is now
defined by


$$d_p((A, r_1), (B, r_2)) = d_H(A, B) + d_s(r_1, r_2)$$

It can be shown that if


$$d_p((A_n, r_n), (A_0, r_0)) \to 0$$

then


$$d_H\Big( \bigcup_{p \in A_n} B_{r_n(p)}(p), \; \bigcup_{p \in A_0} B_{r_0(p)}(p) \Big) \to 0$$


The sequence of objects (each of which is modeled by a
union of discs) converges in the Hausdorff metric to the
related limit object: unions of discs obtained from members
of a sequence of medial axis transforms converge to
the disc union of the limit (medial axis transform). Clearly,
if $\psi(t) = (A(t), r_t)$ is a continuous deformation path in
$H \times F$ with $(A_0, r_0)$ being the medial axis transform of a
solid, then for the respective disc unions related to $\psi(t)$ we
have Hausdorff convergence toward the solid corresponding
to $(A_0, r_0)$.


1. However, note that not every pair (A, r) will define a
solid via the union $\bigcup_{p \in A} B_{r(p)}(p)$.

2. In case $\bigcup_{p \in A} B_{r(p)}(p)$ defines a solid, it may
not have A as its medial axis, and $r : A \to \mathbb{R}$ may not be a
maximal disc radius function.


Examples in the context of statement 1 above may be
constructed easily if we use a radius function r that
may attain the value zero, as then parts of the object might
agree with the axis A, which may be chosen deliberately
wild. If we assume that the radius function satisfies r > 0,
then we may still have delicate situations in which the
boundary of a domain obtained from a union of closed discs
may at some points be locally homeomorphic to two arcs
having tangential contact at a single point (see Figure 27).
Figure 28 shows an example illustrating the claim in 2.
Here $\bigcup_{p \in A} B_{r(p)}(p)$ defines a solid whose medial axis
contains a topological circle while A does not.


6.2 Boundary representation in medial modeling 


So far the medial modeling concept has been built on the
idea of molding the solid by a union of discs. It may be
preferable to represent the respective solid instead by
appropriate boundary surfaces, cf. Section 4. The latter arise
quite naturally in the medial modeling context. Here the
boundary surface (curve) is created as the envelope surface of
the family of discs belonging to the specific medial axis
transform (see Figure 29). Let us assume that the medial axis is
presented locally by a differentiable curve or surface patch
given by parametric functions m(u), with the
radius function r(u) depending on the parameter u as well.

Figure 27. Tangential contact of envelopes. A color version
of this image is available at http://www.mrw.interscience.wiley.com/ecm

Figure 28. Self-intersection of envelopes. A color version of this
image is available at http://www.mrw.interscience.wiley.com/ecm

It is possible to express the envelope surface using
functions env(u) in terms of expressions involving m(u), m'(u),
r(u), r'(u). It is also possible to compute env'(u) and the
curvature of the envelope curve. The latter computations
need higher-order derivative information of the functions
representing the medial axis and of the radius function;
refer to Wolter and Friese (2000).

Figure 29. Construction of the envelope. A color version of this
image is available at http://www.mrw.interscience.wiley.com/ecm
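
For the planar case, one way to obtain such an expression is the
standard envelope condition for a one-parameter family of circles
$|x - m(u)|^2 = r(u)^2$: differentiating with respect to u gives
$(x - m(u)) \cdot m'(u) = -r(u)\, r'(u)$. The sketch below solves both
conditions for the two envelope points. It is an illustration under
these assumptions and is not claimed to be the formula used in Wolter
and Friese (2000); the example axis and radius function are
hypothetical.

```python
import math


def envelope_points(m, dm, r, dr, u):
    """Both envelope points of the circle family with centers m(u) and radii r(u)."""
    mx, my = m(u)
    tx, ty = dm(u)
    speed = math.hypot(tx, ty)
    tx, ty = tx / speed, ty / speed          # unit tangent of the medial curve
    nx, ny = -ty, tx                         # unit normal of the medial curve
    slope = dr(u) / speed
    if abs(slope) > 1.0:
        raise ValueError("no real envelope point: |r'(u)| exceeds |m'(u)|")
    a = -r(u) * slope                        # tangential offset from m(u)
    b = r(u) * math.sqrt(1.0 - slope * slope)  # normal offset from m(u)
    upper = (mx + a * tx + b * nx, my + a * ty + b * ny)
    lower = (mx + a * tx - b * nx, my + a * ty - b * ny)
    return upper, lower


# Hypothetical data: straight medial arc with linearly growing radius.
def m(u): return (u, 0.0)
def dm(u): return (1.0, 0.0)
def r(u): return 0.5 + 0.25 * u
def dr(u): return 0.25


upper, lower = envelope_points(m, dm, r, dr, 0.4)
```

Each envelope point has distance r(u) from m(u), and the segment
joining them is a normal of the envelope, which is the normal segment
referred to in the validity condition of this section.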


Employing the concepts outlined above, different systems
have been developed at the Welfenlab that can compute
the envelope surface yielding a boundary representation
of a solid whose medial surface and radius function
are given. More precisely, the aforementioned
medial modeling system computes, for a parametric spline
surface patch $m(u) : [0, 1] \times [0, 1] \to \mathbb{R}^3$ and an
associated radius function r(u), the boundary surface of the
corresponding solid whose medial surface is given by
$m([0, 1] \times [0, 1])$, a deformed rectangle embedded
without self intersections into $\mathbb{R}^3$, cf. Figure 30. Figure 31
illustrates the simplified special case in which
$m : [0, 1] \to \mathbb{R}^2$ is a planar arc and the now 2-D solid
corresponds to a planar domain. At those positions where the
center points of the maximal discs are located on the boundary
of the medial patch, we get the related boundary surface of
the solid from appropriate parts of maximal spheres. Here
the construction (using the modeler) is valid if the normal
segment (joining the point env(u) of the envelope surface
with the medial axis point m(u)) does not meet a curvature
center of the point env(u) of the envelope surface prior to
meeting m(u). Using the curvature formula for the envelope
surface mentioned above and some additional criteria, it is
easily possible to check whether the radius function is
admissible, that is, whether the previously stated curvature
center condition holds. Under those assumptions, it can be shown
that the related envelope surface, which we assume to be free
of self intersections, yields the boundary surface of a solid


$$\bigcup_{m(u) \in m([0,1] \times [0,1])} B_{r(u)}(m(u))$$


with $m([0, 1] \times [0, 1])$ being the medial axis of the solid,
which is homeomorphic to a 3-D cube.

Figure 30. A medial patch inside its associated solid. A color
version of this image is available at
http://www.mrw.interscience.wiley.com/ecm

Figure 31. The 2-D case. A color version of this image is
available at http://www.mrw.interscience.wiley.com/ecm

This result can be generalized to situations where the 
medial axis is built by a connected finite collection of 
patches. Again that collection of patches denoted by A will 
constitute the medial axis of a solid whose boundary surface 
is given by the envelope surface obtained via the disc 
radius function being defined on the collection of patches. 
Again we must assume that the envelope surface has no 
self intersections and that the above-mentioned curvature 
center assumption holds for the envelope surface. 

The situation in which the medial axis is built by a
collection of patches is far more complicated than the case in
which the medial axis is given by a single patch. Therefore,
we shall not go into a detailed discussion of this case in this
survey paper. Suffice it to say that, in order to deal with that
complicated case, envelope surfaces related to adjacent medial
patches are joined along curves. The geometry of the
intersection of medial surface patches, here the angles between
intersecting medial patches at an intersection point, poses
conditions that can be used to appropriately blend adjacent
envelope surfaces that are related to adjacent medial
surface patches. These blended envelope surfaces are used to
construct the boundary surface of a solid containing the
aforementioned medial surface patches.


6.3 Medial modeling and topological shape 


One of the major reasons why the medial axis is so
important for the shape of a solid is that it essentially
contains the homotopy type of the solid, because it is a
deformation retract of the solid; refer to Wolter (1992) and
Wolter and Friese (2000). The following topological shape
theorem of the medial axis applies: The medial axis M(D)
contains the essence of the topological type of a solid D.

Let $\partial D$ be $C^2$-smooth (or let $\partial D$ be 1-D and piecewise
$C^2$-smooth, with $D \subset \mathbb{R}^2$). Then the medial axis M(D) is
a deformation retract of D; thus M(D) has the homotopy
type of D.

The proof of this theorem shows that it is possible to
define a homotopy H(x, t), as explained below the next
figure, describing a continuous deformation process of the
solid D. This deformation process depends on the time
parameter t. The deformation starts with the solid. In
Figure 32 this is a rectangle with a circular hole. During the
deformation, points are moved along the shortest segments
starting at the solid’s boundary $\partial D$ until the segments meet
the dotted medial axis. The shortest segments are indicated
by arrows in Figure 32.

We describe a homotopy


$$H(x, t) : (D \setminus \partial D) \times [0, 1] \to (D \setminus \partial D)$$

such that


$$H(x, 0) = x \quad \forall x \in D \setminus \partial D$$
$$H(x, t) = x \quad \forall x \in M(D)$$
$$H(x, 1) = R(x) \quad \text{with } R : D \setminus \partial D \to M(D) \setminus \partial D$$


For this we define the homotopy as follows:


$$H(x, t) := x + t \, d(x, R(x)) \, \nabla d(\partial D, x)$$


Figure 32. Deformation retract.

Here d(x, y) denotes the function describing the distance
between variable points x, y; $\nabla d(x, y)$ describes the
gradient of the distance function d(x, y). R(x) is defined as
the point where the extension of a minimal join from $\partial D$ to
$x \in (D \setminus \partial D)$ meets M(D).
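
The homotopy can be written out explicitly for a simple domain. The
sketch below assumes a planar annulus with inner radius 1 and outer
radius 3, whose medial axis is the circle of radius 2; this domain and
all names are chosen for illustration only. Here R(x) is the radial
projection onto the medial circle, and the gradient of the boundary
distance points radially away from the nearer boundary circle.

```python
import math

INNER, OUTER = 1.0, 3.0              # hypothetical annulus radii
MEDIAL = 0.5 * (INNER + OUTER)       # its medial axis is the circle of this radius


def retract(x):
    """R(x): where the extended shortest segment from the boundary meets M(D)."""
    norm = math.hypot(x[0], x[1])
    return (MEDIAL * x[0] / norm, MEDIAL * x[1] / norm)


def grad_boundary_distance(x):
    """Unit gradient of the distance to the boundary, for points off the medial axis."""
    norm = math.hypot(x[0], x[1])
    sign = 1.0 if norm < MEDIAL else -1.0
    return (sign * x[0] / norm, sign * x[1] / norm)


def homotopy(x, t):
    """H(x, t) = x + t * d(x, R(x)) * grad d(boundary, x)."""
    dist = math.dist(x, retract(x))
    if dist == 0.0:                  # points of the medial axis stay fixed
        return x
    gx, gy = grad_boundary_distance(x)
    return (x[0] + t * dist * gx, x[1] + t * dist * gy)
```

At t = 0 every point stays in place, at t = 1 every interior point has
arrived on the medial circle, and points of the medial circle do not
move at all, which are the three conditions required of H.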


6.4 Medial modeling and meshing of solids 


In the preceding section on medial modeling, we
outlined geometrical concepts that were used to explain the
deformation retract property stated in the topological shape
theorem. We also outlined how to look at the solid’s
boundary surface as an envelope surface that can locally be
presented by a nonsingular parameterization map defined
on the medial axis (cf. Wolter and Friese (2000) for more
details). All those geometric considerations immediately
lead to insights explaining that the medial axis concept can
be used to construct for the given solid a meshing
partition that is naturally associated with the solid’s
medial axis.

We shall outline possibilities to use the medial axis for
the meshing of solids by sketching some examples
presented subsequently in several figures further down. In this
context, some observations are relevant. In case the solid
S is created with the medial modeler, say with a medial
axis diffeomorphic to a square Q, we immediately
obtain a quite simple parameterization of the solid.
That parameterization map can be described by a
differentiable map defined on a solid PS containing all points
in 3-space whose Euclidean distance to the unit square
Q in the xy-plane is not larger than one. Here we
identify the latter unit square with the parameter space of the
medial axis surface. Our definition of the parameterization
map is essentially obtained from the differentiable function
env(u) describing the envelope surface, which is the solid’s
boundary surface (cf. Section 6.2). The respective
parameterization map of the solid S maps a Euclidean segment
in PS (joining any point u in the interior of Q
orthogonally with the boundary of PS) linearly onto a Euclidean
segment in S. The latter segment joins the medial axis
point m(u) with one of the two corresponding boundary
points env(u) in S. This segment in S meets the boundary
surface of S orthogonally (cf. Section 6.2). The outlined
parameterization map of the solid yields a differentiable
map f from PS onto S with a differentiable inverse $f^{-1}$.
Figures 30 and 33 show the correspondence between
PS and S. In the simplified (lower dimensional) case, the
solid S is a 2-D domain with its medial axis being an
arc instead of a 2-D surface. The maps f and $f^{-1}$ can be
used to map certain convex sets in PS onto convex sets
in S. This can be used to get a partition of an
approximation of the solid S into convex subsets. Figure 34 shows
examples where fairly complicated engineering objects that
have been modeled with our medial modeling system have
obtained tetrahedral meshes constructed by employing the
geometrical concepts explained above.

Figure 33. Medial axis of a solid. A color version of this image
is available at http://www.mrw.interscience.wiley.com/ecm

Figure 34. Meshes obtained from a medial axis representation.
A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm
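
For the simplified 2-D case, the parameterization idea can be sketched
as follows. The example assumes a straight medial arc with linearly
varying radius, for which the direction from m(u) to each envelope
point is constant; the map f, the parameter domain [0, 1] x [-1, 1],
and all names are hypothetical. The sketch covers only the region
between the two envelope branches and omits the disc-shaped end caps.

```python
import math

LENGTH, R0, SLOPE = 2.0, 0.5, 0.25     # hypothetical axis length and radius data


def m(u):
    """Medial arc: a straight segment of the given length."""
    return (LENGTH * u, 0.0)


def r(u):
    """Radius function, linear in u."""
    return R0 + SLOPE * u


def f(u, v):
    """Map (u, v) in [0, 1] x [-1, 1] linearly along the segment m(u) to env(u)."""
    k = SLOPE / LENGTH                  # r'(u) / |m'(u)|, constant in this example
    direction = (-k, math.copysign(math.sqrt(1.0 - k * k), v))
    mx, my = m(u)
    return (mx + abs(v) * r(u) * direction[0],
            my + abs(v) * r(u) * direction[1])


def quad_mesh(nu, nv):
    """Image of a regular nu-by-nv grid under f (nv even keeps v = 0 on a grid line)."""
    us = [i / nu for i in range(nu + 1)]
    vs = [-1.0 + 2.0 * j / nv for j in range(nv + 1)]
    nodes = [f(u, v) for u in us for v in vs]
    quads = [(i * (nv + 1) + j, (i + 1) * (nv + 1) + j,
              (i + 1) * (nv + 1) + j + 1, i * (nv + 1) + j + 1)
             for i in range(nu) for j in range(nv)]
    return nodes, quads


nodes, quads = quad_mesh(8, 4)
```

The grid lines v = 0 and v = ±1 are mapped onto the medial arc and the
two envelope branches, so the mesh is aligned with the medial axis in
the sense described above.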


7 ATTRIBUTES 


Sometimes there is a need to store additional information 
associated with a geometric model. We have already seen 
such an example: topological information in a boundary 
representation, in this case adjacency information. There is 
a variety of other data that can be associated with a model; we
will refer to all of this as attributes of the model. These can
be attributes of physical origin, which alter the perception of
the object by the user, or logical attributes, which relate the
object to other objects or data. Physical attributes include
photometric, haptic, and other material constraints, such
as elasticity or roughness.
7.1 Textures


If attributes are quantifiable (which is true for most of 
the physical attributes), then they are often specified by 
textures, which are functions that relate surface points to 
certain quantities of the attribute. Formally a texture is 
defined by: 


$t : M \to V$


where M is the set of points of the model and V is the set 
of possible values of the texture. M can consist either of 
the entire volume of the model or only the surface points 
depending on the nature of the attribute. 

Normally textures are implemented using two maps 


$p : \mathbb{R}^2 \to M_S$

and

$v : \mathbb{R}^2 \to V$

with p being the well-known parameterization of the
surface points $M_S$. Then the texture t is given by

$t := v \circ p^{-1}$

Note that in practice one does not need to compute the 
inverse of p since the coordinates in parameter space (here 
identical to the texture coordinates) are known during the 
process of painting. Nevertheless, often it is rather difficult 
to find appropriate (i.e. nonsingular) mappings from the 
plane onto a given surface (in fact it is impossible as stated 
by the Hopf index theorem). This results in distortions of 
the texture near the singularities. To avoid this, one can use 
solid textures (cf. Peachey, 1985; Perlin, 1985). Here the 
map v is defined as 


$v : \mathbb{R}^3 \to V$


and thus t := v since $M \subset \mathbb{R}^3$. Note that while ordinary
textures are implemented as pixel images, solid textures are
represented by voxel spaces (cf. Section 2). This approach
is slightly faster and avoids distortions induced by the
parameterization; on the other hand, it consumes far more
memory. Furthermore, ordinary 2-D textures can often be 
easily derived from photographs whereas this is much more 
complicated for solid textures; see for example De Bonet 
(1997). 
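
The two constructions above can be illustrated by a minimal Python sketch. It evaluates a surface texture of the form t = v o p^{-1} on the unit sphere and a solid texture defined directly on 3-space. The spherical parameterization, the checkerboard pattern, and all function names are assumptions made for this example only; they are not taken from the cited references.

```python
import math

def sphere_inverse_param(x, y, z):
    """p^{-1}: map a point on the unit sphere to texture coordinates (u, v) in [0, 1]^2."""
    u = (math.atan2(y, x) + math.pi) / (2.0 * math.pi)   # longitude
    v = math.acos(max(-1.0, min(1.0, z))) / math.pi       # colatitude
    return u, v

def checker_2d(u, v, n=8):
    """v: R^2 -> V, a procedural checkerboard with values 0 or 1."""
    return (int(u * n) + int(v * n)) % 2

def surface_texture(point):
    """t = v o p^{-1}, evaluated at a point of the sphere."""
    return checker_2d(*sphere_inverse_param(*point))

def solid_checker(x, y, z, n=4):
    """Solid texture v: R^3 -> V; here t = v and no parameterization is needed."""
    return (math.floor(x * n) + math.floor(y * n) + math.floor(z * n)) % 2
```

In this sketch the poles of the sphere are singular points of the parameterization (every longitude maps to the same point), which is where a pixel-based texture would appear distorted; the solid variant has no such points.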

Solid textures are easily applied in areas where the
texture data itself results from real-world data, for example,
a spatial scan of a material density or the like. Nevertheless,
modeling a spatial texture can be intricate. The approach
presented in Biswas, Shapiro and Tsukanov (2002) shows


how to combine traditional modeling techniques like CSG 
with the theory of distance functions to model arbitrary 3- 
D textures. For each textural attribute (here referred to as 
feature), its extremal sets are modeled as separate solids, 
then the gaps in between are filled via distance interpolation 
methods. A slightly different and more general approach 
can be found in Jackson et al. (2002) and Liu et al. (2003). 
These techniques are commonly referred to as heterogeneous
or inhomogeneous modeling. The texture can then be kept
in its quasicontinuous representation to benefit from the
representation's superior analytic properties, or it can be
easily converted to a voxel space representation for faster
rendering etc.

The most common use of textures is as photometric
textures, which are maps that modify the color of a surface 
point. Figure 35 shows a sphere, a photometric texture 
resembling marble and the sphere “wrapped” with the 
texture. 

Note that the use of textures is not limited to color
(although this is widely assumed in the literature). Other
common uses include bump maps, which are vector fields
that alter the direction of the point normal (and thus alter
the appearance of the surface locally near the point).
Figure 36 shows a texture and the result of bump-mapping
it onto a sphere. Note that the sphere's geometry itself
is not changed, only the face normals.
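
The effect of a bump map can be sketched in a few lines of Python. The perturbation rule (adding a vector to the normal and renormalizing) and the simple diffuse shading term are assumptions chosen for illustration; actual rendering systems may derive the perturbation differently.

```python
import math

def normalize(v):
    """Scale a nonzero 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def bumped_normal(normal, perturbation):
    """Alter the direction of the normal by a bump-map vector; the surface point is untouched."""
    return normalize(tuple(n + d for n, d in zip(normal, perturbation)))

def diffuse(normal, light_dir):
    """Diffuse shading term: cosine of the angle between normal and light direction, clamped at 0."""
    light = normalize(light_dir)
    return max(0.0, sum(n * l for n, l in zip(normal, light)))
```

With a zero perturbation the shading is unchanged; a nonzero perturbation changes the shading value although the position of the surface point stays the same.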


Figure 35. Photometric marble texture. A color version of this
image is available at http://www.mrw.interscience.wiley.com/ecm


Figure 36. Bump-map on a sphere.

Textures are especially useful to model micro-scale
aspects of surfaces, where detailed polyhedral modeling
is too costly. For example, a micro-scale roughness of a
stone surface can more efficiently be simulated by pho-
tometric and haptic textures than by subdividing the
(macro-scale) smooth surface into tiny triangles. Further- 
more, material properties like elasticity or particle density 
can be represented by 3-D textures. Rather than simu- 
lating the position of single, individually invisible parti- 
cles, a quasicontinuous texture is applied to the model 
space. 


7.2 Model parameters 


A special class of attributes are the model parameters. As 
we have seen in the preceding chapters, most model repre- 
sentations have a set of parameters associated with them, 
for example, the set of control points for a spline patch. 
Sometimes it makes sense to view some of these param-
eters as attributes. Additional parameters can be added
to most models; these include Euclidean motion matrices,
stiffness constraints, and the like. It is then sometimes more
appropriate to allow these parameters to change over time,
making them effectively attached functions rather than con-
stants. These techniques lead to the theory of physics-based
modeling; refer to Metaxas (1997) for a comprehensive
overview of this topic.


7.3 Scripts 


Scripts are an example of logical attributes. These are
parts of code or methods that can be invoked when cer-
tain constraints of the model are met. For example, a
3-D object can have scripts attached that react to user
interaction when the object is selected in an interac-
tive scene. Another example would be a script that is
activated on collisions of the object with other parts of
the scene. Scripts are especially useful in applications
like physical modeling, where the modification of one
object may also require modifications to associated objects.
Rather than attributing these dependent objects to the call-
ing object passively and letting the main program do the
work, the tasks are carried out directly by the objects
involved.
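
The idea of scripts attached to a model can be sketched as follows in Python. The class name, the event names, and the callback signature are hypothetical and serve only to illustrate the principle; they do not describe any particular modeling system.

```python
class SceneObject:
    """A model that carries scripts (callables) as logical attributes."""

    def __init__(self, name):
        self.name = name
        self.scripts = {}  # event name -> list of attached scripts

    def attach(self, event, script):
        """Attach a script that is invoked whenever the given event occurs."""
        self.scripts.setdefault(event, []).append(script)

    def trigger(self, event, **context):
        """Invoke all scripts attached to the event and collect their results."""
        return [script(self, **context) for script in self.scripts.get(event, [])]
```

A collision handler would then be registered with `obj.attach("collision", handler)`, and the scene would call `obj.trigger("collision", other=...)` instead of handling the dependent objects itself.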

The idea of scriptable attributes has now been around
for several years without finding broad acceptance. Nev-
ertheless, there have been some prototype implementations
like the Odyssey Framework (cf. Brockman et al., 1992; 
Cobourn, 1992). 


8 OUTLOOK AND CONCLUDING 
REMARKS 


A theoretically and practically difficult topic that we barely
touched upon in this paper concerns the
analysis and computation of singularities of geometric loci.
Those singularities may come up on various occasions. 
They quite often concern the structure of geometrically 
defined solutions of nonlinear equations being crucial to 
define precisely the local and global topological structure 
of solids and their parts. Those singular sets very naturally
come up, for example, when we are dealing with surface
intersections that may be related to Boolean operations car-
ried out for solids bounded by surfaces (cf. Kriezis,
Patrikalakis and Wolter, 1992; Patrikalakis and Maekawa,
2002). Similar problems also cause major difficulties in the
context of CSG modeling; see Section 5 (cf. Hoffmann,
1989). Simply put, whenever a set under consideration
does not have the structure of a topological or of a differentiable
manifold, then it will have a singular structure at some
locations. For an important class of singular sets, this can 
be rephrased by saying singular sets in the Euclidean space 
cannot be represented by solutions of equations correspond- 
ing to some differentiable functions whose differential has 
a maximal rank at all points belonging to the ‘singular 
set’ under consideration. Computations and constructions 
related to the medial axis in Section 6 of this paper often 
contain as their most difficult part computations and repre- 
sentations of the singular subsets of the medial axis (cf. 
Section 5). In general, analyzing and understanding the 
mathematical structure of singular sets is sometimes quite 
difficult and may require the use of sophisticated and fairly 
advanced mathematical methods related to singularity the- 
ory. The mathematical and computational trouble caused by 
mathematically singular sets is enhanced by an additional 
fundamental problem in this context. One of the crucial dif- 
ficulties that we encounter in geometric modeling is caused 
by the fact that all our models are usually represented in 
a discrete space and they only use points on a finite 3-D 
grid having a limited resolution. This implies that even in 
cases where we are dealing with solids, bounded by surfaces 
consisting of triangular facets only, we still may have dif- 
ficulties carrying out Boolean operations. Those difficulties 
are caused by the fact that for certain geometric config- 
urations we cannot properly compute the intersection set 
of two triangular facets. The latter problem may result in
a wrong decision that places the intersection point
inside or outside of some triangular facet. In the
end, all this may contribute to major topological inconsis- 
tencies and contradictions causing a failure of the system. 
In our view, the state of the art in geometric modeling 
related to all of the aforementioned areas still needs sub-
stantial improvements by innovative concepts. Those new
concepts to be developed should benefit from ideas inspired 
by advanced mathematical concepts from computational 
differential geometry and from singularity theory; cf. the 
pioneering work of the late Thom (1975), Bruce and Gib- 
lin (1992), Arnold, Gusein-Zade and Varchenko (1985), and 
Arnold (1990). New exciting research by Leymarie (2003)
uses the medial axis concept (cf. Section 6) in combination
with ideas resting on a singularity analysis of distance wave
fronts to develop new methods for 3-D shape represen-
tation that are applicable in a context of discrete point sets.
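
The inside/outside decisions mentioned above ultimately rest on sign tests such as the orientation of three points. The following Python sketch shows one possible remedy, evaluating the test in exact rational arithmetic so that the sign is never affected by rounding. It is an illustration under the assumption that coordinates are given as decimal strings or integers; it is not the method of any system discussed in this paper, and the function names are hypothetical.

```python
from fractions import Fraction

def to_exact(point):
    """Convert coordinates (decimal strings, ints or Fractions) to exact rationals."""
    return tuple(Fraction(c) for c in point)

def orientation_exact(a, b, c):
    """Return +1 if a, b, c turn counterclockwise, -1 if clockwise, 0 if collinear."""
    a, b, c = to_exact(a), to_exact(b), to_exact(c)
    det = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (det > 0) - (det < 0)
```

The same determinant evaluated in floating-point arithmetic may return a tiny nonzero value for nearly collinear input, which is one source of the inconsistent decisions described above. Exact evaluation is slower, so practical systems often use it only when a fast floating-point filter cannot certify the sign.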

Another currently very active area related to geometric 
modeling is dealing with data compression. Often huge 
amounts of data points may arise from measurements or 
from construction procedures, for example, when large 
objects are constructed by many patches. Those collections 
of many patches need to be simplified and reduced, that is, 
approximated by a surface whose description needs far less 
data (cf. Bremer et al., 2001). However, this approximation
often must fulfill some specified accuracy requirements, for 
example, concerning placement. Furthermore, quite often 
we must meet some topological conditions such as that the 
approximating surface may not have self-intersections and
singularities. Data corresponding to evaluation of contin- 
uous functions defined on geometric 2-D or 3-D objects 
may be obtained by measurements or by time consum- 
ing computational procedures such as those used in the 
area of differential equations. In all these cases, one may 
encounter extremely large data sets that are far beyond the 
size that can be handled on current computers. In those 
situations, one appreciates good approximation methods 
allowing an efficient approximation of the given data (or 
of that respective function). The description and evalua- 
tion of the approximation should need far less data than 
the original data set. Furthermore, it should be possible to
process the approximation data (substituting the original
ones) efficiently on the computer for the particular compu-
tational purpose. This survey has touched on
the basics of related topics, for example, in Sections 3.3
and 3.5. Suffice it to say that new concepts of wavelet and
multiresolution theory appear to provide powerful tools that 
currently drive the progress in the respective fields that may 
be considered to belong to the subject of data compression 
(cf. Mallat, 1998; Stollnitz, DeRose and Salesin, 1996).
It should be mentioned that very recent innovative efforts in 
the area of data compression employing new methods from 
a so-called discrete Morse theory benefit from concepts 
that have been developed in the classical areas of mod- 
ern global differential geometry and differential topology 
(cf. Edelsbrunner, Harer and Zomorodian, 2001; Milnor, 
1967). Meanwhile, there even exists a new field called 


‘computational topology’ presenting fundamental research 
for geometric modeling that has been inspired strongly by 
methods and questions and ideas stemming from the classi- 
cal area of topology and differential topology, for example, 
refer to the recent work by Amenta, Peters and Russell 
(2003). 

Historically, geometric modeling has been developed as a
basic science for Computer Aided Design. In its early days,
the latter field employed descriptive geometry
and Bézier geometry to design the shape of objects elec-
tronically instead of using blueprints created in technical
drawings with the help of compasses and rulers. Meanwhile,
engineers want computer aided modeling systems whose
capabilities go far beyond Computer Aided Design. Those
systems should not only describe the shape of objects but
should also allow the simulation of various physical prop-
erties of the design object. This essentially implies that the
computer system must be capable of solving partial differen-
tial equations (PDEs) defined on the geometry of the
designed object. For this purpose, we may need systems
that provide very rapidly (ideally in real time) a good auto-
mated meshing procedure of the geometric design object.
The resulting mesh must be appropriate for the approximate 
solution of the respective PDE used to analyze some prop- 
erties of the designed object. Future geometric modeling 
and meshing systems will have to address those important 
needs. Those systems may therefore integrate the design 
and meshing functionalities in combined systems as it has 
been, for example, suggested in our medial modeler system 
described in Section 6.4. In order to handle the combined 
needs of designing shape as well as designing the physics 
of objects, it appears to make sense that the different engi- 
neering communities doing geometric modeling research, 
meshing research, and computational engineering (PDE) 
research will cooperate more closely in the future. This col- 
laboration should initiate learning processes in which each 
community should profit from the knowledge available in 
the other communities. 

Overall, assessing future developments, we think that 
new developments in geometric modeling and also in the 
aforementioned areas will increasingly employ concepts 
and insights from singularity theory, from local and global 
differential geometry, and from advanced (singular) wavelet 
theory. The latter areas will help to provide mathematical 
concepts and tools being relevant to analyze and compute 
delicate singularities that may, for example, be encountered 
analyzing dynamical processes related to various types of 
PDEs defined in the context of a physical analysis of the 
design object. 

Finally we present a remark that corroborates our state- 
ment that new developments in geometric modeling and 
related fields benefit from using synergetically advanced 
concepts from global and local differential geometry. Geo-
metric modeling is primarily involved with shape construc-
tion, but it also deals with the area of shape interrogation,
and by that geometric modeling is related to shape cogni-
tion of 3-D and 2-D objects. Shape cognition is concerned
with methods identifying automatically the shape of an 
object in order to check if the shape design is already 
in a database containing shape design models being, for 
example, protected by some copyright. We want to point 
out that recent advances on new strong methods concern- 
ing shape cognition benefit also from advanced concepts of 
global and local differential geometry such as singularities 
of principal curvature lines called umbilics (cf. Maekawa, 
Wolter and Patrikalakis, 1996; Ko et al., 2003a; Ko et al., 
2003b).
NOTES


[1] Topologists call f[D] the image set of f and R? the 
range of the map. Analysts often call f(D) the range 
of the map and do not introduce a special name for the
right-hand side of f. 

[2] This consideration shows that any radius function being
a continuous function on a compact subset A of B, (0)
is the restriction of some continuous function defined on
the set B,(0) $\supset$ A.
1 INTRODUCTION


To carry out a finite element analysis (FEA or FEM), or 
any type of analysis such as a boundary element method 
(BEM) or a finite volume method (FVM), which requires 
the use of a spatial decomposition of the domain of inter- 
est, it is necessary to construct an appropriate mesh of 
the corresponding computational domain. This is the first 
step we face when using such methods. There are var- 
ious automatic mesh-generation methods that are widely 
used in software packages (Frey and George, 2000). Nev- 
ertheless, these methods are subject to rapid changes 
and improvements, and the demand in terms of meshing 
facilities is also constantly changing. At present, developing
quality meshes adapted to the physics of the problem to be 
solved represents a field of intensive research (Thompson 
et al., 1999).

Meshes can be categorized as being structured or unstruc- 
tured. Structured meshes follow a simple structure like a 
grid or a finite difference network at the vertex connectiv- 
ity level. Construction methods for such meshes are purely 
algebraic (Cook, 1974), of the multiblock type (Allwright, 
1988) or based on appropriate partial differential equa- 
tions (PDE) solutions (Thompson et al., 1985). Geometries 
that can be successfully handled are more or less close 
to quadrilateral (hexahedral) regions or decomposed by 
means of such simply shaped regions. Arbitrarily shaped 
domains are more tedious to consider in this way. There- 
fore, unstructured meshes appear to be the solution. Con- 
struction methods, in this case, fall into three categories: 
(based on) hierarchical spatial decompositions (quadtree, 
octree) (Yerry and Shephard, 1984), (on) advancing-front 
strategies (Van Phai, 1982; Lo, 1985; Löhner, 1997) and 
(by means of) Delaunay-type methods (Weatherill and Has- 
san, 1994; Marcum and Weatherill, 1995; George and 
Borouchaki, 1998). 

Hierarchical decomposition (or tree-based) methods start 
with a unique parent (or root), a cell enclosing the entire 
domain, and they split this single cell into four (eight) sim- 
ilar (or congruent) subcells according to a given criterion. 
A tree is thus formed and analyzed before being recursively 
subdivided (according to the above criterion), until certain 
properties hold. The resulting tree then allows the mesh ele- 
ments to be defined, and the exterior elements are removed. 
Advancing-front-based methods use the discretization of the 
domain boundaries as an initial front. Each edge (triangu- 
lar facet) in this front is then used to construct a triangle 
(a tetrahedron). A new front is then defined and dealt with
in the same way. The mesh is completed when the current 
front is empty. It is worth remarking that this approach 
allows for quadrilateral mesh construction (in two dimen- 
sions), but is rather tedious to extend to three dimensions 
(for hexahedral mesh construction). Delaunay-type meth-
ods make use of Delaunay triangulation algorithms. Such
an algorithm inserts points one at a time into a given
mesh and thereby completes a mesh of the convex hull of the set
of points formed by the vertices of the domain boundary
discretization (the input data). This algorithm can be used
as a point-to-point connector for mesh construction. Once
the boundary points have been inserted, the boundary dis- 
cretization can be obtained (a rather tedious task, at least 
in three dimensions, whereas in two dimensions, a series of 
edge flips allows a solution), and points are created inside 
the domain before being connected in turn. The final mesh 
is obtained when the domain is saturated (in the sense that 
it is no longer necessary to add new points). 
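
The Delaunay triangulation algorithms referred to above rely on the empty-circumcircle criterion. A minimal Python sketch of the corresponding test in two dimensions is given below; it uses the standard determinant form in plain floating-point arithmetic, and the function names are chosen for this example only.

```python
def in_circle(a, b, c, d):
    """Determinant test: positive if d lies strictly inside the circumcircle of the
    counterclockwise triangle (a, b, c), negative if outside, zero if on the circle."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    return ((ax * ax + ay * ay) * (bx * cy - cx * by)
            - (bx * bx + by * by) * (ax * cy - cx * ay)
            + (cx * cx + cy * cy) * (ax * by - bx * ay))

def violates_delaunay(a, b, c, d):
    """True if point d breaks the empty-circumcircle property of triangle (a, b, c)."""
    return in_circle(a, b, c, d) > 0.0
```

When a newly inserted point violates the criterion for a triangle, that triangle is removed or its edges are flipped, depending on the variant of the insertion algorithm.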

The above discussion concerns the so-called classical 
mesh-generation problem. Given a discretized boundary, 
we complete, as well as possible, a mesh in the corresponding domain
composed of elements that are judged to be reasonable in
terms of size and quality (shape). The size is related to the
greater or lesser fineness of the boundary discretization
and remains to be properly defined inside the domain. The
quality is related to the element aspect ratios. Moreover, 
the way in which these two parameters vary in a given 
region is an important issue. It could be observed that 
these criteria greatly depend on the way in which the 
field points are created, such an issue being somewhat 
tedious. Turning to mesh adaptation involves looking at the 
same problem while adding a series of constraints about 
element size and element directions (anisotropic case)
(Peraire et al., 1987, 1992; D’Azevedo and Simpson, 1991;
Vallet, 1992). The aim is no longer to obtain the best 
possible mesh but to complete elements of a prescribed 
size and direction where necessary. Tree-based methods 
are unlikely to be suitable in this respect (in particular for 
handling nonuniform directional constraints). Advancing- 
front or Delaunay-type methods promise greater flexibility 
and so are more likely to be suitable. 
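
As an illustration of an element quality measure of the kind mentioned above, the following Python sketch computes a normalized shape quality for a triangle. The particular formula (area divided by the sum of squared edge lengths, scaled so that an equilateral triangle scores 1) is one common choice and is an assumption of this example; the chapter itself does not fix a formula at this point.

```python
import math

def triangle_quality(a, b, c):
    """Shape quality in [0, 1]: 1 for an equilateral triangle, 0 for a degenerate one."""
    area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
    edges_sq = sum((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
                   for p, q in ((a, b), (b, c), (c, a)))
    if edges_sq == 0.0:
        return 0.0  # all three vertices coincide
    return 4.0 * math.sqrt(3.0) * area / edges_sq
```

A mesh generator can use such a value both to reject badly shaped candidate elements and to drive the final optimization stage.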

This chapter is made up of 9 sections. Section 1 pro-
vides a general introduction to the various problems at
hand. Section 2 gives a brief history of mesh-generation
methods. Section 3 gives an overview of the most popu- 
lar mesh-generation methods, with a special emphasis on 
automated methods. Section 4 is the core of the chapter 
and contains a number of issues related to adaptation meth- 
ods. We focus on a Delaunay-type method and propose a 
general formulation for a mesh-generation problem. More 
precisely, we introduce the notion of a regular mesh. We 


then demonstrate how this framework allows a formula- 
tion of the mesh-generation problem for adapted meshes. 
Then, we give a construction method based on these ideas. 
The parametric together with the discrete curve and sur- 
face cases are discussed in separate subsections. Finally, 
the volume-meshing aspect is discussed. Section 5 estab- 
lishes the relationship between adapted mesh-generation 
methods and adaptive FEM (finite element method) sim- 
ulations, after which, large-size mesh creation is discussed, 
leading to parallel meshing (Section 6). Moving boundary 
meshing problems are addressed in Section 7. To demon- 
strate how the previous approaches work, Section 8 shows 
a series of application examples in different engineering 
disciplines, together with other more or less derived appli- 
cations. Finally, Section 9 gives some conclusions and 
mentions what the future may hold. 


2 A BRIEF HISTORY 


Without fear of contradiction, it can be claimed that the 
finite element method (FEM) was pioneered by engineers 
and practitioners in the early fifties (see earlier editions 
of Zienkiewicz, 1977 for instance). Then, during the six- 
ties, mathematicians established the theoretical and math- 
ematical foundations of this method (see Ciarlet, 1991 
or Hughes, 1998 among many other references). The FEM 
then became widely used by various categories of people. 
A number of applications in different fields of engineering 
motivated the development of mesh-generation methods. 
Except when considering a unit square region or simply 
shaped geometries where the mesh generation is straight- 
forward, concrete applications in arbitrary domains required 
designing and implementing automated mesh-generation 
methods. A pioneering work by George (1971), in the early 
seventies, demonstrated an advancing-front method for two- 
dimensional geometries. Quadtree-based mesh-generation 
methods were initiated by Shephard (Yerry and Shephard, 
1983) a few years later. Delaunay-based mesh-generation 
methods were introduced by various authors (Hermeline, 
1980; Watson, 1981). 

As for three dimensions, computer facilities available 
from the eighties (including memory capacity and CPU 
efficiency), together with the need for more realistic 
simulations, triggered the investigation of mesh-generation 
methods capable of constructing three-dimensional meshes. 
In this respect, advancing-front, octree-based and Delaunay- 
type methods were extended to this case, while surface 
meshing received particular attention. First references 
include Hermeline (1980), Watson (1981), Yerry and 
Shephard (1984), Löhner and Parikh (1988), Joe (1991), 
Weatherill and Hassan (1994), and Marcum and Weatherill 


(1995). Nowadays, intensive research and advanced 
developments are the topics of a number of groups 
throughout the engineering community. 

Current authoritative literature about mesh-generation 
methods includes a number of monographs (Thompson 
et al., 1985; George, 1991; Knupp and Steinberg, 1993;
Carey, 1997; George and Borouchaki, 1998; Frey and
George, 2000), a handbook (Thompson et al., 1999),
together with papers in ‘specialized’ annual conferences, 
and various survey papers. 


3 MESH-GENERATION METHODS 


Specific geometries can be handled by means of specific 
methods. Domains of a peculiar shape can be dealt with, 
taking advantage of their specificities. Convex domains 
(with no hole) more or less close to a (deformed) quadri- 
lateral (hexahedron) can be dealt with using algebraic 
methods (Cook, 1974) or a PDE-based method (Thompson 
et al., 1985). Arbitrary domains can be decomposed using a 
number of such simply shaped regions and then, multiblock 
methods allow for a solution (Allwright, 1988). Neverthe- 
less, splitting a domain in this way is a rather tedious task 
and thus one that calls for fully automated methods that fall 
into three categories. The first type makes use of a hierar- 
chical spatial decomposition that results in constructing a 
tree, which, in turn, allows the mesh elements to be cre- 
ated. The second type of method can be seen as a greedy 
algorithm, where a piece of the domain that has not yet 
been meshed is covered by a mesh element, thus resulting 
in a ‘smaller’ void region to be considered. Methods of the 
third type involve Delaunay triangulation algorithms revis- 
ited so as to produce meshes (and not only triangulation). 
The following gives a brief description of these three types 
of methods. 


Planar and volume domains 

Before entering into this description, we give an indication 
of what we term as a mesh-generation problem. We are 
given a domain $\Omega$ in $\mathbb{R}^2$ or $\mathbb{R}^3$. This domain is defined by
its boundary, $\Gamma$, which, in turn, is known by means of a
discretization. The latter is a collection of line segments
in two dimensions, these segments defining a polygonal
approximation of $\Gamma$. In three dimensions, the boundary is
defined by means of a triangulated surface (a list of triangles
or quadrilaterals) which, again, defines an approximation of
$\Gamma$. The problem is then, using these data alone, to construct
an appropriate mesh of $\Omega$ (actually, an approximation of
$\Omega$). Such a context defines a so-called classical mesh-
generation problem in which the goal is to cover $\Omega$ by
quality elements where the notion of quality only refers
to aspect ratio (or element shape) and mesh gradation but does
not explicitly include any information about element sizing.
Specifying sizing or directional requirements leads to a 
so-called adapted or controlled mesh-generation problem 
where the targeted elements, as above, must be well shaped, 
nicely graded and, in addition, must conform to a given 
sizing or directional specification (referred to as a metric 
in the following). The description below mainly concerns 
a classical mesh-generation problem, while a large part of 
the remaining sections discuss the other mesh-generation 
issue (i.e. how to complete an adapted mesh and therefore 
access adaptive computations). 
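
To make the notion of a sizing or directional specification more concrete, the following Python sketch measures the length of an edge with respect to a metric given as a symmetric positive definite 2 x 2 matrix. The convention assumed here, namely that an edge conforms to the specification when its length in the metric is close to one, is a common one in the mesh adaptation literature but is not stated in the text above; the function names are hypothetical.

```python
import math

def metric_length(p, q, metric):
    """Length of edge pq measured in a constant metric: sqrt(e^T M e) with e = q - p."""
    ex, ey = q[0] - p[0], q[1] - p[1]
    (m11, m12), (m21, m22) = metric
    return math.sqrt(ex * (m11 * ex + m12 * ey) + ey * (m21 * ex + m22 * ey))

def anisotropic_metric(h1, h2):
    """Axis-aligned metric prescribing size h1 along x and size h2 along y."""
    return ((1.0 / h1 ** 2, 0.0), (0.0, 1.0 / h2 ** 2))
```

With `anisotropic_metric(0.1, 1.0)`, an edge of Euclidean length 0.1 along x and an edge of Euclidean length 1.0 along y both have length one in the metric, which expresses a directional (anisotropic) size requirement.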


Surface domains 

Surface domains are different from planar or volume 
domains in the sense that a surface mesh must conform 
to the geometry of the surface in hand. In other words, 
smoothness, quality, and gradation are demanded, but it 
is mandatory to match the geometry. This leads to paying 
particular attention to the surface curvatures, thus making 
a purely two-dimensional approach unlikely to be suitable. 
In short, there are two ways to define a surface. A para- 
metric definition involves defining a parametric space (thus 
in two dimensions), together with an appropriate mapping 
function. A discrete definition allows the geometry to be 
described by means of a mesh which is, a priori, a geomet- 
ric mesh, and not a computational mesh. To some extent, 
parametric surfaces can be meshed using a two-dimensional 
method, provided information about the geometry is used 
to govern the method. On the other hand, discrete surfaces 
must be considered using the geometric properties of the 
surface directly. 


3.1 Quadtree—octree based methods 


Quadtree (octree)-based mesh-generation methods are me- 
thods whereby the domain is covered by elements using 
a recursive subdivision scheme based on a spatial tree
structure. A given discretization (a surface mesh) of the 
boundary of the domain is first enclosed in a square (a 
cube). This initial box (the root of the tree structure) is 
then recursively subdivided into four (eight) similar sub- 
boxes, until a certain consistency is attained between the 
boxes and the boundary items. This recursive scheme can 
also be envisaged as a tree-decomposition procedure. Once 
the decomposition has been achieved, mesh elements are 
generated by subdividing tree cells in a conforming way 
(using predefined patterns). An optimization stage is usually 
required as a final step as the intersections between tree 
cells and the boundary items may lead to the creation of 
ugly-shaped elements. 
Tree-based methods are very versatile and are able to
handle complex geometries. However, unlike advancing- 
front or Delaunay-based methods, the original boundary 
discretization is usually not preserved in the final mesh of 
the domain. A variant of this approach consists in consid- 
ering that the domain boundary is known via a geometric 
modeling system (no input boundary discretization is sup- 
plied). The insertion of boundary items in the tree structure 
is then performed through geometric queries. 


Tree construction 

The general scheme for constructing a spatial decompo- 
sition of a computational domain was originally proposed 
by Yerry and Shephard (1983). It consists of two successive
steps: (i) the construction of a spatial covering starting from a
bounding box of the domain and (ii) the creation of internal
vertices and mesh elements. It can be seen as an incremen-
tal method that inserts boundary items one by one into a
current spatial decomposition. Assuming that the domain
boundaries are known via a discretization, an initial cell is
created corresponding to the bounding box of the domain. 
All boundary items are inserted into the cells of the current 
decomposition with respect to their dimension (from points 
to faces). 

Let A be the current tree structure after the insertion 
of n boundary entities and let E be the next entity to be 
inserted. The cell C containing E is identified (here the 
tree structure proves useful for searching purposes). If C 
is empty (no other entity has been associated with it), then 
E is associated with C; if not, cell C is subdivided into 
equally sized subcells (four in two dimensions and eight 
in three dimensions) and the process is repeated in these 
subcells. At completion, a tree structure is defined, in which 
the depth (the maximum level of refinement) corresponds 
to the minimal distance between boundary items. In order 
to simplify the creation of elements, this tree structure is 
balanced so as to reduce the size ratio between adjacent 
cells to a factor of two at most. 
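
The insertion rule described above can be sketched in two dimensions as follows. This is a minimal illustration, not the implementation of Yerry and Shephard: it handles boundary points only (no edges or faces), omits the balancing step, and the names `Cell`, `insert`, `tree_depth`, and the `max_depth` safeguard are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Cell:
    x: float                      # lower-left corner of the square cell
    y: float
    size: float                   # edge length of the cell
    depth: int = 0                # refinement level (root = 0)
    item: Optional[Point] = None  # at most one boundary point per leaf
    children: List["Cell"] = field(default_factory=list)

    def subdivide(self) -> None:
        """Split the cell into four equally sized subcells."""
        half = self.size / 2
        self.children = [
            Cell(self.x + ix * half, self.y + iy * half, half, self.depth + 1)
            for iy in (0, 1) for ix in (0, 1)
        ]

    def child_containing(self, p: Point) -> "Cell":
        half = self.size / 2
        ix = 1 if p[0] >= self.x + half else 0
        iy = 1 if p[1] >= self.y + half else 0
        return self.children[2 * iy + ix]


def insert(root: Cell, p: Point, max_depth: int = 30) -> None:
    cell = root
    while cell.children:                 # locate the leaf containing p
        cell = cell.child_containing(p)
    if cell.item is None:                # empty cell: associate p with it
        cell.item = p
        return
    if cell.item == p or cell.depth >= max_depth:
        return                           # duplicate point or depth cap reached
    old, cell.item = cell.item, None     # occupied: subdivide and reinsert both
    cell.subdivide()
    insert(cell, old, max_depth)
    insert(cell, p, max_depth)


def tree_depth(cell: Cell) -> int:
    """Maximum refinement level reached in the tree."""
    if not cell.children:
        return cell.depth
    return max(tree_depth(c) for c in cell.children)
```

With a unit root cell, inserting (0.1, 0.1) and (0.9, 0.9) requires a single subdivision, whereas adding the nearby point (0.2, 0.2) forces two further levels, which illustrates how the depth follows the minimal distance between boundary items.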


Mesh-element construction 

The creation of internal mesh vertices and mesh elements is 
straightforward. All internal cell corners are considered as 
mesh vertices. The tree-balancing rule leads to a substantial 
reduction in the number of possible configurations for a 
given cell, as compared with its neighbors. Therefore, a 
predefined pattern is associated with each internal cell 
configuration in order to provide a conforming mesh of 
the cell (16 such templates exist in two dimensions and 
78 in three dimensions). The boundary cells (intersected 
by the boundary discretization) can be treated in a similar 
way. The mesh of the bounding box of the domain is then 
obtained by merging all the elements created at the cell 
level. To obtain the final mesh of the domain, a coloring 
procedure is used to remove external elements. Obviously, 
a mesh-optimization procedure must be applied to get rid 
of the badly shaped elements that may have been created 
because of cell/surface intersections, and to improve the 
overall mesh quality. 


Surface meshing versus octree methods 

A variant of the octree decomposition consists in building 
a surface mesh from an analytical surface representation, 
usually known via a geometric modeler. The depth $p$ of the 
tree can indeed be related to edge length $h$ and to the size 
$b$ of the bounding box by the relation $p = \log_2(b/h)$. As 
edge size $h$ is proportional to the minimum of the principal 
radii of curvature, denoted by $\rho$, a lower bound for the tree 
depth can be found: 

$$p \geq \log_2\left(\frac{b}{\alpha\rho}\right)$$

where $\alpha$ is related to the geometric approximation. The 
local intrinsic properties of the surface (principal radii of 
curvature, vertex normals, etc.) can be used to construct the 
tree. In practice, the cells are refined until the geometric 
approximation criterion is satisfied. 
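
As a small numerical illustration of the depth bound, the sketch below assumes that the edge size is taken as $h = \alpha\rho$ and that the depth is rounded up to an integer level; the function name `min_tree_depth` is hypothetical.

```python
import math


def min_tree_depth(b: float, rho_min: float, alpha: float) -> int:
    """Smallest integer depth p such that p >= log2(b / (alpha * rho_min))."""
    h = alpha * rho_min          # target edge size from the curvature radius
    return max(0, math.ceil(math.log2(b / h)))
```

For a bounding box of size 10, a minimal radius of curvature of 0.5 and $\alpha = 0.5$, the target edge size is 0.25 and at least six levels of refinement are needed.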


3.2 Advancing-front methods 


Advancing-front methods take the given boundary dis- 
cretization (mesh) and create the mesh elements within the 
domain, advancing in from the boundary until the entire 
domain has been covered with elements. Such an element 
is constructed on the basis of one entity (edge or triangle) 
of a so-called front (initiated by the boundary items), which 
is connected with a node appropriately chosen among the 
existing nodes or created according to some quality criteria. 
Once an element has been formed, the front is updated and 
the process is repeated until the current front is empty. 

Advancing-front methods were primarily developed to 
handle planar and volume domains but can be used to 
construct surface meshes. 


Point creation, point connection 

Formally speaking, the advancing-front procedure is an 
iterative procedure that attempts to fill the as-yet unmeshed 
region of the domain with elements (George, 1971). At each 
step, a front entity is selected, and a new (optimal) vertex 
is created and inserted in the current mesh, if it forms 
a new well-shaped element. The boundary discretization 
should be orientable. The mesh elements are created on the 
basis of entities (edges or faces) of the current front (the 
initial front being the boundary discretization). Central to 
the advancing-front technique, the creation and insertion 
of an optimal point from a selected front entity requires 
some care. At first, an optimal point is computed (resulting 
in the creation of an optimal element). This point is then 
checked against neighboring vertices in the current mesh. 
A candidate point and the corresponding virtual element 
are analyzed to see whether the element intersects the existing mesh 
elements or front entities. A candidate element is a valid 
element if none of its edges (faces in three dimensions) 
intersect any front entity and if it does not contain any 
mesh entity (for instance a vertex or an element). Once 
a candidate point has been retained, it is inserted into the 
mesh, the relevant mesh element is created, and the front 
is updated. At completion, a mesh-optimization procedure 
is applied to locally improve the element shape quality. 
Notice that in three dimensions, some nasty configura- 
tions may occur, which prevent the creation of a valid mesh. 
In such cases, it can be useful to remove the last created 
elements and to restart the procedure while changing some 
parameters to avoid encountering the same problem again. 
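
The point creation and validity check can be sketched in two dimensions as follows. This is a simplified illustration under stated assumptions: the optimal point is taken as the apex at distance `h` from both endpoints of the front edge, the front is a plain list of directed edges, and the names `optimal_point`, `cross_properly`, and `is_valid` as well as the 0.55 safeguard factor are hypothetical.

```python
import math


def orient(p, q, r):
    """Twice the signed area of triangle pqr (positive if counterclockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def optimal_point(a, b, h):
    """Apex at distance h from a and b, on the left (unmeshed) side of edge ab."""
    length = math.hypot(b[0] - a[0], b[1] - a[1])
    side = max(h, 0.55 * length)          # hypothetical safeguard against flat triangles
    height = math.sqrt(side * side - 0.25 * length * length)
    nx, ny = -(b[1] - a[1]) / length, (b[0] - a[0]) / length   # left unit normal
    return ((a[0] + b[0]) / 2 + height * nx, (a[1] + b[1]) / 2 + height * ny)


def cross_properly(p, q, r, s):
    """True if segments pq and rs intersect at a point interior to both."""
    return (orient(r, s, p) * orient(r, s, q) < 0 and
            orient(p, q, r) * orient(p, q, s) < 0)


def is_valid(a, b, c, front):
    """Check candidate triangle abc against the front (a list of directed edges)."""
    if orient(a, b, c) <= 0:
        return False                       # inverted or degenerate element
    for r, s in front:
        if cross_properly(a, c, r, s) or cross_properly(b, c, r, s):
            return False                   # a new edge cuts the front
        for v in (r, s):                   # no front vertex strictly inside abc
            if v not in (a, b) and min(orient(a, b, v), orient(b, c, v),
                                       orient(c, a, v)) > 0:
                return False
    return True
```

In a practical code, a rejected optimal point would be replaced by a neighboring existing vertex, and the tests would be restricted to front entities located in the vicinity of the candidate by means of a search structure.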


Surface meshing versus advancing-front methods 

The same concept can be applied to create surface meshes. 
The main difference lies in the iterative algorithm used to 
find an optimal point location, given a front edge. Given 
an edge AB, the optimal point P is computed so as to 
construct an optimal triangle on the surface. The third point 
P is determined using an angle $\alpha$ between the stretching 
direction and the tangent at the midpoint M of AB (or using 
an average tangent plane and setting $\alpha$ to zero). This point 
does not necessarily belong to the surface as it has been 
created in a local tangent plane. If the surface is known via 
a geometric modeler, a query can provide the point of the 
surface closest to P. Candidate points are identified 
as those lying in the disk centered at P with radius $k \times \delta$, 
where $k$ is a positive coefficient and $\delta$ denotes the radius 
of a region near P. The size of the triangle is locally 
adapted to the surface curvature, that is, it is proportional 
to the minimal radius of curvature. 


3.3 Delaunay-type methods 


Many people find Delaunay-type methods very appealing, 
as the keyword Delaunay has a touch of elegance, based 
on various theoretical issues (see Preparata and Shamos, 
1985 and Boissonnat and Yvinec, 1997) about properties 
of such triangulations. Despite this, a more subtle analysis 
and actual experience both indicate that those theoretical 
issues are unlikely to be usable in the context of creating FE 
meshes. Nevertheless, Delaunay triangulation algorithms 
can be revisited so as to be included in the ingredients 
of the so-called Delaunay mesh-generation methods. 


Delaunay triangulation 

While alternative methods exist, a popular and straight- 
forward method for generating a Delaunay triangulation 
of a given set of points in $\mathbb{R}^2$ or $\mathbb{R}^3$ is the so-called 
Bowyer-Watson algorithm, also developed by Hermeline 
at the same time (1981). It is an incremental method that 
results in inserting one point in a given Delaunay triangulation. 

Let $T$ be the Delaunay triangulation of the set of $n$ points 
and let $P$ be a point to be inserted; under some realistic 
assumptions, this method simply reads $T = T - C(P) + B(P)$, 
where $C(P)$ is the cavity associated with point $P$ and 
$B(P)$ is the remeshing of $C(P)$ based on $P$. Cavity $C(P)$ 
is the set of simplices in $T$ whose open circumdisk (circumball 
in three dimensions) contains point $P$, while ball 
$B(P)$ is simply the set of simplices constructed by connecting $P$ 
with the boundary edges (or triangles) of $C(P)$. Theoretical 
results show that, provided the former $T$ is Delaunay, 
the resulting $T$ with $P$ as a vertex is also Delaunay. This 
method allows planar and volume triangulation to be gen- 
erated (indeed it can readily be applied to any number of 
dimensions), while it is meaningless for surface triangula- 
tion (of a general variety). 
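
The relation $T = T - C(P) + B(P)$ can be sketched in the plane as follows. This is a minimal illustration that assumes floating-point arithmetic is adequate (no exact predicates), encloses the points in a large auxiliary triangle rather than a box, and searches the cavity by brute force; the names `in_circumcircle`, `ccw`, and `bowyer_watson` are hypothetical.

```python
def in_circumcircle(tri, p, pts):
    """True if p lies strictly inside the circumdisk of the counterclockwise triangle tri."""
    (ax, ay), (bx, by), (cx, cy) = (pts[i] for i in tri)
    ax, ay, bx, by, cx, cy = ax - p[0], ay - p[1], bx - p[0], by - p[1], cx - p[0], cy - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0


def ccw(tri, pts):
    """Return tri with its vertices in counterclockwise order."""
    a, b, c = (pts[i] for i in tri)
    if (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]) < 0:
        return (tri[0], tri[2], tri[1])
    return tri


def bowyer_watson(points):
    """Delaunay triangulation of 2D points, returned as triples of point indices."""
    pts = list(points)
    n = len(pts)
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    r = 1000 * max(max(xs) - min(xs), max(ys) - min(ys), 1.0)
    # large enclosing triangle (indices n, n+1, n+2), removed at the end
    pts += [(cx - 2 * r, cy - r), (cx + 2 * r, cy - r), (cx, cy + 2 * r)]
    triangles = {(n, n + 1, n + 2)}
    for i in range(n):
        # cavity C(P): triangles whose open circumdisk contains the new point
        cavity = {t for t in triangles if in_circumcircle(t, pts[i], pts)}
        # boundary of the cavity: edges that belong to exactly one cavity triangle
        count = {}
        for t in cavity:
            for e in ((t[0], t[1]), (t[1], t[2]), (t[2], t[0])):
                key = frozenset(e)
                count[key] = count.get(key, 0) + 1
        triangles -= cavity
        # ball B(P): connect the new point with every boundary edge
        for key, c in count.items():
            if c == 1:
                u, v = tuple(key)
                triangles.add(ccw((u, v, i), pts))
    return [t for t in triangles if max(t) < n]
```

The brute-force cavity search makes each insertion linear in the number of triangles; practical codes instead locate one triangle of the cavity and complete it by adjacency.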

Variations of this incremental algorithm include con- 
strained versions, where the constraints are of a topological 
nature (e.g. specified edges (or triangle facets) are maintained 
through the process) or of a metric nature (e.g. 
additional properties such as element quality control are 
included in the construction). Also, non-Delaunay triangulations 
can be successfully carried out provided an adequate 
construction of $C(P)$ is used, and anisotropic cases can be 
addressed. 


Delaunay-based meshing algorithms 

The above-mentioned constrained incremental method can 
be used as part of a meshing algorithm (thus referred to as 
Delaunay-based). Let $\Omega$ be the domain to be meshed and 
let $\Gamma$ be a discretization of its boundary. The set of points 
in this boundary discretization is triangulated using the 
above incremental method (after being included in a convex 
bounding box). The resulting triangulation is a triangulation 
of the introduced box where, in general, extracting a mesh 
of $\Omega$ is unlikely to be possible. This is due to the fact 
that the boundary entities are not necessarily edges or 
facets of this triangulation. In other words, inserting the 
two endpoints of a boundary edge (the three vertices of a 
boundary triangle) may result in a triangulation where this 
edge (triangle) does not exist. 

Various methods have been developed to allow for the 
regeneration of such missing entities. In two dimensions, 
edge-swapping operators result in what is needed, while in 
three dimensions, the same is rather tedious. Nevertheless, 
those methods readily work and result in a triangulation of 
the box where all the boundary entities are established. As 
a consequence, a mesh of $\Omega$ can be obtained by suppressing 
those elements in the box outside the domain. 

At this time, we have in hand a mesh of $\Omega$ whose vertices 
are, roughly speaking, the boundary vertices. Such a mesh is 
unlikely to be suitable for FE computations. Therefore, field 
points must be created before being inserted. Methods to 
create the necessary field points include using the centroid 
or the circumcenter of some mesh elements on the basis of 
appropriate criteria (such as density requirement, element 
quality concern, etc.). Other methods make use of a tree 
structure to define those field points or introduce points 
along the edge elements (for any dimensions) or, again, 
use an advancing-front strategy to locate those points. 
At completion, for example, when the mesh is saturated 
(according to the selected criteria), the resulting mesh is 
optimized (with regard to quality measures) by means of 
edge or facet swapping, node repositioning, and so on, as a 
Delaunay triangulation is not, in general, and particularly in 
three dimensions, a quality triangulation (for FE purposes). 


Surface meshing versus Delaunay-type methods 

The notion of a Delaunay surface mesh is in some ways 
confusing. The Delaunay criterion, the key to Delaunay 
triangulation methods, indicates that the circumdisk (cir- 
cumball in three dimensions) of the mesh elements is 
empty. For a surface, this notion is meaningless for a num- 
ber of reasons. First, such disks are not defined; second, 
the Delaunay criterion (that correlates to a proximity and 
visibility criterion) does not include any concern about cur- 
vatures (and thus directions). Therefore, Delaunay meshing 
of a surface is unlikely to be suitable. Nevertheless, para- 
metric surfaces can be meshed by means of a Delaunay 
method, where the Delaunay property applies in the para- 
metric space (thus a planar region) and is necessarily an 
anisotropic version of the method (to handle the curvature 
of the true surface), as discussed in the sequel. 

To be more precise, while the notion of a Delaunay sur- 
face mesh is confusing, the notion of Delaunay-conforming 
(or admissible) surface meshes is well founded. Such a 
mesh enjoys the following property: inserting, by means 
of a three-dimensional Delaunay algorithm, the endpoints 
of the given surface triangles results in a volume mesh (thus 
a tetrahedral mesh) where all the triangles in this surface 
mesh are facets of elements in the volume mesh. 


3.4 Combined methods 


The three main classes of automated mesh-generation methods 
have both advantages and drawbacks. Combining two 
or more of these methods is therefore an elegant way of benefiting 
from the advantages of each method while avoiding 
any possible drawbacks. In this respect, Delaunay-type 
methods have been combined with octree or advancing- 
front techniques and, conversely, octree or advancing-front 
techniques have been combined with Delaunay methods. 


4 QUALITY MESHING AND 
ADAPTIVITY 


A quality mesh, or more precisely a mesh adapted to 
given quality and density requirements, is introduced as 
an occurrence of a more general class of meshes. We first 
introduce the notion of a regular mesh and we show how 
this allows for the definition of an adapted mesh. 


4.1 Regular mesh 


Let $\Omega$ be a closed bounded domain in $\mathbb{R}^2$ or $\mathbb{R}^3$ defined by 
its boundary $\Gamma$. A quality simplicial mesh or a regular mesh 
of $\Omega$ is a mesh whose elements are equilateral (regular). 
The existence of such a mesh is not guaranteed in general. 
Indeed, it depends, to some degree, on the domain boundary 
discretization. Therefore, we will call a simplicial regular 
mesh the ‘best’ simplicial mesh that can be completed. As 
the issue of constructing a regular mesh for an arbitrary 
domain is an open problem, there exist various methods 
that allow the construction of ‘almost’ regular meshes. 

In a classical context, two types of boundary discretiza- 
tions can be envisaged. The first case concerns uniform 
discretizations where a constant step size is given. The main 
advantage of such a discretization is that, in principle, it is 
possible to complete a regular mesh. Nevertheless, this does 
not guarantee a good approximation of the domain bound- 
aries for a given step size. Given a uniform discretization 
of a domain boundary, a regular mesh is nothing more 
than a mesh where the element sizes are ‘equal’ to the 
step size serving at the boundary discretization. Thus, the 
desired size for the elements in the mesh is known a priori 
at each mesh vertex in the regular domain mesh. Let us 
consider the case where the domain boundary is composed 
of several connected components and where these are dis- 
cretized by means of different step sizes (as is the case for 
the domains encountered in computational fluid dynamics 
(CFD)). A regular mesh of such a domain is a mesh where 
the element sizes in the neighborhood of each component 
are close to the step size of this component. As for the ele- 
ment sizes elsewhere in the domain, they must be close to 
the step sizes of the discretization of the boundaries situated 
in some neighborhood. 


The second type of discretization concerns the so-called 
‘geometric’ discretizations that are adapted to the boundary 
geometries. In this case, it may be proved that the discretiza- 
tion step size must be locally proportional to the minimum 
radius of curvature of the boundary. The drawback of this 
type of discretization is that a rather wide variation in the 
discretization step size may result. In order to avoid this 
phenomenon, a smoothing technique on the discretization 
step size may be applied. For a geometric discretization (not 
uniform in general) of the domain boundaries, it is tedious 
to find a priori what element sizes make the mesh regular. 
Obviously, a regular mesh is one where the element sizes 
are almost constant (or vary just a little). The idea is then 
to find among all the continuous size functions the function 
that leads to a minimal variation. This notion of a minimal 
variation is a characterization, among others, of the surfaces 
defined by means of harmonic functions. 


4.2 From regular mesh to adaptivity 


The two above types of meshes appear to be a particular 
occurrence of a more general mesh-generation problem that 
involves constructing a mesh whose element sizes conform 
to some given specifications. These requests define, in 
each point in the domain, the desired element sizes in all 
directions. There are two types of specifications: isotropic 
and anisotropic. In the first case, the size remains constant 
in all directions, which is the case we come across in mesh- 
adaptation problems (and thus the classical cases fall into 
this category). As for the second case, the size may vary 
when the direction varies. This is used for problems where 
the solution shows large variations in some directions. 
In both cases, we assume that we are given a function 
h(P, d) > 0 defining the size h at point P in the domain 
following the direction d, and the problem comes down to 
completing a mesh where the edge lengths conform to this 
function (or size map) h. Written in this way, the problem 
is not well posed. Indeed, if $PQ$ stands for an edge in 
the desired mesh, the Euclidean length $\|\overrightarrow{PQ}\|$ of $PQ$ must 
satisfy, at the same time, the two antagonist relations 

$$\|\overrightarrow{PQ}\| = h(P, \overrightarrow{PQ}) \quad \text{and} \quad \|\overrightarrow{PQ}\| = h(Q, \overrightarrow{QP})$$

which implies that the vertices $P$ and $Q$ are constructed in 
such a way that 

$$h(P, \overrightarrow{PQ}) = h(Q, \overrightarrow{QP})$$

This is unlikely to be possible if the size map is such that 
$\forall P, Q$, $h(P, \overrightarrow{PQ}) \neq h(Q, \overrightarrow{QP})$. In fact, the computation 
of the Euclidean length of $PQ$ does not take into account 
the variation in the map $h$. Therefore, the mesh-construction 
problem must be formulated in a different way. To this end, 
we introduce a new definition. A mesh conforms to a size 
map h if all of its edges have an ‘average’ length equal to 
the average of the sizes specified along these edges. Thus, 
it is necessary to define this notion of average length with 
regard to a size map. To do this, we assume that the data 
of the map $h(X, d)$ ($\forall X, d$) allows for the local definition 
of a metric tensor (or a metric, in short) $M(X)$ at $X$, 
which in turn defines a new norm that takes into account 
the size variation related to the directions. Let $l_{M(X)}(e)$ be 
the length of edge $e$ computed using metric $M(X)$. The 
average length of edge $e$ may be defined as the average 
of the lengths $l_{M(X)}(e)$ when $X$ moves along $e$. If $l_m(e)$ 
denotes this length (subscript $m$ for mean), we have 

$$l_m(e) = \frac{\int_{e} l_{M(X)}(e)\, dX}{\int_{e} dX}$$

and, for $e = PQ$ and $X = P + t\,\overrightarrow{PQ}$, we obtain 

$$l_m(PQ) = \int_0^1 l_{M(P + t\,\overrightarrow{PQ})}(PQ)\; dt$$

If $h_m(PQ)$ stands for the average of the sizes along $PQ$, 
we have 

$$h_m(PQ) = \int_0^1 h(P + t\,\overrightarrow{PQ}, \overrightarrow{PQ})\; dt$$

and edge $PQ$ conforms to map $h$ if $l_m(PQ) = h_m(PQ)$. To 
avoid computing $h_m(PQ)$, we redefine metric $M$ in such a 
way that $h_m(PQ) = 1$ holds, which implies (in general) 
$h(X, d) = 1$ ($\forall X, d$) and, in this case, we simply have 
$l_m(PQ) = 1$. In what follows, we give some remarks about 
the construction of metric M(X) starting from the size map 
$h(X, d)$ ($\forall X, d$). The key is to find a metric $M(X)$ that 
conforms as well as possible to the map. Let us recall 
that a metric $M(X)$ defined at point $X$ is the data of a 
symmetric positive-definite matrix also denoted by $M(X)$. 
The geometric locus of the points $Y$ that conform to metric 
$M(X)$ at point $X$ is in general an ellipsoid $\mathcal{E}(X)$ whose 
equation can be written as 

$${}^t\overrightarrow{XY}\, M(X)\, \overrightarrow{XY} = 1$$

This particular expression of the metric prescribes a desired 
size, which is unity in this metric. Indeed, for each point $Y$ 
in $\mathcal{E}(X)$, we have 

$$l_{M(X)}(\overrightarrow{XY}) = \sqrt{{}^t\overrightarrow{XY}\, M(X)\, \overrightarrow{XY}} = 1$$

The set of points in $\mathcal{E}(X)$ is termed the unit sphere associated 
with metric $M(X)$. If $C(X)$ denotes the geometric 
locus of the points conforming to the size map $h(X, d)$ 
($\forall d$) at point $X$, then metric $M(X)$ may be characterized 
as the one whose unit sphere $\mathcal{E}(X)$ has a maximal volume 
included in $C(X)$. Such a metric is called the underlying 
metric. In the particular case where the size map $h(X, d)$ 
only depends on $X$ (isotropic case), metric $M(X)$ reduces 
to 

$$M(X) = \frac{1}{h^2(X)}\, I_d$$

where $I_d$ is the identity matrix in $\mathbb{R}^d$ ($d = 2$ or 3). In 
this case, $\mathcal{E}(X)$ and $C(X)$ are the sphere centered at $X$ 
whose radius is $h(X)$. When $C(X)$ is an ellipsoid (classical 
anisotropic case), metric $M(X)$ is obviously the one 
whose associated sphere $\mathcal{E}(X)$ is similar to $C(X)$. In the 
general case where $C(X)$ is arbitrary, we may make use 
of optimization algorithms to find $\mathcal{E}(X)$ and thus to have 
metric $M(X)$. 

The size map $h(X, d)$ ($\forall X, d$) is then seen as the metric 
map $M(X)$, and a mesh conforming to this metric is a mesh 
where the edges have a unit average length. A mesh with 
this property is said to be a unit mesh. One could observe 
that this average edge length in a metric is nothing other 
than an edge length if we associate a Riemannian structure 
with the domain, this structure being defined by the metric 
map $M(X)$. In this structure, length $L_M$ of edge $PQ$ is 
given by 

$$L_M(PQ) = \int_0^1 \sqrt{{}^t\overrightarrow{PQ}\, M(P + t\,\overrightarrow{PQ})\, \overrightarrow{PQ}}\; dt$$

To summarize, a mesh is said to be conformal to a given 
size map $h(X, d)$ ($\forall X, d$) if it is unity in the Riemannian 
structure associated with the underlying metric map. 
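
The Riemannian edge length can be approximated numerically, as in the sketch below. It assumes a two-dimensional domain, a metric supplied as a function returning a $2 \times 2$ symmetric positive-definite matrix, and a midpoint quadrature rule; the names `edge_length`, `isotropic`, and the parameter `samples` are hypothetical.

```python
import math


def edge_length(p, q, metric, samples=64):
    """Midpoint-rule approximation of the length of edge pq in the metric map."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) / samples
        (m11, m12), (m21, m22) = metric((p[0] + t * dx, p[1] + t * dy))
        # quadratic form t(PQ) M(P + t PQ) PQ evaluated at the sample point
        total += math.sqrt(dx * (m11 * dx + m12 * dy) + dy * (m21 * dx + m22 * dy))
    return total / samples


def isotropic(h):
    """Metric map M(X) = I / h(X)^2 built from an isotropic size function h."""
    def metric(x):
        s = 1.0 / h(x) ** 2
        return ((s, 0.0), (0.0, s))
    return metric
```

For a constant size $h = 0.5$, an edge of Euclidean length 1 has length 2 in the metric, so a unit mesh would split it into two edges; for the graded size $h(X) = 1 + x$, the edge from (0, 0) to (1, 0) has length $\ln 2$.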

From a practical point of view, the Riemannian structure 
may be defined in two ways: as a continuous or a discrete 
structure. The first way consists in defining the metric map 
M analytically and, in this case, the metric map M is 
explicitly given as a function of the position. The second 
way consists in defining the map by means of interpolation 
from the data of the map at the vertices of a mesh, 
termed the background mesh. This approach is popular in 
adaptive schemes where the metric map is computed in 
a supporting mesh using an appropriate error estimator. 
Several interpolation schemes can be used (Borouchaki 
et al., 1997). If the map is defined in a continuous manner, 
then the desired unit mesh can be obtained in one iteration 
step. On the other hand, if the map is known in a discrete 
way, several iteration steps (or adaptive meshing steps) may 
be necessary. 


To conclude, the question facing us here is to know 
whether a unit mesh conforming to a metric map is suit- 
able for finite element purposes (for instance, in terms of 
convergence issues). To decide on this point, it is natu- 
ral to add another criterion to qualify what appears to be 
a mesh suitable for computation. This criterion is related 
to the shape of the elements. A unit mesh conforming to 
a metric map may not be suitable for computational pur- 
poses. Indeed, the element shape quality largely depends on 
the size variation present in the metric map. To avoid this, 
it is only necessary to modify the metric map (Borouchaki 
et al., 1997), in accordance with the desired size (while pre- 
serving certain properties included in the map). However, 
this modification in the metric map generally results in a 
larger number of elements, thus leading to a high compu- 
tational cost. One possible compromise is to modify the 
metric map as a function of the ‘sizes’ that are suitable in 
the targeted computational method. 


4.3 Unit meshing of a curve or a surface 


Throughout this section, we propose a method that results 
in the construction of a unit mesh in a domain $\Omega$ in $\mathbb{R}^d$, 
$d = 2$ or 3 (the domain being defined by its boundary $\Gamma$), 
equipped with a given Riemannian metric $M_d$. The method 
reduces to meshing $\Omega$ in such a way that the edges in this 
mesh are unity. Bear in mind that the metric at a point $P$ in 
$\Omega$ is defined by a symmetric positive-definite $d \times d$ matrix, 
$M_d(P)$. If $P$ is a vertex in the unit mesh of $\Omega$ and if $PX$ 
is an edge with $P$ as an endpoint, then one must have 

$$\int_0^1 \sqrt{{}^t\overrightarrow{PX}\, M_d(P + t\,\overrightarrow{PX})\, \overrightarrow{PX}}\; dt = 1$$

The proposed method involves two steps: the discretization 
of the boundary $\Gamma$ of $\Omega$ by means of unit elements and 
the construction of a unit mesh in $\Omega$ using the above boundary 
discretization as input. These two steps are discussed 
in the following sections. Curve discretization includes two 
different cases based on the way the curve is defined: in a 
parametric form or in a discrete form. The same applies for 
the surface where a parametric or a discrete definition can 
be used. 


4.3.1 Unit parametric curve discretization 


We assume $\Gamma$ to be defined by a mathematical (analytical) 
model. In two dimensions, the boundary $\Gamma$ of $\Omega$ is composed 
of curved segments $\Gamma_i : I_i \to \mathbb{R}^2$, $t \mapsto \gamma_i(t)$, 
where $I_i$ is a closed interval in $\mathbb{R}$ and $\gamma_i(t)$ is a continuous 
function of class $C^2$. The problem reduces to the discretization 
of a generic curved segment $\Gamma : I \to \mathbb{R}^2$, $t \mapsto \gamma(t)$. 
In three dimensions, the boundary $\Gamma$ is composed of 
parametric patches $\Sigma_i : \Omega_i \to \mathbb{R}^3$, $(u, v) \mapsto \sigma_i(u, v)$, 
where $\Omega_i$ is a closed bounded domain in $\mathbb{R}^2$ and $\sigma_i$ 
is a continuous function of class $C^2$. Similarly, the problem 
reduces to the discretization of a generic parametric 
patch $\Sigma : \Omega \to \mathbb{R}^3$, $(u, v) \mapsto \sigma(u, v)$. In this case, the 
discretization includes two steps: the discretization of the 
boundary of $\Sigma$, which is composed of curved segments 
in $\mathbb{R}^3$, and that of $\Sigma$ starting from the discretization of its 
boundary. Discretizing a curved segment in $\mathbb{R}^2$ is a particular 
case of the general problem of discretizing a curved 
segment in $\mathbb{R}^3$. In what follows, we describe how to discretize 
such segments, then we show that discretizing a 
parametric patch reduces to constructing a unit mesh of a 
domain in $\mathbb{R}^2$ in accordance with a metric map induced by 
the intrinsic properties of the patch. 


Discretization of a curved segment in $\mathbb{R}^3$ 

Let $\Gamma : [a, b] \to \mathbb{R}^3$, $t \mapsto \gamma(t)$, and let $\gamma(t)$ be a continuous 
function of class $C^2$. As previously seen, the length 
of $\Gamma$ in the Riemannian structure $M_3$ is 

$$L(\Gamma) = \int_a^b \sqrt{{}^t\gamma'(t)\, M_3(\gamma(t))\, \gamma'(t)}\; dt$$

In order to discretize $\Gamma$ by means of unit segments, we first 
compute the integer value $n$ closest to $L(\Gamma)$ (thus $\Gamma$ must be 
subdivided into $n$ segments), then we compute the values 
$t_i$, $1 \leq i \leq n - 1$ ($t_0 = a$ and $t_n = b$), such that 

$$\frac{L(\Gamma)}{n} = \int_{t_i}^{t_{i+1}} \sqrt{{}^t\gamma'(t)\, M_3(\gamma(t))\, \gamma'(t)}\; dt$$

Finally, the discretization of $\Gamma$ consists of the straight line 
segments $\gamma(t_i)\gamma(t_{i+1})$. 
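
The two steps above (computing the length, then the parameter values) can be sketched as follows. The sketch assumes that the derivative of the curve is available as a function, that the integrals are approximated by a midpoint rule on a fine uniform sampling, and that the parameter values are recovered by linear interpolation of the cumulative length; the names `unit_discretization`, `dgamma`, and `samples` are hypothetical.

```python
import math


def unit_discretization(gamma, dgamma, metric, a, b, samples=2000):
    """Return parameter values t_0 = a < ... < t_n = b cutting the curve into
    n segments of equal metric length, together with the total length."""
    dt = (b - a) / samples
    cum = [0.0]                       # cumulative metric length at sample bounds
    for k in range(samples):
        t = a + (k + 0.5) * dt
        d, m = dgamma(t), metric(gamma(t))
        dim = len(d)
        sq = sum(d[i] * m[i][j] * d[j] for i in range(dim) for j in range(dim))
        cum.append(cum[-1] + math.sqrt(sq) * dt)
    total = cum[-1]
    n = max(1, round(total))          # integer closest to the length
    ts, k = [a], 0
    for i in range(1, n):
        target = i * total / n
        while cum[k + 1] < target:    # find the sample interval holding target
            k += 1
        frac = (target - cum[k]) / (cum[k + 1] - cum[k])
        ts.append(a + (k + frac) * dt)
    ts.append(b)
    return ts, total
```

For a half circle of radius 1 and a uniform isotropic size of 0.5 (metric $4 I_3$), the metric length is $2\pi \approx 6.28$, so the curve is cut into six segments at equally spaced parameter values.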


4.3.2 Unit discrete curve (re)discretization 


There are essentially two ways to define a curve: by a 
continuous function as explained before or, in a discrete 
manner, by an ordered set of sampling points. These points 
can, for instance, be generated by a CAD system or by a 
scanning device, or they can be the result of a numerical 
simulation in an adaptive scheme. Generally, the goal is 
then to obtain from this set of given points a parametric 
curve that should be as smooth as possible. 

In some cases, the data may be ‘noisy’ because of mea- 
surement or computation errors, producing an interpolating 
curve with a rough aspect. Smoothing techniques, includ- 
ing various averaging schemes, can be applied to avoid this 
phenomenon. 

At present, given a set of points $\{P_i\}_{i=0,\ldots,n}$, the problem 
is to find a continuous function $\gamma$ defined on $\mathbb{R}$ with 
values in $\mathbb{R}^d$ (in the two- or three-dimensional space), and 
increasing real numbers $t_i$ such that $\gamma(t_i) = P_i$ for each 
$i = 0, \ldots, n$. In practice, the solution can be chosen amongst 
piecewise polynomial functions called splines, by analogy 
to the draftsman's tool consisting of a thin flexible rod made 
of metal, plastic, or wood. Each polynomial on an interval 
$[t_i, t_{i+1}]$ is usually of degree 3, being defined by the 
location of the extremities $P_i$ and $P_{i+1}$ and by the tangent 
vectors at these points, thus ensuring a $C^1$ continuity of $\gamma$. 
To define a tangent vector at each point $P_i$, several methods 
are proposed. In the Catmull-Rom approach, the vector is 
collinear with $\overrightarrow{P_{i-1}P_{i+1}}$ (with particular conditions at both 
ends $P_0$ and $P_n$ of the curve). In the de Boor approach, a 
$C^2$ continuity is imposed at each intermediate point $P_i$ and 
a linear system is solved, yielding a smoother aspect to the 
whole curve. Finally, having a geometric support defined 
by the interpolating function $\gamma$, a new discretization of the 
curve can be obtained using the method described in the 
previous section. 
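
A Catmull-Rom interpolant of this kind can be sketched as follows. The sketch assumes a uniform parameterization $t_i = i$, cubic Hermite segments, and one-sided differences for the tangents at both ends (one possible choice of end condition); the name `catmull_rom` is hypothetical.

```python
def catmull_rom(points):
    """Return gamma(t), a C1 piecewise cubic with gamma(i) = points[i]."""
    n = len(points) - 1
    dim = len(points[0])

    def tangent(i):
        # centered difference inside, one-sided difference at the two ends
        lo, hi = max(i - 1, 0), min(i + 1, n)
        return tuple((points[hi][k] - points[lo][k]) / (hi - lo) for k in range(dim))

    tangents = [tangent(i) for i in range(n + 1)]

    def gamma(t):
        i = min(max(int(t), 0), n - 1)    # index of the segment containing t
        s = t - i                         # local parameter in [0, 1]
        h00 = 2 * s**3 - 3 * s**2 + 1     # cubic Hermite basis functions
        h10 = s**3 - 2 * s**2 + s
        h01 = -2 * s**3 + 3 * s**2
        h11 = s**3 - s**2
        return tuple(h00 * points[i][k] + h10 * tangents[i][k]
                     + h01 * points[i + 1][k] + h11 * tangents[i + 1][k]
                     for k in range(dim))

    return gamma
```

The interior tangent at $P_i$ is half of $\overrightarrow{P_{i-1}P_{i+1}}$, hence collinear with it; the resulting function can then be passed, together with its derivative, to a unit discretization procedure.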


4.3.3 Unit parametric surface meshing 


Let $\Sigma$ be a parametric surface defined by $\sigma : \Omega \to \Sigma$, 
$(u, v) \mapsto \sigma(u, v)$, $\Omega$ being a domain of $\mathbb{R}^2$, and $\sigma$ 
being a continuous function of class $C^2$. We assume that 
$\Omega$ is closed and bounded, as is $\Sigma$. The problem we face 
is to construct a mesh $T_{M_3}(\Sigma)$ of $\Sigma$ that conforms to the 
metric map $M_3$. The idea is to construct a mesh in the 
parametric domain $\Omega$, so as to obtain the desired mesh 
after being mapped onto the surface. To this end, we show 
that we can define a metric $\widetilde{M}_2$ in $\Omega$ such that the relation 
$T_{M_3}(\Sigma) = \sigma(T_{\widetilde{M}_2}(\Omega))$ is satisfied. To do so, first, we recall 
the usual Euclidean length formula of a curved segment of 
$\mathbb{R}^3$ plotted on $\Sigma$, and then we extend this notion to the case 
where a Riemannian metric is specified in $\Sigma$. 

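Let $\Gamma$ be a curved segment of $\Sigma$ defined by a continuous 
function of class $C^2$, $\gamma(t) \in \mathbb{R}^3$, where $t \in [a, b]$. The usual 
Euclidean length $L_{I_3}(\Gamma)$ of $\Gamma$ is given by 

$$L_{I_3}(\Gamma) = \int_a^b \sqrt{{}^t\gamma'(t)\, \gamma'(t)}\; dt$$

As $\Gamma$ is plotted on $\Sigma$, there is a function $\omega(t) \in \Omega$, 
where $t \in [a, b]$, such that $\gamma = \sigma \circ \omega$. We have 
$\gamma'(t) = \sigma'(\omega(t))\, \omega'(t)$, where $\sigma'(\omega(t))$ is the $3 \times 2$ matrix defined 
as 

$$\sigma'(\omega(t)) = \begin{pmatrix} \sigma'_u(\omega(t)) & \sigma'_v(\omega(t)) \end{pmatrix}$$

Thus, we obtain 

$${}^t\gamma'(t)\, \gamma'(t) = {}^t\omega'(t)\; {}^t\sigma'(\omega(t))\, \sigma'(\omega(t))\; \omega'(t)$$

but we have 

$${}^t\sigma'(\omega(t))\, \sigma'(\omega(t)) = \begin{pmatrix} {}^t\sigma'_u(\omega(t)) \\ {}^t\sigma'_v(\omega(t)) \end{pmatrix} \begin{pmatrix} \sigma'_u(\omega(t)) & \sigma'_v(\omega(t)) \end{pmatrix}$$

or 

$${}^t\sigma'(\omega(t))\, \sigma'(\omega(t)) = M_\sigma(\omega(t))$$

where $M_\sigma$ is a $2 \times 2$ matrix, which characterizes the local 
intrinsic metric of $\Sigma$ at point $\gamma(t)$ and is defined as 

$$M_\sigma(\omega(t)) = \begin{pmatrix} {}^t\sigma'_u(\omega(t))\, \sigma'_u(\omega(t)) & {}^t\sigma'_u(\omega(t))\, \sigma'_v(\omega(t)) \\ {}^t\sigma'_v(\omega(t))\, \sigma'_u(\omega(t)) & {}^t\sigma'_v(\omega(t))\, \sigma'_v(\omega(t)) \end{pmatrix}$$

We can deduce that 

$$L_{I_3}(\Gamma) = \int_a^b \sqrt{{}^t\omega'(t)\, M_\sigma(\omega(t))\, \omega'(t)}\; dt$$

The above formula has an interesting interpretation. The 
Euclidean length of the curved segment $\omega(t)$ plotted in 
$\Omega$ depends on the Euclidean norm of $\omega'(t)$, while the 
Euclidean length of the curved segment $\gamma(t) = \sigma(\omega(t))$ 
(image of $\omega(t)$ on $\Sigma$) depends on the 'Riemannian norm' 
of $\omega'(t)$ with respect to the local intrinsic metric of $\Sigma$. 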

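In particular (depending on the particular reason for 
generating a mesh), if $\omega(t)$ is a line segment $AB$ of $\Omega$, 
we have $\omega(t) = A + t\,\overrightarrow{AB}$; thus $\omega'(t) = \overrightarrow{AB}$, and 

$$L_{I_3}(\sigma(AB)) = \int_0^1 \sqrt{{}^t\overrightarrow{AB}\, M_\sigma(A + t\,\overrightarrow{AB})\, \overrightarrow{AB}}\; dt$$

The above formula allows us to compute the length of the 
curved segment on $\Sigma$, which is the mapping by $\sigma$ of an 
edge plotted on $\Omega$. If the new metric $M_\sigma$ is given in $\Omega$, we 
have 

$$L_{M_\sigma}(AB) = \int_0^1 \sqrt{{}^t\overrightarrow{AB}\, M_\sigma(A + t\,\overrightarrow{AB})\, \overrightarrow{AB}}\; dt = L_{I_3}(\sigma(AB))$$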
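
The length of the image of a parametric edge can be evaluated numerically from the local intrinsic metric, as in the sketch below. It assumes that the partial derivatives of the mapping are approximated by central finite differences and that the integral is approximated by a midpoint rule; the names `intrinsic_metric`, `mapped_length`, `eps`, and `samples` are hypothetical.

```python
import math


def intrinsic_metric(sigma, u, v, eps=1e-6):
    """2x2 matrix t(sigma') sigma' at (u, v), with sigma' from central differences."""
    su = [(p - m) / (2 * eps) for p, m in zip(sigma(u + eps, v), sigma(u - eps, v))]
    sv = [(p - m) / (2 * eps) for p, m in zip(sigma(u, v + eps), sigma(u, v - eps))]
    dot = lambda x, y: sum(a * b for a, b in zip(x, y))
    return ((dot(su, su), dot(su, sv)), (dot(sv, su), dot(sv, sv)))


def mapped_length(sigma, A, B, samples=200):
    """Euclidean length of the image by sigma of the parametric segment AB."""
    du, dv = B[0] - A[0], B[1] - A[1]
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) / samples
        (e, f), (_, g) = intrinsic_metric(sigma, A[0] + t * du, A[1] + t * dv)
        total += math.sqrt(e * du * du + 2 * f * du * dv + g * dv * dv)
    return total / samples
```

On a cylinder of radius 2, $\sigma(u, v) = (2\cos u, 2\sin u, v)$, the intrinsic metric is the constant matrix $\mathrm{diag}(4, 1)$ and the segment from $(0, 0)$ to $(\pi/2, 3)$ maps onto a helix arc of length $\sqrt{\pi^2 + 9}$.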
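Let us consider the case where a Riemannian metric $M_3$ 
of $\mathbb{R}^3$ is specified on $\Sigma$. In this case, the Riemannian length 
$L_{M_3}(\Gamma)$ of $\Gamma$ is given by 

$$L_{M_3}(\Gamma) = \int_a^b \sqrt{{}^t\gamma'(t)\, M_3(\gamma(t))\, \gamma'(t)}\; dt$$

which can be written as 

$$L_{M_3}(\Gamma) = \int_a^b \sqrt{{}^t\omega'(t)\, \widetilde{M}_2(\omega(t))\, \omega'(t)}\; dt$$

with 

$$\widetilde{M}_2(\omega(t)) = {}^t\sigma'(\omega(t))\, M_3(\gamma(t))\, \sigma'(\omega(t))$$

Thus, if $\omega(t)$ is a line segment $AB$ of $\Omega$, we obtain 

$$L_{M_3}(\sigma(AB)) = \int_0^1 \sqrt{{}^t\overrightarrow{AB}\, \widetilde{M}_2(A + t\,\overrightarrow{AB})\, \overrightarrow{AB}}\; dt$$

The above formula allows us to compute the generalized 
length of the curved segment on $\Sigma$, which is the mapping 
by $\sigma$ of an edge plotted on $\Omega$. Let us define the new metric 
$\widetilde{M}_2$ in $\Omega$. We have 

$$L_{\widetilde{M}_2}(AB) = \int_0^1 \sqrt{{}^t\overrightarrow{AB}\, \widetilde{M}_2(A + t\,\overrightarrow{AB})\, \overrightarrow{AB}}\; dt$$

and thus we obtain 

$$L_{M_3}(\sigma(AB)) = L_{\widetilde{M}_2}(AB)$$

To return to the problem of mesh generation itself, the last 
equation shows that the mapping of the mesh conforming 
to the metric $\widetilde{M}_2$ of $\Omega$ onto the surface gives the mesh 
conforming to the specified metric $M_3$ of $\Sigma$. 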
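
The metric induced in the parametric domain is the product of the transposed Jacobian, the three-dimensional metric, and the Jacobian. The sketch below assumes that the two tangent vectors of the surface and the $3 \times 3$ metric are already known at the point considered; the name `induced_metric` is hypothetical.

```python
def induced_metric(su, sv, m3):
    """2x2 matrix t(sigma') M3 sigma' from tangent vectors su, sv and a 3x3 metric m3."""
    def form(x, y):
        # bilinear form t(x) M3 y
        return sum(x[i] * m3[i][j] * y[j] for i in range(3) for j in range(3))
    return ((form(su, su), form(su, sv)), (form(sv, su), form(sv, sv)))
```

For the planar patch $\sigma(u, v) = (2u, 3v, 0)$ and a uniform isotropic size of 0.5 (that is, a three-dimensional metric equal to $4 I_3$), the induced metric is $\mathrm{diag}(16, 36)$: a parametric edge of length 0.25 along $u$ has unit length in this metric, and its image on the surface has Euclidean length 0.5, the prescribed size.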


4.3.4 Unit discrete surface meshing 


The problem of meshing a discrete surface (a surface 
defined by a triangulation) is closely related to the problem 
of defining a suitable metric map $M_3$ on the surface to 
control mesh modifications. The idea is to extract the 
intrinsic properties of the underlying surface from the initial 
surface triangulation. 

Formally speaking, the problem is to construct a geometric 
surface mesh $T_{M_3}$ from an initial reference mesh $T_{ref}$. 
At first, mesh $T_{ref}$ is simplified and optimized in accordance 
with a Hausdorff distance $\delta$, resulting in a geometric 
reference mesh $T_{ref,g}$. This first stage aims at removing 
extra vertices (i.e. those that do not contribute explicitly 
to the surface definition) and at denoising the original 
data. This procedure consists in removing the mesh ver- 
tices iteratively, provided two conditions are satisfied: an 
approximation criterion (related to the Hausdorff distance 
between the current mesh and the original one) and a reg- 
ularity criterion (related to the local deviation of the mesh 
edges from the local tangent planes). Then, a geometric 
support, piecewise C? continuous, is defined on mesh Ter g 
so as to define a ‘smooth’ representation of the surface X. 
The aim of this support is to supply the location of the 
closest point onto the surface, given a point on a triangular 
facet. This support will be used to insert a new vertex in 
the current mesh. 


Curvature evaluation 

In order to build the metric map $\mathcal{M}_3$, we need to evaluate the intrinsic properties of the surface. This requires computing the principal curvatures and principal directions at the vertices of mesh $\mathcal{T}_{\mathrm{ref},g}$. To this end, the surface at a point $P$ is locally approximated by a quadric surface, based on a least-squares fit of adjacent mesh vertices. The local frame at $P$ is computed on the basis of a discrete approximation of the normal to the surface at $P$. To find the coefficients of the quadric, we consider all vertices $P_i$ of the ball of $P$ and assume that the surface fits, at best, these points. Solving this system is equivalent to minimizing the sum

$$\min \sum_{i=1}^{m} \left(a x_i^2 + b x_i y_i + c y_i^2 - z_i\right)^2$$

$(x_i, y_i, z_i)$ being the coordinates of $P_i$ in the local frame, which corresponds to minimizing the square of the norm of the distance to the quadric surface. Knowing the coefficients $a$, $b$, and $c$, it becomes easy to find the local curvatures at a point $P = (0, 0, 0)$ in the local frame:

$$E = 1 + (2au + bv)^2 = 1, \qquad L = 2a$$
$$F = (2au + bv)(bu + 2cv) = 0, \qquad M = b$$
$$G = 1 + (bu + 2cv)^2 = 1, \qquad N = 2c$$


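The analysis of the variations of the normal curvature function

$$\kappa_n(\lambda) = \frac{L + 2M\lambda + N\lambda^2}{E + 2F\lambda + G\lambda^2}$$

(the ratio of the second to the first fundamental form in the direction $\lambda$) leads to resolving a second-order equation that admits (in principle) two distinct solutions, two pairs $(\lambda_1, \kappa_1)$, $(\lambda_2, \kappa_2)$. The extreme values $\kappa_1$ and $\kappa_2$ of $\kappa_n$ (i.e. the roots of the equation) are the principal curvatures of the surface at $P$. Considering the second-order equation

$$\kappa^2 - (\kappa_1 + \kappa_2)\,\kappa + \kappa_1 \kappa_2 = 0$$

yields

$$\kappa_1 \kappa_2 = \frac{LN - M^2}{EG - F^2} = 4ac - b^2, \qquad \kappa_1 + \kappa_2 = \frac{NE - 2MF + LG}{EG - F^2} = 2(a + c) \qquad (1)$$

where $K = \kappa_1 \kappa_2$ is the Gaussian curvature and $H = \frac{1}{2}(\kappa_1 + \kappa_2)$ is the mean curvature of the surface at $P$. Solving these equations allows us to find the extreme values $\kappa_1$ and $\kappa_2$ at $P$

$$\kappa_i = \frac{2(a + c) \pm \sqrt{\Delta}}{2}$$

where $\Delta = (2(a + c))^2 - 4(4ac - b^2)$. A similar analysis leads to finding the principal directions at a point $P$ on the surface.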
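
The curvature evaluation just described can be sketched in a few lines. The example below is an illustration under stated assumptions: the neighbouring vertices are already expressed in the local frame at $P$ (origin at $P$, third axis along the approximate normal), and the least-squares problem is solved with NumPy's `numpy.linalg.lstsq`. The function name `principal_curvatures` is hypothetical.

```python
import numpy as np

def principal_curvatures(local_points):
    """Fit z = a x^2 + b x y + c y^2 and return (kappa_1, kappa_2), kappa_1 >= kappa_2.

    local_points : (m, 3) array-like of neighbour coordinates in the local
                   frame at P (assumed input; at least 3 well-spread points)
    """
    pts = np.asarray(local_points, dtype=float)
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    design = np.column_stack([x * x, x * y, y * y])
    (a, b, c), *_ = np.linalg.lstsq(design, z, rcond=None)
    total = 2.0 * (a + c)             # kappa_1 + kappa_2, from relation (1)
    product = 4.0 * a * c - b * b     # kappa_1 * kappa_2, from relation (1)
    delta = max(total * total - 4.0 * product, 0.0)  # guard against round-off
    root = np.sqrt(delta)
    return (total + root) / 2.0, (total - root) / 2.0
```

For points sampled on $z = x^2 + \tfrac{1}{2} y^2$ the fit recovers $a = 1$, $b = 0$, $c = \tfrac{1}{2}$, hence curvatures 2 and 1.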




Metric definition 

A geometric metric map $\mathcal{M}_3(P)$ can be defined at any mesh vertex $P$ so as to locally bound the gap between the mesh edges and the surface by any given threshold value $\varepsilon$. A matrix of the form

$$\mathcal{M}_3(P)_{\rho_1,\rho_2} = {}^tD(P) \begin{pmatrix} \dfrac{1}{\alpha^2 \rho_1^2(P)} & 0 & 0 \\ 0 & \dfrac{1}{\beta^2 \rho_2^2(P)} & 0 \\ 0 & 0 & \lambda \end{pmatrix} D(P)$$

where $D(P)$ corresponds to the principal directions at $P$, $\rho_1 = 1/\kappa_1$, $\rho_2 = 1/\kappa_2$ are the main radii of curvature, $\alpha$ and $\beta$ are appropriate coefficients and $\lambda \in \mathbb{R}$, provides an anisotropic (curvature-based) control of the geometry. This discrete metric prescribes mesh sizes as well as element stretching directions at mesh vertices. The local size is proportional to the principal radii of curvature, the coefficient of proportionality being related to the largest allowable deviation gap between the mesh elements and the surface geometry (Frey and Borouchaki, 1998). For instance, setting constant gap values comes down to fixing

$$\alpha = 2\sqrt{\varepsilon(2 - \varepsilon)} \quad \text{together with} \quad \beta = 2\sqrt{\varepsilon \frac{\rho_1}{\rho_2}\left(2 - \varepsilon \frac{\rho_1}{\rho_2}\right)}$$

As the size may change rapidly from one vertex to another, the mesh gradation may not be bounded locally. To overcome this problem, metric map $\mathcal{M}_3$ is modified using a size-correction procedure (Borouchaki et al., 1997).
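
As an illustration of the metric definition, the sketch below assembles the matrix from the principal directions and radii of curvature, using the constant-gap coefficients $\alpha$ and $\beta$ quoted above. It assumes that the two principal directions and the normal form an orthonormal basis, that $\rho_1, \rho_2 > 0$, and that $\varepsilon \rho_1 / \rho_2 < 2$; the function name `geometric_metric` and its argument names are hypothetical.

```python
import numpy as np

def geometric_metric(d1, d2, normal, rho1, rho2, eps, lam=1.0):
    """Return the 3x3 curvature-based metric at a vertex.

    d1, d2, normal : orthonormal principal directions and unit normal (assumed)
    rho1, rho2     : positive principal radii of curvature
    eps            : relative gap threshold
    lam            : coefficient acting along the normal direction
    """
    ratio = rho1 / rho2
    alpha = 2.0 * np.sqrt(eps * (2.0 - eps))
    beta = 2.0 * np.sqrt(eps * ratio * (2.0 - eps * ratio))
    frame = np.vstack([d1, d2, normal]).astype(float)   # rows are the directions: D(P)
    diag = np.diag([1.0 / (alpha * rho1) ** 2, 1.0 / (beta * rho2) ** 2, lam])
    return frame.T @ diag @ frame                        # tD * diag * D
```

A vector of Euclidean length $\alpha \rho_1$ along the first principal direction then has unit length in this metric, which is the size the metric prescribes in that direction.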


Surface remeshing 

Having defined an adequate metric $\mathcal{M}_3$ at any mesh vertex, the discrete surface-meshing problem consists in constructing a unit mesh with respect to metric map $\mathcal{M}_3$. To this end, local mesh modifications are applied on the basis of an edge length analysis. The optimization consists in collapsing the short edges and splitting the long edges on the basis of their relative length. Geometric measures are used to control the deviation between the mesh elements and the surface geometry, as well as the element shape quality (Frey and Borouchaki, 1998). A point relocation procedure is also used in order to improve the element shape quality.


4.4 Unit volume meshing 


The global scheme for unit mesh generation is well known: a coarse mesh (without internal points) of the domain is constructed using a classical Delaunay method, and then this mesh is enriched by adding the field points before being optimized. The field points are defined in an iterative manner. At each iteration step, these points are created using a method for edge saturation or an advancing-front method. Then, they are inserted in the current mesh using the constrained Delaunay kernel (George and Borouchaki, 1998) in a Riemannian context. This process is repeated as long as the current mesh is being modified. On the basis of this method, the field points are necessarily well located with respect to the mesh entities that have already been constructed. Similarly, the Delaunay kernel results in almost optimal connections (an optimal connection is found when regular elements exist) from point to point. In what follows, we give some details about the field point placement strategies and we discuss the generalized constrained Delaunay kernel.


4.4.1 Point placement strategies 


Here, we propose two methods that allow the field points to 
be defined. The advantage of the first is its simplicity and 
low cost. On the other hand, the second method generally 
results in better quality meshes. 


Edge saturation 

At each iteration step, the field points are defined in such a way as to subdivide the edges in the current mesh into unit-length segments. A field point is retained if it is not too close to (i.e. not at a distance less than 1 from) a point already existing in the mesh or previously retained for insertion.


Advancing-front strategy 

At each iteration step, a set of facets (edges in two dimensions) in the current mesh, thus a front, is retained and the field points are created from this front so as to form unit elements (elements with unit edges). A facet in the current mesh is considered as a front entity if it separates a unit element from a nonunit element (nu for short). Particular care is necessary if one wants to properly cover the whole domain (meaning that the number of points thus defined is adequate). For instance, a nonunit element may be seen as a unit element at some time. The optimal point related to a front facet is defined on the same side as the associated nonunit element. This is the point that results in a unit element after being connected to the front entity. Let $f_i$ be a front facet at iteration step $i$, which separates the unit element $K_i^{u} = (f_i, P_i^{u})$, where $P_i^{u}$ is the vertex in $K_i^{u}$ other than the endpoints of $f_i$, from the nonunit element $K_i^{nu} = (f_i, P_i^{nu})$, where $P_i^{nu}$ is the vertex in $K_i^{nu}$ other than the endpoints of $f_i$. The optimal point $P_i^*$ with respect to $f_i$ is constructed in such a way that element $(f_i, P_i^*)$ is a unit element. If point $P_i^*$ is at a distance less than one from $P_i^{nu}$, then the nonunit element $K_i^{nu}$ is considered as a unit element in the next iteration step $i + 1$. This means that the nonunit element $K_i^{nu}$ is already constructed and nothing better can be done. A variation leads to considering all the nonunit elements that have a front facet at iteration step $i$ as unit elements at iteration step $i + 1$. Obviously, these elements will be considered (when defining the front at iteration step $i + 1$) if they are not affected by any point insertion at iteration step $i$. As above, the optimal point for a front facet is created if it falls inside the domain and if it is not too close to an existing point.


4.4.2 Point insertion 


In a classical problem, the constrained Delaunay kernel based on cavity remeshing is written as (cf. Watson, 1981) $\mathcal{T} = \mathcal{T} - \mathcal{C}(P) + \mathcal{B}(P)$, where $\mathcal{C}(P)$ is the cavity associated with point $P$ and $\mathcal{B}(P)$ is the remeshing of $\mathcal{C}(P)$ based on $P$ ($\mathcal{T}$ being the current mesh). The cavity is constructed following a constrained proximity criterion given by

$$\{K,\ K \in \mathcal{T},\ P \in \mathrm{Ball}(K) \text{ and } P \text{ visible from each vertex in } K\}$$

where $\mathrm{Ball}(K)$ is the open circumball of $K$.

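An extension of this approach involves redefining the cavity $\mathcal{C}(P)$ in a Riemannian context (George and Borouchaki, 1998). To this end, we first introduce the Delaunay measure $\alpha_{\mathcal{M}_d}$, associated with pair $(P, K)$, with respect to a metric $\mathcal{M}_d$:

$$\alpha_{\mathcal{M}_d}(P, K) = \left[\frac{d(P, O_K)}{r_K}\right]_{\mathcal{M}_d}$$

where $O_K$ (resp. $r_K$) is the center (resp. radius) of the circumsphere of $K$ and $[*]_{\mathcal{M}_d}$ indicates that the quantity $*$ is measured in the Euclidean space characterized by metric $\mathcal{M}_d$. The usual proximity criterion, $P \in \mathrm{Ball}(K)$, is expressed by $\alpha_{\mathcal{I}_d}(P, K) < 1$, where $\mathcal{I}_d$ is the identity metric. Cavity $\mathcal{C}(P)$ is then redefined by $\mathcal{C}(P) = \mathcal{C}_1(P) \cup \mathcal{C}_2(P)$ with

$$\mathcal{C}_1(P) = \{K,\ K \in \mathcal{T},\ K \text{ including } P\}$$
$$\mathcal{C}_2(P) = \Big\{K,\ K \in \mathcal{T},\ \exists K' \in \mathcal{C}(P),\ K \text{ adjacent to } K',\ \alpha_{\mathcal{M}_d(P)}(P, K) + \sum_{V} \alpha_{\mathcal{M}_d(V)}(P, K) < d + 2,\ V \text{ vertex of } K,\ P \text{ visible by the vertices of } K\Big\}$$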


Hence, region $\mathcal{C}(P)$ is completed by adjacency starting from the elements in $\mathcal{C}_1(P)$. After this definition, and in two dimensions, the generalized cavity is star-shaped with respect to $P$ and the remeshing $\mathcal{B}(P)$ is valid. In three dimensions, a possible correction (George and Hermeline, 1992) of the cavity based on element removal can ensure this property.


4.4.3 Optimization processes 


The proposed method results in a unit mesh of domain $\Omega$.
Nevertheless, the mesh quality can be improved by means 
of two optimization processes, one made up of topological 
modifications, the other consisting of geometrical modifi- 
cations. The first mainly involves applying a number of 
facet flips, while the other consists of node repositioning. 
In fact, we assume that the mesh of domain $\Omega$ (before
being optimized) has more or less the right number of inter- 
nal vertices. These optimization procedures do not modify 
the number of internal nodes but, in particular, enhance 
the quality of the nonunit elements created when the field 
points have been generated. The scheme consists of itera- 
tive facet flips and node repositioning. In what follows, we 
recall the notion of edge quality and element quality and 
we discuss these optimization tools. 


Edge length quality 
Let $AB$ be a mesh edge. The length quality $Q_l$ of $AB$ in the Riemannian metric $\mathcal{M}_d$ may be defined as

$$Q_l(AB) = \begin{cases} L_{\mathcal{M}_d}(AB) & \text{if } L_{\mathcal{M}_d}(AB) \le 1 \\ \dfrac{1}{L_{\mathcal{M}_d}(AB)} & \text{if } L_{\mathcal{M}_d}(AB) > 1 \end{cases}$$

With this measure, $0 < Q_l(AB) \le 1$ holds and a unit edge has a length quality with a value of 1. This quality measure about the edge lengths shows how the mesh conforms to the specified Riemannian metric $\mathcal{M}_d$.

The edge length quality of a mesh $\mathcal{T}$ is defined by

$$Q_l(\mathcal{T}) = \left(\frac{1}{|\mathcal{T}|} \sum_{e} Q_l(e),\ \min_{e} Q_l(e)\right)$$

where $e$ stands for an edge in mesh $\mathcal{T}$ and $|\mathcal{T}|$ is the number of such edges. The two quantities in the formula measure respectively the average and the minimum of the length qualities of the mesh edges.
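
The two definitions above translate directly into code. This is a small sketch, assuming that the Riemannian edge lengths have already been computed and are strictly positive; the function names are hypothetical.

```python
def edge_length_quality(length):
    """Length quality of one edge: L if L <= 1, otherwise 1 / L (L > 0 assumed)."""
    return length if length <= 1.0 else 1.0 / length

def mesh_edge_quality(lengths):
    """Return (average, minimum) of the edge length qualities of a mesh."""
    qualities = [edge_length_quality(length) for length in lengths]
    return sum(qualities) / len(qualities), min(qualities)
```

An edge of metric length 2 and an edge of metric length 0.5 both get quality 0.5, so the measure penalizes edges that are too long and too short symmetrically.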


Element shape quality 
Let $K$ be a mesh element. In the classical Euclidean space, a popular measure for the shape quality of $K$ is (Lo, 1991)

$$Q_f(K) = c\, \frac{V(K)}{\sum_{e(K)} l^2(e(K))}$$

where $V(K)$ denotes the volume of $K$, $e(K)$ being the edges in $K$ and $c$ the scaling coefficient such that the quality of a regular element has the value of 1. With this definition, we have $0 \le Q_f(K) \le 1$ and a nicely shaped element has a quality close to 1, while an ill-shaped element has a quality close to 0.

In a Riemannian space, the quality of element $K$ can be defined by

$$Q_f(K) = \min_{1 \le i \le d+1} Q_f^i(K)$$

where $Q_f^i(K)$ is the element quality in the Euclidean space associated with the metric $\mathcal{M}_i$, corresponding to the vertex number $i$ in $K$.

To measure the quality $Q_f^i(K)$, it is simply necessary to transform the Euclidean space related to the metric specified at vertex $i$ of $K$ into the usual Euclidean space and to consider the quality value of element $K^i$ associated with $K$; in other words

$$Q_f^i(K) = Q_f(K^i)$$

It is easy to show that

$$Q_f^i(K) = c\, \frac{\sqrt{\mathrm{Det}(\mathcal{M}_i)}\; V(K)}{\sum_{e(K)} l^2_{\mathcal{M}_i}(e(K))}$$

Similarly, the shape quality of the elements in mesh $\mathcal{T}$ is defined by

$$Q_f(\mathcal{T}) = \left(\frac{1}{|\mathcal{T}|} \sum_{K} Q_f(K),\ \min_{K} Q_f(K)\right)$$

where $K$ stands for an element in mesh $\mathcal{T}$. The two quantities in the formula measure respectively the average and the minimum shape qualities of the mesh elements.
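
For triangles, the shape quality in a metric can be sketched as follows. The example is restricted to two dimensions, where the normalization $c = 4\sqrt{3}$ makes the equilateral triangle have quality 1; the function names and the representation of metrics as nested 2 x 2 tuples are assumptions of this sketch.

```python
import math

def _metric_sq(u, m):
    """Squared length of the 2D vector u in the constant metric m."""
    return u[0] * (m[0][0] * u[0] + m[0][1] * u[1]) + u[1] * (m[1][0] * u[0] + m[1][1] * u[1])

def triangle_quality(p0, p1, p2, m=((1.0, 0.0), (0.0, 1.0))):
    """c * sqrt(det M) * area / (sum of squared metric edge lengths), c = 4 sqrt(3)."""
    c = 4.0 * math.sqrt(3.0)
    e01 = (p1[0] - p0[0], p1[1] - p0[1])
    e02 = (p2[0] - p0[0], p2[1] - p0[1])
    e12 = (p2[0] - p1[0], p2[1] - p1[1])
    area = 0.5 * abs(e01[0] * e02[1] - e01[1] * e02[0])
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    edge_sum = sum(_metric_sq(e, m) for e in (e01, e02, e12))
    return c * math.sqrt(det) * area / edge_sum

def riemannian_triangle_quality(p0, p1, p2, vertex_metrics):
    """Worst quality over the metrics attached to the three vertices."""
    return min(triangle_quality(p0, p1, p2, m) for m in vertex_metrics)
```

A triangle stretched by a factor of 2 along $x$ is perfect for the metric $\mathrm{diag}(1/4, 1)$, although its Euclidean quality is well below 1.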


Facet flip 
Facet flip only affects the mesh topology. It has proved to be very efficient for shape quality improvement. This technique results in the removal of a facet of arbitrary dimensionality whenever this is possible. Let $f$ be a facet of arbitrary dimensionality in the mesh. We use the term shell of $f$ (cf. George and Borouchaki, 1998) for the set of elements sharing $f$. Flipping $f$ involves constructing a triangulation of the hull of the shell of $f$ in which $f$ is not a mesh entity. The quality of a shell is that of its worst element. The flip is then processed if the quality of the new triangulation is better than that of the initial shell.

When a Riemannian metric must be followed, it is necessary to sort these facet flips, while this is not strictly necessary in a classical Euclidean case. This leads to associating the expected ratio of improvement $\beta_f$ with facet $f$ by emulating a flip. Then, to optimize the mesh, an iterative process is used, which applies the flips in decreasing order of these ratios. To begin with, the required ratio of improvement is set at a given value $\omega > 1$; then $\omega$ is modified and decreases to 1. Such a strategy leads to performing first the most significant operations in terms of mesh improvement.


Point repositioning 
Let $P$ be an internal point in the mesh and let $(K_i)$ be the set of elements with $P$ as a vertex (i.e. the ball associated with $P$ (cf. George and Borouchaki, 1998)). Repositioning $P$ consists in moving $P$ so as to enhance the quality of the worst elements in $(K_i)$. Two methods can be advocated for node repositioning, one based on unit length, the other on optimal elements. The first method improves the edge length quality for the elements in $(K_i)$, while the second improves the shape of these elements. In practice, both methods are applied to all the internal points of the mesh.

Let $(P_i)$ be the set of vertices in $(K_i)$ other than $P$. With each point $P_i$ is associated the optimal point $P_i^*$ such that

$$\overrightarrow{P_i P_i^*} = \frac{1}{L_{\mathcal{M}_d}(P_i P)}\, \overrightarrow{P_i P}$$

for which $L_{\mathcal{M}_d}(P_i P_i^*) = 1$ holds. Repositioning $P$ consists of moving point $P$ step by step toward the centroid of the points $(P_i^*)$ if the quality of the worst element in $(K_i)$ is improved. This process results in unit edge lengths for the edges that have $P$ as one endpoint.
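
The unit-length repositioning target can be sketched as below. For simplicity the example assumes a constant metric over the ball of $P$ (in general the length $L_{\mathcal{M}_d}(P_i P)$ is an integral along the edge), and it only computes the centroid toward which $P$ is moved; the step-by-step displacement and the quality check on the worst element are left out. All names are hypothetical.

```python
import math

def metric_length(u, m):
    """Length of vector u in the constant metric m (nested sequences)."""
    dim = len(u)
    return math.sqrt(sum(u[i] * m[i][j] * u[j] for i in range(dim) for j in range(dim)))

def unit_length_target(p, neighbours, m):
    """Centroid of the optimal points P_i* = P_i + P_iP / L_M(P_iP)."""
    dim = len(p)
    acc = [0.0] * dim
    for q in neighbours:
        to_p = [pj - qj for pj, qj in zip(p, q)]   # vector from P_i to P
        length = metric_length(to_p, m)
        for j in range(dim):
            acc[j] += q[j] + to_p[j] / length       # P_i*, at unit distance from P_i
    return [c / len(neighbours) for c in acc]
```

In a symmetric configuration the target coincides with the current position, so the point does not move.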

Let $(f_i)$ be the facets in the elements in $(K_i)$ that are opposite vertex $P$, that is, $K_i = [P, f_i]$. With each facet $f_i$ the optimal point $P_i^*$ is associated such that the element $K_i^* = [P_i^*, f_i]$ satisfies

$$Q_f(K_i^*) = \max_{P'} Q_f([P', f_i])$$

where $P'$ is an arbitrary point located on the same side of $f_i$ as $P$ is. Similarly, repositioning $P$ consists of moving $P$ step by step toward the centroid of points $(P_i^*)$ if the quality of the worst element in $(K_i)$ is improved. This process results in optimal quality elements for the elements in the ball of $P$.

To find point $P_i^*$, we can consider the centroid of the optimal points related to the $f_i$'s, each of which is evaluated in the Euclidean structure related to the metric defined at a vertex of $K_i$.


5 ADAPTIVE FEM COMPUTATIONS 


Let us consider a bounded domain $\Omega$ described by its boundary surface $\Gamma$. In the context of a numerical simulation performed on domain $\Omega$ using a mesh of $\Omega$ as spatial support, we suggest a general scheme including an adaptive meshing loop of $\Omega$. This scheme is made up of two distinct parts. The first part only involves the generation of an initial mesh of the computational domain $\Omega$. The second part concerns an adaptation loop including the computation, the a posteriori error estimation, and the generation of adapted meshes. The method advocated in what follows is an h-method, where the parameter of adaptation is $h$, for example, the size (or the directional sizes) of the mesh elements. Other adaptation methods exist, including p-methods, hp-methods, and hierarchical methods, as well as some others (such as local refinement) (see Chapter 4, this Volume, Chapter 2, Volume 2).


5.1 Initial mesh of $\Omega$


The problem consists of constructing a mesh of $\Omega$ from an initial reference mesh $\mathcal{T}_{\mathrm{ref}}(\Gamma)$ of its boundary $\Gamma$ and from a metric map $\mathcal{M}_0(\Gamma)$, indicating the desired element sizes on the surface. To construct this initial mesh $\mathcal{T}_0(\Omega)$, we proceed in several steps:

• The initial reference mesh $\mathcal{T}_{\mathrm{ref}}(\Gamma)$ is simplified and optimized in a given Hausdorff envelope, so as to obtain a geometric reference mesh $\mathcal{T}_{\mathrm{ref},g}(\Gamma)$ of $\Gamma$

• A geometric ‘smooth’ support (piecewise $G^1$ continuous) is defined on mesh $\mathcal{T}_{\mathrm{ref},g}(\Gamma)$, so as to obtain a smooth geometric representation of boundary $\Gamma$

• Metric map $\mathcal{M}_0(\Gamma)$, supplied on mesh $\mathcal{T}_{\mathrm{ref},g}(\Gamma)$, is then rectified so as to be compatible with the surface geometry

• The rectified map $\mathcal{M}_0(\Gamma)$ is again modified to account for the desired mesh gradation

• Mesh $\mathcal{T}_{\mathrm{ref},g}(\Gamma)$ is adapted in terms of element sizes to the modified map $\mathcal{M}_0(\Gamma)$ so as to obtain the initial computational mesh $\mathcal{T}_0(\Gamma)$

• Volume mesh $\mathcal{T}_0(\Omega)$ is generated from mesh $\mathcal{T}_0(\Gamma)$ associated with the metric map $\mathcal{M}_0(\Gamma)$.


This schematic flow is illustrated in Figure 1, where one 
can see the data flowchart related to the input and the output 
of the various procedures involved in the entire process. 


5.2 General diagram of an adaptive computation 


Figure 1. The various stages of the construction of the initial (volume) mesh of $\Omega$: from the initial mesh $\mathcal{T}_{\mathrm{ref}}(\Gamma)$ of the boundary to the computational mesh $\mathcal{T}_0(\Omega)$ and the associated metric map $\mathcal{M}_0(\Omega)$.

The adaptation loop aims to capture and to refine the physical solution of the numerical simulation performed on domain $\Omega$. In general, a computation performed on mesh $\mathcal{T}_0(\Omega)$ does not allow a satisfactory solution to be obtained. Hence, we suggest the following iterative adaptation scheme, in which at each iteration step $i$

• a computation is performed on mesh $\mathcal{T}_i(\Omega)$, leading to the solution field $S_i(\Omega)$;

• solution $S_i(\Omega)$ is analyzed using an adequate error estimator and a metric map $\mathcal{M}_i(\Omega)$ is deduced, prescribing the element sizes in order to obtain a subsequent solution at a given accuracy;

• the metric map $\mathcal{M}_i(\Gamma)$ restricted to surface $\Gamma$ is rectified with regard to the surface geometry;

• the (partially) rectified metric map $\mathcal{M}_i(\Omega)$ is modified to take into account the specified mesh gradation;

• surface mesh $\mathcal{T}_i(\Gamma)$ is adapted in terms of element sizes to the modified metric map $\mathcal{M}_i(\Gamma)$ to obtain mesh $\mathcal{T}_{i+1}(\Gamma)$;

• volume mesh $\mathcal{T}_{i+1}(\Omega)$ adapted to the metric field $\mathcal{M}_i(\Omega)$ is generated from mesh $\mathcal{T}_{i+1}(\Gamma)$ associated with its metric map $\mathcal{M}_i(\Gamma)$;

• solution $S_i(\Omega)$ associated with mesh $\mathcal{T}_i(\Omega)$ is interpolated on mesh $\mathcal{T}_{i+1}(\Omega)$.

These various stages are illustrated in Figure 2.


5.3 Error estimates 


There exist several error estimate strategies that make it possible to control a posteriori the error of the solution computed on a finite element mesh (Fortin, 2000). These estimators can be used to control the mesh by means of an h-adaptation method in such a way that the resulting solution is of a given accuracy.


Among these estimators are those that are based on the 
interpolation error (and, therefore, of a purely geometric 
nature as the operator itself is not considered). This class 
of estimators has been studied by various authors (Babuska 
and Aziz, 1976; D’Azevedo and Simpson, 1991; Rippa, 
1992; Berzins, 1999). Nevertheless, most of these papers 
require a parameter, h, the element size, to be small or to 
vanish, and therefore are asymptotic results. The estimator 
then makes use of appropriate Taylor expansions and 
provides information about the desired size, h. However, 
as this size is not necessarily small, we propose a new
approach that, while close, does not require any particular
assumption about this parameter, and, therefore, is likely to
be more justified. Our approach is, to some extent, close
to solutions used in a different topic, the construction of 
meshes in parameterized patches (see Sheng and Hirsch, 
1992; Anglada et al., 1999; Borouchaki et al., 2001; among 
others). 


5.3.1 Problem statement and state of the art 


Figure 2. The various stages of the general adaptation scheme.

Let $\Omega$ be a domain in $\mathbb{R}^d$ (where $d = 1$, 2, or 3) and let $\mathcal{T}$ be a simplicial mesh of $\Omega$, where the simplices are linear, $P^1$, or quadratic, $P^2$, elements. We assume that we have in hand the solution, denoted by $u_{\mathcal{T}}$ (to meet the classical notations in error estimate literature), of a finite element computation previously done in $\Omega$ using $\mathcal{T}$ as a mesh, this solution being scalar valued. Let $u$ be the exact solution; the problem first involves computing the gap $e_{\mathcal{T}} = u - u_{\mathcal{T}}$ from $u$ to $u_{\mathcal{T}}$, which represents the underlying error due to the finite element approximation. Then, it involves constructing a new mesh $\mathcal{T}'$ such that the estimated gap from $u$ to the solution $u_{\mathcal{T}'}$, as obtained using a new computation, is bounded by a given threshold. Specific issues include

• how to evaluate gap $e_{\mathcal{T}}$ from $u$ to $u_{\mathcal{T}}$;

• how to use this information to construct a new mesh such that the corresponding gap is bounded in the desired range.


The finite element solution $u_{\mathcal{T}}$ is not interpolant (e.g. the solution at the nodes of $\mathcal{T}$ is not coincident with the exact value $u$ at these nodes). Moreover, for any element in the mesh, it is not possible to guarantee that $u_{\mathcal{T}}$ is coincident with the exact value of $u$ at at least one point in this element. It is therefore tedious to explicitly access the gap $e_{\mathcal{T}}$. The direct analysis of this gap has been studied in a number of works (Verfürth, 1996), but, in general, it remains an open problem. Therefore, other indirect approaches have been proposed to quantify or bound this gap. Let $\tilde{u}_{\mathcal{T}}$ be the interpolant of $u$ over mesh $\mathcal{T}$ (here $\tilde{u}_{\mathcal{T}}$ is a piecewise linear or quadratic function, according to the degree of the elements in $\mathcal{T}$) and let $\tilde{e}_{\mathcal{T}}$ be the gap $u - \tilde{u}_{\mathcal{T}}$ from $u$ to $\tilde{u}_{\mathcal{T}}$, the so-called interpolation error about $u$ for $\mathcal{T}$. To obtain the range of gap $e_{\mathcal{T}}$, we assume the following relation to be satisfied:

$$\|e_{\mathcal{T}}\| \le C\, \|\tilde{e}_{\mathcal{T}}\|$$

where $\|\cdot\|$ stands for a norm and $C$ is a constant independent of $\mathcal{T}$. In other words, we assume the finite element error to be bounded by the interpolation error. This allows us to simplify the initial problem by considering the following problem: given $\tilde{u}_{\mathcal{T}}$, the interpolation of $u$ over mesh $\mathcal{T}$, how can we construct another mesh $\mathcal{T}'$ where the interpolation error is bounded by a given threshold? As $\tilde{u}_{\mathcal{T}}$ can be seen as a discrete representation of $u$, the problem reduces to finding a characterization of the meshes where this interpolation error is bounded. This topic has been addressed in various papers, such as Berzins (1999) and, in most of them, using a ‘measure’ of the interpolation error makes it possible to find some constraints to which the mesh elements must conform. In the context of mesh-adaptation methods, we are mainly interested in h-methods or adaptation in size where these constraints are expressed in terms of element sizes. Some classical measures of this error, classified into two categories, continuous or discrete, together with the corresponding constraints about the mesh elements are discussed in Berzins (1999). It turns out that the discrete approach is more convenient from the point of view of mesh adaptation.

In this chapter, we first propose a bound on the interpolation error in two dimensions that extends to arbitrary dimensions (and is close to a work by Anglada et al. (1999)). Then, we show how this bound can be used to adapt the mesh. After that, we introduce a new measure for quantifying the interpolation error, which depends on the local deformation of the Cartesian surface associated with the solution. We demonstrate how this measure makes it possible to control the interpolation error in the $H^1$ norm and, therefore, is likely to be more appropriate.


5.3.2 A bound in two dimensions 


We consider a mesh made up of piecewise linear triangles. Let $K$ be a mesh element, and let $u$ be the mapping from $\mathbb{R}^2$ to $\mathbb{R}$, assumed to be sufficiently smooth. We denote by $\Pi_h u$ the linear interpolant of $u$ over $K$, and we assume $u$ and $\Pi_h u$ to be coincident at the vertices of $K$. The aim is to bound $|(u - \Pi_h u)(x)|$ for $x$ in $K$.


An isotropic bound 
In George (2001) it is shown that

$$e = \|u - \Pi_h u\|_\infty \le \frac{2}{9}\, L^2 M \qquad (2)$$

where $L$ is the longest edge in $K$ and $M$ is given by ($H_u$ being the Hessian of $u$)

$$M = \max_{x \in K} \left( \max_{\|\vec{v}\| = 1} \left|\langle \vec{v}, H_u(x)\, \vec{v} \rangle\right| \right)$$

From (2), a gap $e$ within the range $\varepsilon$ is ensured if $L$ is such that

$$L^2 \le \frac{9\,\varepsilon}{2M} \qquad (3)$$

while observing that if $M$ is zero, any value of $L$ is convenient. Therefore, once $h_{\max}$, the diameter of element $K$, is compared with $L$, one can know whether triangle $K$ is suitable, too small, or too large. A size adaptation can then be envisaged if necessary.
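
Relation (3) gives the largest admissible edge length directly. The sketch below assumes that an estimate of $M$ (the Hessian bound) is already available; the function name `max_edge_length` is hypothetical.

```python
import math

def max_edge_length(eps, hessian_bound):
    """Largest L with (2/9) L^2 M <= eps; any size is acceptable when M = 0."""
    if hessian_bound == 0.0:
        return math.inf
    return math.sqrt(9.0 * eps / (2.0 * hessian_bound))
```

Comparing the diameter of $K$ with this value indicates whether the element should be refined, coarsened, or kept: substituting the returned length back into (2) gives a bound exactly equal to $\varepsilon$.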

However, in practice, the difficulty is to evaluate $M$ using $\Pi_h u$ by means of

• approximating $\nabla_u(a)$, $\nabla_u(b)$, and $\nabla_u(c)$,

• approximating $H_u(a)$, $H_u(b)$, and $H_u(c)$,

• approximating $M$ by the largest eigenvalue of matrices $|H_u(a)|$, $|H_u(b)|$, and $|H_u(c)|$, where $|H_u|$ is constructed from $H_u$ after being made positive definite.

Vectors $\nabla_u(a)$, $\nabla_u(b)$, and $\nabla_u(c)$ and matrices $H_u(a)$, $H_u(b)$, and $H_u(c)$ can be approximated using generalized finite differences (a variation of the Green formula (Raviart and Thomas, 1988)) of function $\Pi_h u$. Alternatively, the Taylor expansion about $a$, $b$, and $c$ can be used. To this end, we assume $K = [a, b, c]$ to be an element inside the mesh and we denote by $(a_i)$ the set of the mesh vertices adjacent to $a$ (thus including $b$ and $c$). Writing the Taylor expansion with respect to $a$ at each $a_i$ yields the following overdetermined system

$$u(a_i) \approx u(a) + \langle \overrightarrow{aa_i}, \nabla_u(a) \rangle$$

which is equivalent to the minimization problem

$$\min_{Z} \sum_{i} w_i^2 \left( \langle \overrightarrow{aa_i}, Z \rangle - u(a_i) + u(a) \right)^2$$

where $w_i$ is a weight, measuring the influence of equation $i$ in the system. The solution of this problem is that of the linear system ${}^tA\,W A\, Z = {}^tA\,W B$, where $A$ is the matrix made up of the row vectors ${}^t\overrightarrow{aa_i}$, $W$ being the diagonal matrix related to the weights ($w_i^2$), and $B$ being the vector with components $(u(a_i) - u(a))$. Similarly, vector $\nabla_u(a)$ being known, Hessian $H_u(a)$ can be computed from the system

$$u(a_i) \approx u(a) + \langle \overrightarrow{aa_i}, \nabla_u(a) \rangle + \frac{1}{2} \langle \overrightarrow{aa_i}, H_u(a)\, \overrightarrow{aa_i} \rangle$$

which is equivalent to a similar minimization problem. Vectors $\nabla_u(b)$ and $\nabla_u(c)$, as well as matrices $H_u(b)$ and $H_u(c)$, are obtained in the same way.
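
The weighted least-squares recovery of the gradient described above can be sketched as follows. The example assumes that the normal equations are well conditioned (at least two non-collinear neighbours in two dimensions) and solves them with `numpy.linalg.solve`; the function name and its arguments are hypothetical.

```python
import numpy as np

def recover_gradient(a, u_a, neighbours, u_neighbours, weights=None):
    """Solve tA W A Z = tA W B for Z, the approximate gradient of u at vertex a.

    a            : coordinates of the vertex
    u_a          : value of u at a
    neighbours   : coordinates of the adjacent vertices a_i
    u_neighbours : values u(a_i)
    weights      : optional weights w_i (defaults to 1 for every equation)
    """
    A = np.asarray(neighbours, dtype=float) - np.asarray(a, dtype=float)  # rows aa_i
    B = np.asarray(u_neighbours, dtype=float) - float(u_a)                # u(a_i) - u(a)
    w = np.ones(len(A)) if weights is None else np.asarray(weights, dtype=float)
    W = np.diag(w ** 2)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ B)
```

For a linear function the recovered gradient is exact whatever the weights. The Hessian can be recovered in the same spirit from the second-order expansion, with the three independent entries of $H_u(a)$ as unknowns.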

Owing to the isotropic nature of this result, using this kind of estimator may lead to an unnecessarily fine mesh. In fact, $M$ is the largest eigenvalue of the family of operators $|H_u|$ (at all points in $K$) and therefore $L$ is the size corresponding to this value. As a consequence, an anisotropic phenomenon will be treated in an isotropic way, imposing in all directions a size equal to the smallest size related to the eigenvalues.


An anisotropic bound 

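We assume that vertex $a$ of $K$ is the site of $x$ (i.e. $x$ is closer to $a$ than to $b$ and $c$), $x$ being the point where a maximal gap occurs. We also assume $x$ to be in $K$ (and not in one edge of $K$, which leads to a similar result). Then we denote by $a'$ the point of intersection of the line supporting $ax$ with the edge opposite $a$, that is, edge $bc$ in $K$. We expand $e$ at $a$ from $x$ by means of the Taylor expansion with integral remainder

$$e(a) = (u - \Pi_h u)(a) = (u - \Pi_h u)(x) + \langle \overrightarrow{xa}, \nabla(u - \Pi_h u)(x) \rangle + \int_0^1 (1 - t)\, \langle \overrightarrow{xa}, H_u(x + t\,\overrightarrow{xa})\, \overrightarrow{xa} \rangle\, dt$$

Since $e(a) = 0$ and since the gradient of $u - \Pi_h u$ vanishes at the interior extremum $x$, only the integral term remains. As $a$ is the site of $x$, the scalar value $\lambda$ such that $\overrightarrow{ax} = \lambda\, \overrightarrow{aa'}$ is smaller than $2/3$; therefore

$$|e(x)| = \left| \int_0^1 (1 - t)\, \lambda^2\, \langle \overrightarrow{aa'}, H_u(x + t\,\overrightarrow{xa})\, \overrightarrow{aa'} \rangle\, dt \right| \le \frac{4}{9} \left| \int_0^1 (1 - t)\, \langle \overrightarrow{aa'}, H_u(x + t\,\overrightarrow{xa})\, \overrightarrow{aa'} \rangle\, dt \right|$$

which yields

$$|e(x)| \le \frac{2}{9} \max_{y \in K} \left| \langle \overrightarrow{aa'}, H_u(y)\, \overrightarrow{aa'} \rangle \right| \qquad (4)$$

Remark. In an arbitrary dimension, for example, $d$, the constant $2/9$ should be $\frac{1}{2} \left( \frac{d}{d+1} \right)^2$.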


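From (4), imposing a gap of $\varepsilon$ for $e$ leads, for triangle $K = [a, b, c]$, to

$$\max_{y \in K} \left| \langle \overrightarrow{aa'}, H_u(y)\, \overrightarrow{aa'} \rangle \right| \le \frac{9}{2}\, \varepsilon$$

where $a'$ is defined as above. This inequality is satisfied if

$$\max_{y \in K} \langle \overrightarrow{aa'}, |H_u(y)|\, \overrightarrow{aa'} \rangle \le \frac{9}{2}\, \varepsilon$$

Let $\mathcal{M}(a)$ be the symmetric positive-definite matrix such that

$$\max_{y \in K} \langle \overrightarrow{az}, |H_u(y)|\, \overrightarrow{az} \rangle \le \langle \overrightarrow{az}, \mathcal{M}(a)\, \overrightarrow{az} \rangle$$

for all $z$ in $K$ and such that the region (e.g. bounded by the corresponding ellipse) defined by

$$\langle \overrightarrow{az}, \mathcal{M}(a)\, \overrightarrow{az} \rangle \le \frac{9}{2}\, \varepsilon$$

has a maximal area. Then $\mathcal{M}(a)$ results in a size constraint, or again a metric at $a$, that varies with the direction.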

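In these equations, we have assumed the site of point $x$ where gap $e$ is maximal to be $a$. As $x$ is not known, the sites $b$ and $c$ must be taken into account in turn. This leads to the system

$$\max_{y \in K} \langle \overrightarrow{aa'}, |H_u(y)|\, \overrightarrow{aa'} \rangle \le \frac{9}{2}\, \varepsilon$$
$$\max_{y \in K} \langle \overrightarrow{bb'}, |H_u(y)|\, \overrightarrow{bb'} \rangle \le \frac{9}{2}\, \varepsilon$$
$$\max_{y \in K} \langle \overrightarrow{cc'}, |H_u(y)|\, \overrightarrow{cc'} \rangle \le \frac{9}{2}\, \varepsilon$$

where $b'$ and $c'$ are points respectively in edges $ac$ and $ab$. As above, we can define the metrics $\mathcal{M}(a)$, $\mathcal{M}(b)$, and $\mathcal{M}(c)$ at $a$, $b$, and $c$, respectively, such that

$$\max_{y \in K} \langle \overrightarrow{aa'}, |H_u(y)|\, \overrightarrow{aa'} \rangle \le \langle \overrightarrow{aa'}, \mathcal{M}(a)\, \overrightarrow{aa'} \rangle \le \frac{9}{2}\, \varepsilon$$
$$\max_{y \in K} \langle \overrightarrow{bb'}, |H_u(y)|\, \overrightarrow{bb'} \rangle \le \langle \overrightarrow{bb'}, \mathcal{M}(b)\, \overrightarrow{bb'} \rangle \le \frac{9}{2}\, \varepsilon$$
$$\max_{y \in K} \langle \overrightarrow{cc'}, |H_u(y)|\, \overrightarrow{cc'} \rangle \le \langle \overrightarrow{cc'}, \mathcal{M}(c)\, \overrightarrow{cc'} \rangle \le \frac{9}{2}\, \varepsilon$$

Therefore, for triangle $K$, a metric $\mathcal{M}(K)$ can be constructed in such a way that, on the one hand, the inequalities

$$\langle \vec{v}, \mathcal{M}(K)\, \vec{v} \rangle \ge \langle \vec{v}, \mathcal{M}(a)\, \vec{v} \rangle$$
$$\langle \vec{v}, \mathcal{M}(K)\, \vec{v} \rangle \ge \langle \vec{v}, \mathcal{M}(b)\, \vec{v} \rangle$$
$$\langle \vec{v}, \mathcal{M}(K)\, \vec{v} \rangle \ge \langle \vec{v}, \mathcal{M}(c)\, \vec{v} \rangle$$

are satisfied for any vector $\vec{v}$, and, on the other hand, the area enclosed by the curve $\langle \vec{v}, \mathcal{M}(K)\, \vec{v} \rangle = 1$ is maximal. In other words, the metric in $K$ is the largest size constraint along all the directions satisfying the size constraints at the vertices $a$, $b$, and $c$.

Actually, for the sake of simplicity, we consider $\mathcal{M}(a) = |H_u(a)|$, $\mathcal{M}(b) = |H_u(b)|$, and $\mathcal{M}(c) = |H_u(c)|$, matrix $H_u$ being determined as in the isotropic case. Metric $\mathcal{M}(K)$ can be defined as the intersection of the three metrics $\mathcal{M}(x)$ for $x = a$, $b$, and $c$ (cf. Frey and George, 2000).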
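
The two ingredients of this construction, the matrix $|H_u|$ and the intersection of metrics, can be sketched as follows. The text does not spell out how the intersection is computed; the example assumes one common construction, simultaneous reduction of the two quadratic forms (here through a Cholesky factor), and a small positive floor on the eigenvalues to keep $|H_u|$ positive definite. Function names and the `floor` parameter are hypothetical.

```python
import numpy as np

def absolute_hessian(H, floor=1e-12):
    """Replace the eigenvalues of the symmetric matrix H by their absolute values."""
    vals, vecs = np.linalg.eigh(np.asarray(H, dtype=float))
    vals = np.maximum(np.abs(vals), floor)   # floor is an assumption of this sketch
    return vecs @ np.diag(vals) @ vecs.T

def intersect_metrics(M1, M2):
    """Metric whose unit ball is the largest ellipsoid, aligned with the common
    eigen-directions, contained in the unit balls of both M1 and M2."""
    M1 = np.asarray(M1, dtype=float)
    M2 = np.asarray(M2, dtype=float)
    L = np.linalg.cholesky(M1)               # M1 = L L^T
    Linv = np.linalg.inv(L)
    S = Linv @ M2 @ Linv.T                   # M2 in coordinates where M1 = I
    s, Q = np.linalg.eigh(S)
    return L @ Q @ np.diag(np.maximum(s, 1.0)) @ Q.T @ L.T
```

An element metric can then be approximated by intersecting the three vertex metrics pairwise, keeping in mind that successive pairwise intersections may depend on the order in which they are applied.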


5.3.3 A surface-based approach 


In the previous sections, we proposed a bound on the interpolation error on the basis of the Hessian of the solution that has been directly used to obtain the size constraints about the mesh elements. In the present section, we propose a new approach by considering an appropriate Cartesian surface.

Let $\Omega$ be the computational domain, let $\mathcal{T}(\Omega)$ be a mesh of $\Omega$, and let $u(\Omega)$ be the physical solution obtained in $\Omega$ via mesh $\mathcal{T}(\Omega)$. The pair $(\mathcal{T}(\Omega), u(\Omega))$ allows us to define a Cartesian surface $\Sigma_u(\mathcal{T})$ (we assume $u$ to be a scalar function). Given $\Sigma_u(\mathcal{T})$, the problem of minimizing the interpolation error consists in defining an (optimal) mesh $\mathcal{T}_{\mathrm{opt}}(\Omega)$ in $\Omega$ such that surface $\Sigma_u(\mathcal{T}_{\mathrm{opt}})$ is as smooth as possible. To this end, we propose a local characterization of the surface near a vertex. Two methods are introduced; the first, using the local deformation, allows for an isotropic adaptation, while the other, using the local curvature, results in an anisotropic adaptation.


Local deformation of a surface 

The basic idea consists in a local characterization of the
deviation (at order 0) of the surface mesh $\Sigma_u(\mathcal{T})$ near a vertex
with respect to a reference plane, specifically the tangent
plane of the surface at this vertex. This deviation can be
evaluated by considering the Hessian along the normal to
the surface (i.e. the second fundamental form).

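Let P be a vertex in the solution surface $\Sigma_u(\mathcal{T})$. Locally,
near P, this surface has a parameterized form $\sigma(x, y)$, (x, y)
being the parameters, with $P = \sigma(0, 0)$. A Taylor
expansion at order 2 of $\sigma$ near P results in

$$
\sigma(x, y) = \sigma(0, 0) + \sigma'_x\, x + \sigma'_y\, y + \frac{1}{2}\left(\sigma''_{xx}\, x^2 + 2\,\sigma''_{xy}\, xy + \sigma''_{yy}\, y^2\right) + o(x^2 + y^2)\, e
$$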


where e = (1, 1, 1). If $\nu(P)$ stands for the normal to
the surface at P, then the quantity $\langle \nu(P), \sigma(x, y) - \sigma(0, 0) \rangle$
represents the gap from point $\sigma(x, y)$ to the tangent plane
at P and can be written as

$$
\frac{1}{2}\left(\langle \nu(P), \sigma''_{xx} \rangle\, x^2 + 2\,\langle \nu(P), \sigma''_{xy} \rangle\, xy + \langle \nu(P), \sigma''_{yy} \rangle\, y^2\right) + o(x^2 + y^2)
$$

which is therefore proportional to the second fundamental
form of the surface when $x^2 + y^2$ is small enough.

The local deformation of the surface at P is defined as
the maximal gap between the vertices adjacent to P and the tangent
plane of the surface at P. If $(P_i)$ denotes those vertices, then
the local deformation e(P) of the surface at P is given by

$$
e(P) = \max_i \left| \langle \nu(P), \overrightarrow{PP_i} \rangle \right|
$$

Therefore, the optimal mesh of $\Sigma_u(\mathcal{T})$ for $\Omega$ is a mesh
where the size at each node p is inversely proportional to
e(P), where P = (p, u(p)). Formally speaking, the optimal
size $h_{opt}(p)$ associated with node p is written as

$$
h_{opt}(p) = h(p)\, \frac{\varepsilon}{e(P)}
$$

where $\varepsilon$ is the given threshold, and h(p) is the size of the
elements near p in mesh $\mathcal{T}(\Omega)$.
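
As a minimal sketch of the two formulas above (not the authors' implementation), the local deformation and the resulting size can be computed directly from the unit normal at P and the adjacent vertices. The function names, the use of the absolute value of the gap, and the `h_max` cap for locally flat regions are assumptions made for illustration.

```python
def local_deformation(normal, vertex, neighbors):
    """Largest gap between the vertices adjacent to `vertex` and its tangent plane.

    `normal` is assumed to be the unit normal nu(P); points are 3-tuples.
    """
    gaps = []
    for neighbor in neighbors:
        # <nu(P), PP_i>: signed distance of P_i to the tangent plane at P
        gap = sum(n * (q - p) for n, p, q in zip(normal, vertex, neighbor))
        gaps.append(abs(gap))
    return max(gaps)


def optimal_size(h, threshold, deformation, h_max=float("inf")):
    """h_opt(p) = h(p) * threshold / e(P), capped at h_max where e(P) vanishes."""
    if deformation <= 0.0:
        return h_max
    return min(h * threshold / deformation, h_max)


# Hypothetical data: a vertex at the origin with a vertical unit normal.
P = (0.0, 0.0, 0.0)
nu = (0.0, 0.0, 1.0)
ring = [(1.0, 0.0, 0.1), (0.0, 1.0, -0.3), (-1.0, 0.0, 0.05)]
e_P = local_deformation(nu, P, ring)    # largest gap, here 0.3
h_opt = optimal_size(1.0, 0.03, e_P)    # about 0.1
```

Only the positions of the adjacent vertices and the normal are needed, which is why this measure avoids any Hessian evaluation.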

As can be seen, the local deformation is a rather easy way
to characterize the local deviation of the surface, and it
does not require explicitly computing
the Hessian of the solution. The only drawback of this
measure is that it allows only an isotropic adaptation. In
the same context (minimizing the local deviation), using
the curvature allows an analysis of this deviation that is
both more precise and anisotropic.


Local curvature of a surface 

Analyzing the local curvature of the surface related to the 
solution also makes it possible to minimize the deviation 
(order 1) from the tangent planes of the solution that 
interpolate the exact solution. Indeed, while considering the 
construction of isotropic surface meshes, we have shown 
in Borouchaki et al. (2001) how these two deviations, of 
order 0 and 1, are bounded by a given threshold and 
how the element size at all vertices is proportional to the 
minimal radius of curvature. Let P = (p, u(p)) be a vertex
in $\Sigma_u(\mathcal{T})$, let $\rho_1(P)$ and $\rho_2(P)$, with $\rho_1(P) \le \rho_2(P)$, be the
two principal radii of curvature, and let $(\vec{e}_1(P), \vec{e}_2(P))$ be
the corresponding unit principal directions. The ideal size
at P is

$$
h_u(P) = \gamma\, \rho_1(P)
$$

where $\gamma$ is a factor related to the specified threshold on
the deviation. This size is defined in the tangent plane of the
surface at P. Let us consider the frame $(P, \vec{e}_1(P), \vec{e}_2(P))$
in this plane. If $\vec{h}_u(P)$ reads $\vec{h}_u(P) = h_1\, \vec{e}_1(P) + h_2\, \vec{e}_2(P)$
in this frame, then the size constraint at P
can be written in $(P, \vec{e}_1(P), \vec{e}_2(P))$ as

$$
\begin{pmatrix} h_1 & h_2 \end{pmatrix}
\begin{pmatrix} \dfrac{1}{\gamma^2 \rho_1^2(P)} & 0 \\ 0 & \dfrac{1}{\gamma^2 \rho_1^2(P)} \end{pmatrix}
\begin{pmatrix} h_1 \\ h_2 \end{pmatrix} = 1
$$


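which is the equation of a circle centered at P in the
tangent plane of the surface at P. By means of an orthogonal
projection of this circle onto the plane of $\Omega$, we obtain the
size constraint at p. If $\vec{v}_1(p)$ and $\vec{v}_2(p)$ are the orthogonal
projections of $\vec{e}_1(P)$ and $\vec{e}_2(P)$ onto the plane of $\Omega$, then
this size constraint in frame $(p, \vec{i}, \vec{j})$ ($\vec{i} = (1, 0)$ and
$\vec{j} = (0, 1)$) is given by

$$
\begin{pmatrix} h_1 & h_2 \end{pmatrix}
\,{}^{t}\!\begin{pmatrix} \vec{v}_1(p) & \vec{v}_2(p) \end{pmatrix}^{-1}
\begin{pmatrix} \dfrac{1}{\gamma^2 \rho_1^2(P)} & 0 \\ 0 & \dfrac{1}{\gamma^2 \rho_1^2(P)} \end{pmatrix}
\begin{pmatrix} \vec{v}_1(p) & \vec{v}_2(p) \end{pmatrix}^{-1}
\begin{pmatrix} h_1 \\ h_2 \end{pmatrix} = 1
$$

where $(h_1, h_2)$ are the coordinates in the frame $(p, \vec{i}, \vec{j})$
of the projection of $\vec{h}_u$ in the plane of $\Omega$. This relationship
defines a metric (which is in general anisotropic) at p.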
The metric previously defined may lead to a large number 
of elements due to the isotropic nature of the elements in 
the surface. In order to minimize this number of elements, 
and for anisotropic meshing purposes, a similar relationship 
involving the two principal radii of curvature can be 
exhibited (Frey and George, 2000). In such a case, the 
ideal size of the surface element is given using a so-called 
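geometric metric, which, at vertex P of $\Sigma_u(\mathcal{T})$, is written as

$$
\begin{pmatrix}
\dfrac{1}{\gamma^2 \rho_1^2(P)} & 0 \\
0 & \dfrac{1}{\eta(\gamma, \rho_1(P), \rho_2(P))\, \rho_2^2(P)}
\end{pmatrix}
$$

where $\gamma$ is a factor related to the given threshold on the
deviation and $\eta(\gamma, \rho_1(P), \rho_2(P))$ is a function related to $\gamma$,
$\rho_1(P)$, and $\rho_2(P)$, which ensures a similar deviation along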
the two principal directions. This relationship generally 
describes an ellipse in the tangent plane of the surface at 
P, which includes the circle obtained in the isotropic case. 
Similarly, the corresponding metric at p is obtained after a
projection of this ellipse onto the plane of $\Omega$.

In practice, computing the local curvature at all vertices
of this surface first requires computing the normal (and hence
the gradient) by means of a weighted average of the unit
normals of the adjacent elements. Then, in the local frame
(the tangent plane together with the normal), we construct
a quadric centered at this vertex that best fits
the adjacent vertices, after which we take
the local Hessian to be that of this quadric. Finally,
using the gradient and the Hessian at the nodes of $\mathcal{T}(\Omega)$,
we compute the principal curvatures and directions at all
vertices of the surface $\Sigma_u(\mathcal{T})$.


5.4 Solution interpolation 


In the proposed adaptive scheme, it is necessary to interpolate
the current solution from the former mesh to the new
one in order to continue the computation at a given stage.

This step involves interpolating solution $S_i(\Omega)$ associated
with mesh $\mathcal{T}_i(\Omega)$ on $\mathcal{T}_{i+1}(\Omega)$. In the case where there are
no physical constraints in the PDE problem being solved,
the interpolation problem is written as a simple optimization
problem like

$$
\min \left\| S_{i+1}(\Omega) - S_i(\Omega) \right\|
$$

where $\|\cdot\|$ is a norm, for instance, an $L_2$ or an $H_1$ Sobolev
norm, and each solution S is associated with its corresponding
mesh. The solution of this minimization problem
necessitates computing the intersection of mesh $\mathcal{T}_i(\Omega)$ and
mesh $\mathcal{T}_{i+1}(\Omega)$. In cases where the physical constraints must
be considered, the underlying problem is a constrained
minimization problem. This is well understood for linear
physical constrained operators (Bank, 1997).

However, in practice, a simple linear interpolation of
$S_i(\Omega)$ on $\mathcal{T}_{i+1}(\Omega)$ yields a solution that is close to the
ideal targeted solution. This linear interpolation does not
require complex computations such as explicitly computing
the mesh intersections.
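
The linear interpolation mentioned above can be sketched as follows for a two-dimensional triangular mesh with nodal values: each node of the new mesh is located in a triangle of the former mesh and the solution is evaluated with barycentric coordinates. This is only an illustration under stated assumptions; the function names, the brute-force triangle search, and the tolerance `tol` are hypothetical choices, and a practical code would use a faster localization procedure.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p in triangle (a, b, c)."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    l_a = ((b[0] - p[0]) * (c[1] - p[1]) - (c[0] - p[0]) * (b[1] - p[1])) / det
    l_b = ((c[0] - p[0]) * (a[1] - p[1]) - (a[0] - p[0]) * (c[1] - p[1])) / det
    return l_a, l_b, 1.0 - l_a - l_b


def interpolate(point, vertices, triangles, values, tol=1e-12):
    """Linear interpolation at `point` of nodal `values` given on the old mesh."""
    for tri in triangles:
        a, b, c = (vertices[i] for i in tri)
        lam = barycentric(point, a, b, c)
        if min(lam) >= -tol:  # the point lies inside (or on the edge of) tri
            return sum(l * values[i] for l, i in zip(lam, tri))
    raise ValueError("point lies outside the old mesh")


# Old mesh: the unit square split into two triangles, carrying u = 2x + 3y + 1.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
values = [2.0 * x + 3.0 * y + 1.0 for x, y in vertices]

# Value transferred to a node of the new mesh located at (0.25, 0.5).
new_value = interpolate((0.25, 0.5), vertices, triangles, values)
```

No mesh intersection is computed: each new node only needs the one old element that contains it.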


6 LARGE-SIZE PROBLEM, 
PARALLELISM AND ADAPTIVITY 


Large-size problems are mainly encountered in some complex
CFD calculations, even where adaptive methods are
used. Parallel technologies are therefore the only solution to
handle such problems. Parallelism may be included in various
computational steps. Mesh construction together with
solution methods may benefit to some extent from parallelism.
The first aspect involves constructing the mesh
in parallel, while the second concerns parallelizing
solvers (see Chapter 20, Chapter 22 of this Volume).


6.1 Parallel meshing methods 


There are essentially two different ways to construct a 
mesh in parallel following an a posteriori or an a pri- 
ori approach. In the first approach, a given mesh of the 
domain is subdivided into a number of meshed regions 
such that the number of elements is well balanced from 
one processor to the other while minimizing the number 
of nodes at the processor (region) interfaces. Other crite- 
ria related to the problem in hand can be added to the 
above classical requirements. Various strategies have been 
proposed to achieve these goals. After this domain decom- 
position, first the interfaces between regions are meshed 
in parallel, then regions are considered for being meshed 
in parallel (Shostko and Löhner, 1995; Aliabadi and Tezduyar,
1995; Weatherill et al., 1998; Löhner, 2001). Such
an approach involves a serial-meshing method (while par- 
allel meshing methods also exist). The second approach 
no longer considers a domain mesh but, in its place, 
defines the subdivision using only the boundary dis- 
cretizations. In this way, defining an interface requires 
finding a surface that passes through a contour part of 
the domain boundary decomposition (Galtier, 1997). Then 
this surface is meshed, thus completing the definition 
of the regions, after which each region is meshed in
parallel.

The main difficulty of the first approach is the need 
for an initial mesh of the full domain, while, in the sec- 
ond approach, the key-point is the proper definition of 
surface interfaces. Conversely, the first approach allows 
for a simple definition of the interfaces (as part of the 
volume mesh faces) and the second approach avoids 
meshing the full domain (with a coarse mesh in gen- 
eral). 


6.2 Parallel solvers 


While various paradigms are used in the solvers, the main 
issue is the proper management of the interfaces from 
region to region. This requires communication between 
meshes. In this respect, minimizing the number of interface nodes
as well as ensuring some degree of smoothness at the
interface level is of great importance. Also of interest
is load balancing to avoid having idle processors
(i.e. waiting for others). Balancing the load necessitates,
a priori, evaluating the number of elements in each 
region, which is quite easy when using the first approach, 
whereas it has proved to be a tedious task in the second 
approach. 
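
The two partition criteria discussed in this section, balanced element counts and few interface nodes, can be measured on a given decomposition as in the sketch below. It assumes a triangular mesh stored as node-index triples and a list `region_of` assigning each element to a region; these names and the imbalance ratio (largest region divided by the average region size) are illustrative assumptions, not quantities defined in the chapter.

```python
from collections import Counter, defaultdict


def partition_quality(triangles, region_of):
    """Return (elements per region, interface nodes, load imbalance ratio)."""
    counts = Counter(region_of)

    # Record which regions touch each node.
    regions_at_node = defaultdict(set)
    for tri, region in zip(triangles, region_of):
        for node in tri:
            regions_at_node[node].add(region)

    # A node shared by elements of several regions lies on an interface.
    interface = sorted(n for n, regs in regions_at_node.items() if len(regs) > 1)

    # 1.0 means a perfectly balanced load; larger values mean idle processors.
    average = len(region_of) / len(counts)
    imbalance = max(counts.values()) / average
    return counts, interface, imbalance


triangles = [(0, 1, 2), (0, 2, 3), (1, 4, 2), (4, 5, 2)]
region_of = [0, 0, 1, 1]
counts, interface, imbalance = partition_quality(triangles, region_of)
```

With an a posteriori decomposition these counts are available directly from the existing mesh, which is why load balancing is easier to evaluate in that approach.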

Figure 3. Uniform and geometric meshes of a part of a Boeing 747 model.


7 MESHING FOR MOVING BOUNDARY 
PROBLEMS 


Moving boundary problems mainly occur in solid mechanical
engineering, specifically in stamping and forming processes.
To some extent, simulating the trajectory of a body
in some CFD problems or modeling the behavior of a
semiconductor device in etching processes leads to closely
related problems. In solid mechanical problems, the geometry
of the mechanical part is not known in advance during the process,
and it is subjected to large deformations. In such a case,
the remeshing step is a crucial part of the simulation. In
CFD problems, the geometry is known in advance, but rigid
movements apply, and, in this sense, such a problem can
be seen as a classical remeshing problem. Meshing for
moving boundary problems where large deformations are 
applied to the geometry of the domain can be addressed 
using two classes of methods. The first involves mesh opti- 
mization that considers a topologically correct (in terms 
of connectivity) mesh that is geometrically invalid (neg- 
ative volume or overlapping elements). The other class 
makes use of a full two-step remeshing procedure; remesh- 
ing the domain boundary prior to remeshing the domain 
itself. The following briefly discusses the latter method, 
while the reader is referred to Coupez (1991) for the first 
approach. 


7.1 Moving problems in two dimensions 


Let $\Omega$ be the mechanical part defined from its boundary $\Gamma$,
assumed to be made up of piecewise linear line segments
$\mathcal{T}(\Gamma)$. This boundary discretization can be obtained from a
CAD definition of $\Gamma$. However, this definition is no longer
useful when defining the geometry at a further step of the 
deformation process (the final deformation is assumed to 
be the summation of small step changes). The remeshing 
can be applied after each increment of deformation as 
follows: 


• Definition of the new geometry $G(\Gamma)$ after deformation

• Geometric error estimation (deviation of the current
discretization $\mathcal{T}(\Gamma)$ from the new geometry $G(\Gamma)$)
resulting in a size map $H_G(\Gamma)$ used to govern the
rediscretization of $\Gamma$

• Physical error estimation (deviation of the current
solution $S(\Omega)$ from an ideal solution assumed to be
smooth enough) that results in a size map $H_S(\Omega)$
serving to govern the remeshing of $\Omega$

• Definition of the full size map $H(\Omega)$ by merging
$H_G(\Gamma)$ and $H_S(\Omega)$

• Adaptive rediscretization of $\Gamma$ with respect to $H(\Omega)$

• Adaptive remeshing of $\Omega$ with respect to $H(\Omega)$.


Figure 4. Uniform mesh of a car seat (a) and its flattening
map (b); data courtesy of LECTRA. A color version of
this image is available at http://www.mrw.interscience.wiley.com/ecm.

Figure 5. Regular mesh of a DNA molecule. A color version
of this image is available at http://www.mrw.interscience.wiley.com/ecm.

Figure 7. Stamping of a sheet of metal, from the planar sheet to the stamped result.


The deformations of the geometry include free together with 
bounded deformations. The first type is a free deformation
due to a mechanical constraint (for instance, equilibrium
conditions). In this case, the new geometry of the part after 
deformation is only defined by the new positions of the 
boundary nodes together with their connections. The second 
type is a deformation limited by a contact with a second 
domain whose geometry is fixed (the tool is assumed to be 
rigid). In this case, the part takes the geometric shape of 
the tool, and thus its geometry after deformation is that of 
the tool. 

Geometric error estimation is based on the evaluation of 
the curvature of the boundary at each node (for a free node, 
this curvature is that of the current boundary, while for a 
bounded node, the curvature is that of the related part of 
the tool in contact). Physical error estimation is based on
the interpolation error and can be assessed by computing
the discrete Hessian of the current solution.

Merging two size maps involves constructing a unique 
map where the size is the minimum of the sizes in the two 
given maps. 
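
The merging rule above amounts to a pointwise minimum. A minimal sketch, assuming each size map is stored as a dictionary from node identifier to prescribed size (a storage choice made here only for illustration), is:

```python
import math


def merge_size_maps(*size_maps):
    """Merge size maps by keeping, at each node, the smallest prescribed size.

    A node missing from one map is treated as unconstrained by that map.
    """
    nodes = set().union(*size_maps)
    return {
        node: min(size_map.get(node, math.inf) for size_map in size_maps)
        for node in nodes
    }


# Hypothetical geometric and physical size maps on three nodes.
h_geometric = {0: 0.5, 1: 0.2, 2: 1.0}
h_physical = {0: 0.3, 1: 0.4, 2: 1.0}
h_full = merge_size_maps(h_geometric, h_physical)
```

Taking the minimum guarantees that the merged map violates neither the geometric nor the physical size constraint.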

With this material, the remeshing procedure follows the
same lines as a classical adaptive remeshing scheme
(see above).


7.2 Moving problems in three dimensions


In this case, the boundary is made up of linear triangular 
elements and the remeshing scheme proposed in the previ- 
ous section applies. However, some points are much more 
tedious, including 


• identifying the bounded nodes,

• computing the discrete Hessian part of the physical
error estimate,

• the full three-dimensional nature of the adaptive
remeshing process.


For the sake of simplicity, the bounded nodes can be dealt 
with like the free nodes. 


8 APPLICATION EXAMPLES 


A number of application examples are given to demonstrate
the approaches we have proposed. Figure 3 displays
a uniform mesh and a geometric mesh constructed on a
Boeing 747 model defined by a uniform fine grid made up of
quads. After using this grid to define a geometry (by means
of Coons patches), we meet a parametric surface-meshing
problem. Figure 4 demonstrates the construction of a regular
mesh for a series of CAD patches. Figure 5 shows
the mesh of a DNA molecule (by means of a Connolly
surface) where the geometry is defined by the constituent
atoms (i.e. a series of intersecting spheres). Figures 6 and 7
give examples of forming and stamping problems (i.e.
moving boundary problems). Figure 8 illustrates two stages
of a mesh for a transonic calculation in two dimensions,
while Figure 9 considers an example in three dimensions.
Figure 10 shows an example in biomedical engineering.
Then, Figure 11 gives an example of mesh simplification,
which is a useful method for various engineering problems
as well as for image compression or compact data storage.

Figure 8. Transonic flow around a NACA0012 wing, (initial and adapted) meshes and isodensities. A color version of this image is
available at http://www.mrw.interscience.wiley.com/ecm

Figure 9. Transonic flow around a wing in three dimensions, cut through the tet mesh and isodensities (initial and adapted meshes). A
color version of this image is available at http://www.mrw.interscience.wiley.com/ecm


9 CONCLUSIONS 


In this chapter, we have discussed mesh-generation methods 
and mesh-adaptivity issues for automated planar, surface, 
and volume meshing. After a review of the classical mesh- 
generation methods, we have considered adaptive schemes 
where the solution is to be accurately captured. To this 
end, meshing techniques have been revisited to be capable 
of completing high-quality meshes conforming to these 
features. Error estimates have therefore been introduced to
analyze the solution field at a given stage prior to being used
to complete adapted meshes. Some details about large-size 
meshes and moving boundary problems have been given. 
Application examples have been shown to demonstrate the 
various approaches proposed throughout the chapter. 

While mature in a number of engineering fields, cur- 
rent meshing technologies still need to be investigated to 
handle nonsimplicial elements (quads, hexes (still a chal- 
lenge to date), ...) as well as nonlinear elements (quadratic 
or higher degrees). Surprisingly, surface meshing has not 
been particularly well addressed so far. Robust imple- 
mentation for anisotropic meshing in three dimensions is 
still a field of intensive work. Meshing problems that 
include colliding regions are certainly a pertinent sub- 
ject of future investigations. Meshes with billions of ele- 
ments also lead to interesting topics on massively parallel 
strategies. 

Figure 10. Initial dense meshes of a brain, a cranial bone and a scalp and corresponding simplified meshes (Hausdorff distance). A
color version of this image is available at http://www.mrw.interscience.wiley.com/ecm

Figure 11. Simplified meshes of Lucy statue and corresponding enlargements. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


1 INTRODUCTION


This section defines visualization and introduces a taxon- 
omy of different types of visualization. Uses of visualiza- 
tion in the computational sciences are also described. 


1.1 What is visualization? 


In its broadest definition, visualization is the transformation 
of data or information into sensory input perceivable by the 
human observer (Schroeder, Martin and Lorensen, 2003). 
The purpose of visualization is to engage the human perceptual
system in such a way as to transmit pertinent information
to the analyst as efficiently as possible. Unlike automated
processes such as artificial intelligence that attempt
to produce results independent of the human observer, the
visualization process presumes the observer’s existence and 
engages him/her directly in the exploration of the problem. 
For this reason, the most effective visualization techniques 
are interactive, that is, they respond to commands from the 
observer in real time with minimal latency and data rates 
of approximately 5 Hz or greater. Batch or off-line produc- 
tion of visualization results is also important, but these are 
typically preprogrammed once the analyst knows where to 
look and what to look for. 

By definition, visualization techniques encompass sen- 
sory input of vision, sound, touch (haptics), taste, and smell. 
Visual representations remain far and away the most widely 
used techniques, while sound- and haptic-based methods 
have been used in some applications with limited success. 
Visualization techniques based on taste and olfactory input 
remain experimental. 


1.2 Terminology 


Different terminologies are used to describe visualization. 
Scientific visualization is the formal name given to the field 
in computer science that encompasses user interface, data 
representation and processing algorithms, visual representa- 
tions, and other sensory presentation such as sound or touch 
(McCormick, DeFanti and Brown, 1987). Scientific visual- 
ization is generally used in the context of spatial-temporal 
domains (such as those found in computational mechan- 
ics). The term data visualization is another phrase used to 
describe visualization. Data visualization is generally inter- 
preted to be more general than scientific visualization, since 
it implies treatment of data sources beyond the sciences 
and engineering. Such data sources include financial, 
marketing, or business data. In addition, the term ‘data
visualization’ is broad enough to include application of statistical
methods and other standard data analysis techniques
(Rosenblum et al., 1994). Another recently emerging term
is information visualization. This field endeavors to visu- 
alize abstract information such as hypertext documents on 
the World Wide Web, directory/file structures on a com- 
puter, or abstract data structures (The First Information 
Visualization Symposium, 1995). A major challenge facing 
information visualization researchers is to develop coor- 
dinate systems, transformation methods, or structures that 
meaningfully organize and represent data. 

Another way to classify visualization technology is to 
examine the context in which the data exists. If the data 
is spatial-temporal in nature (up to three spatial coor- 
dinates and the time dimension), then typically methods 
from scientific visualization are used. If the data exists in 
higher-dimensional spaces, or abstract spaces, then methods
from information visualization are used. This distinction
is important, because the human perceptual system is
highly tuned to space-time relationships. Data expressed in
this coordinate system is inherently understood with little 
need for explanation. Visualization of abstract data typi- 
cally requires extensive explanations as to what is being 
viewed and what the display paradigm is. This is not to 
say that there is no overlap between scientific and informa- 
tion visualization — often the first step in the information 
visualization process is to project abstract data into the 
spatial-temporal domain, and then use the methods of 
scientific visualization to view the results. The projection 
process can be quite complex, involving methods of statis- 
tical graphics, data mining, and other techniques, or it may 
be as simple as selecting a lower-dimensional subset of the 
original data. 

The term ‘visualization’ may be used to mean different 
operations depending on the particular context. Geometry 
visualization is the viewing of geometric models, meshes 
or grids, or other information representing the topology 
and geometry of the computational domain. Geometry 
visualization often includes modeling operations such as 
cutting, clipping, or extracting portions of the domain, 
and may include supplemental operations such as collision 
detection and measurement. Results visualization typically 
combines the viewing of geometry in conjunction with 
attribute data, for example, coloring the surface of an air- 
craft wing with pressure values. Attribute data is typically 
classified as scalar (single-valued), vector (n-component 
vector), or tensor (general matrix of values). Techniques 
in scientific visualization also include a modeling com- 
ponent, for example, producing stream surfaces from a 
three-dimensional vector field. In this example, the vector 


data is used to control the generation of a surface follow- 
ing the rules of fluid flow (a surface tangent to the flow at 
all points). 


1.3 The role of visualization in computational 
methods 


Visualization techniques can be used in all phases of the 
computational process: preprocessing (defining input geom- 
etry, boundary conditions, and loads); solution (numer- 
ical solution); and postprocessing (viewing, interacting, 
and analyzing results). The following sections provide an 
overview of the tasks involved in each phase. 


1.3.1 Preprocessing 


Modern computational systems rely heavily on visualization 
and graphics techniques to assist the analyst to define the 
input geometry, specify boundary conditions, apply loads to 
the model and/or generate the computational grid. Geome- 
try definition requires modeling the domain, including using 
abstract representations (e.g. 2-manifolds embedded in 3- 
space to represent shell-like structures). This is typically 
an interactive activity requiring graphical input to define, 
shape, and edit the geometric representation. Loading and 
boundary conditions are often applied using additional geo- 
metric primitives to indicate the region, direction, and 
magnitude of application. Mesh generation is often an auto- 
matic process once the geometry, loading, and boundary 
conditions have been defined. However, interactive input 
may be required to indicate regions of higher mesh den- 
sity, or to perform validation of the mesh subsequent to 
generation. In some cases, manual mesh generation is still 
performed. This may require interactive methods to decom- 
pose the domain into topologically regular blocks, specify 
gradation density in each block, and control mesh layout. 


1.3.2 Postprocessing 


Visualization is best known for its use in postprocessing 
the results of analysis. Geometry visualization is used to 
inspect the computational grid or underlying geometric 
representation if the problem produces evolving geometry 
(e.g. shape optimization). Results visualization is used to 
view scalar, vector, and tensor fields, often in the context 
of the geometry (spatial domain) of the problem. Other 
important techniques include probing data to produce 1-D
or 2-D plots; clipping, cutting, and extracting data to
focus on particular regions of the domain; animating results
over time or design iterations to view the evolution of the
solution; and comparing results across different analyses.
Many of these techniques will be described in more detail
later in this chapter.

Figure 1. Visualization of computational analysis. Here, flow
over a rotating blade is shown. (Courtesy SCOREC/Rensselaer.)
A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


1.3.3 Analytical steering 


Visualization techniques are employed during the solution 
phase to monitor solution convergence or view intermediate 
results. Some systems provide methods to analytically steer 
the solution process. For example, while monitoring the 
numerical solution process, it is possible to adjust time 
steps or the computational grid to accelerate convergence 
or improve accuracy. In practice, analytical steering may 
be implemented by producing intermediate solution files, 
or directly linking the postprocessing module to the solver. 
When generating intermediate files, standard postprocessing 
tools and techniques are used to visualize the results at 
which point the analyst may adjust input parameters or 
restart the solution with modified input. Relatively few
systems provide run-time linking directly to the
solver. While this is arguably the better approach, it requires
rewriting software to integrate the visualization system into 
the numerical solver. 


1.4 3-D computer graphics 


Underlying most visualization techniques are methods
based on computer graphics. Because of recent efforts to
accelerate 3-D rendering on relatively inexpensive graphics
hardware (motivated principally by gaming, video, and
other forms of entertainment), graphics display methods
tend to follow particular patterns based on the capabilities
of the hardware (i.e. accelerated polygon rendering with
textures). This in turn drives the implementation of
visualization algorithms. For example, most algorithms
currently produce linear primitives such as points, lines,
and polygons. Representing geometry with these graphics
primitives will generally produce responsive, interactive
visualizations. In addition, modern graphics hardware
supports features such as texture mapping. Texture mapping
can be used to great benefit for coloring surfaces based on
data value, ‘cutting’ data using textures with transparency,
and performing efficient volume rendering with 3-D
textures. As a result, many areas of visualization research
languish (e.g. methods based on higher-order, nonlinear
primitives) in preference to techniques that produce higher
rates of interaction such as polygon and texture-based
rendering techniques.

In addition, because methods in computer graphics are 
generally used, the analyst must be aware of the inherent 
limitations and terminology used in the rendering pro- 
cess, lighting models, and camera models. In the next few 
sections, we present some basic information. For more 
information, please see books on computer graphics such 
as Foley et al. (1990) and Watt (1993). 


1.4.1 Rendering 


In 3-D computer graphics, a single image is produced 
by rendering a scene (Figure 1). The scene consists of 
geometry with associated attribute data (color or color 
index, surface normals, texture coordinates, etc.); a camera 
to project the geometry onto the view plane (two cameras 
are used for stereo rendering); and lights to illuminate the 
geometry. In addition, transformation matrices (normally 
4x4 homogeneous matrices) are used to position the 
geometry, lights, and camera. Various properties are used 
to control the effect of lighting and the appearance of the 
geometry (described in the following section). 

While a wide variety of rendering techniques are known 
in computer graphics, in visualization two basic approaches 
are used: surface-based and volume rendering. Surface ren- 
dering projects linear geometric primitives such as triangles 
or polygons onto the view plane. Primitives are rasterized 
into pixel values of depth and color using a scan conver- 
sion process (linear interpolation from the polygon edges 
into the interior). Surface properties controlling the color 
of the surface and the lighting model affect the appearance 
of the geometry. Texture mapping is often used to project 
an image onto the surface geometry, with the placement 
of the texture map controlled by texture coordinates. Surface
transparency can be specified to control the visibility
of the interior of objects and its relationship to other surfaces.
Volume rendering (covered in more detail later in
this chapter) takes into account the interior of objects to
see into or through them, producing X-ray-like images
showing interior structure and data variation.

A major concern of the rendering process is to properly 
sort primitives in the direction of the view vector. The result 
is an image with correct depth occlusion and/or properly 
blended color if the object is translucent. Typically, this is 
performed in hardware using the z-buffer and color buffer 
to retain the depth and color of the pixel closest (in depth, or 
z-direction) to the view position. However, proper blending 
of translucent objects, or use of volume rendering, requires 
an ordering of the primitives. 
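
The z-buffer test described above can be illustrated in software with a few lines. This is a simplified sketch rather than how graphics hardware is actually programmed: fragments are assumed to be already rasterized into hypothetical `(x, y, depth, color)` tuples, and smaller depth is assumed to mean closer to the view position.

```python
def render_with_zbuffer(fragments, width, height, background="black"):
    """Keep, for every pixel, the color of the fragment closest to the viewer."""
    depth_buffer = [[float("inf")] * width for _ in range(height)]
    color_buffer = [[background] * width for _ in range(height)]
    for x, y, depth, color in fragments:
        # Depth test: overwrite only if this fragment is nearer than the stored one.
        if depth < depth_buffer[y][x]:
            depth_buffer[y][x] = depth
            color_buffer[y][x] = color
    return color_buffer


fragments = [
    (0, 0, 5.0, "red"),    # far fragment on pixel (0, 0)
    (0, 0, 2.0, "blue"),   # nearer fragment on the same pixel
    (1, 0, 3.0, "green"),
]
image = render_with_zbuffer(fragments, width=2, height=1)
```

For opaque fragments the result does not depend on the order in which primitives are submitted. Blending translucent colors is order dependent, which is why such objects still require sorted primitives.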


1.4.2 Lighting model 


The lighting model generally considers three types of light- 
ing effects: ambient, diffuse, and specular lighting. Ambient 
lighting is an overall illumination independent of surface 
orientation and relationship to a particular light. Diffuse 
lighting, or Lambertian reflection, takes into account the
orientation of the surface relative to the incident light vector.
Specular reflection represents the direct reflection of
light from an object’s surface toward the observer. Specular
reflection takes into account the incident light vector, the
view vector, and the surface normal, whereas diffuse lighting
takes into account only the relation of the light vector to
the surface normal.
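
As a rough illustration of how the three lighting terms combine, the following sketch evaluates a simple Phong-style reflection model at a single surface point. The coefficient names (`ka`, `kd`, `ks`, `shininess`) and the particular specular formula are assumptions made for this example; the discussion above does not prescribe a specific formula.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))

def shade(normal, to_light, to_viewer, ka=0.1, kd=0.7, ks=0.2, shininess=20.0):
    """Intensity from ambient, diffuse, and specular terms (hypothetical coefficients)."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    n_dot_l = dot(n, l)
    ambient = ka                              # independent of surface orientation
    diffuse = kd * max(n_dot_l, 0.0)          # Lambertian: light vector vs. normal
    # Mirror direction of the light about the normal: r = 2 (n.l) n - l
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    # Specular term uses light vector, normal, and view vector together
    specular = ks * max(dot(r, v), 0.0) ** shininess if n_dot_l > 0.0 else 0.0
    return ambient + diffuse + specular
```

With the light and viewer both along the normal, all three terms contribute fully; with the light behind the surface, only the ambient term remains.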

Surface normals play an important role in the lighting 
model. Flat shading, where the normal across a primitive 
(i.e. polygon) is taken as a constant, results in a faceted 
appearance. Gouraud shading computes color at each vertex 
and then uses linear interpolation during scan conversion to 
color the interior of the polygon. Phong shading interpolates 
the normal from the vertices of the polygon to produce 
a smoother light variation. Figure 2 shows the effect of
flat shading (constant normal per polygonal cell) versus
Gouraud shading, which interpolates the vertex colors across
each polygon.


1.4.3 Camera model 


While a complete treatment of the projection of geometry 
from 3-D into 2-D is beyond the scope of this chapter, there 
are a few key concepts of which the visualization prac- 
titioner must be aware. Projection methods are typically 
of two types: perspective and orthographic projection. Per- 
spective projection takes into account the view angle of the 
observer to produce perspective effects (e.g. closer objects 
appear bigger, parallel lines converge at infinity). Orthographic
projection does not include perspective effects, so
a rendered object remains the same size no matter how
close or far from the camera it is (a view scale is used to
zoom in and out).

Figure 2. Gouraud (b) versus flat shading (a). (Courtesy of Kitware,
Inc. Taken from the book The Visualization Toolkit: An
Object-Oriented Approach to 3-D Graphics, Third Edition, ISBN
1-930934-07-6.) A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm

In computer graphics, the camera model defines a local 
coordinate system consisting of the camera position, its 
focal point (defining the direction of projection), the view 
plane normal (which may be different than the direction of 
projection resulting in projection shearing effects), the view 
up vector, and near and far clipping planes. The clipping 
planes are oriented perpendicular to the view direction. All 
objects closer than the near plane or beyond the far plane are
culled prior to rendering, which can be used to great benefit
to focus on a narrow slice of data.

Stereo viewing is achieved using two cameras — one for 
the left and one for the right eye. The angle of separation 
between the eyes controls the binocular effect. However, 
too large an angle results in eyestrain and difficulty in fusing 
the two images. 


2 DATA FORMS 


A variety of data forms are used in the computational 
sciences. Visualizing these data requires matching the com- 
putational data to the data forms found in visualization. 


2.1 Overview 


There are two distinct data forms that compose a visual- 
ization dataset. The first is the spatial representation, or 
geometric and topological structure of the data. In the
computational sciences, the structure is typically represented by
the computational grid or mesh. The second is the data
attributes (sometimes referred to as data fields) associated
with the structure. The dataset structure is composed
of cells and points (sometimes referred to as the elements
and nodes). Cells define a topological type and define the
relationship of the cell’s points. Points define a geometric
coordinate and fix the cells in space. Most visualization
algorithms assume that the field data is associated with 
either the points and/or cells of the dataset. Very few visu- 
alization techniques treat data associated with intermediate 
topological levels such as the edges or faces of a 3-D cell 
(e.g. tetrahedron). Typically, the data structures used to rep- 
resent the topology and geometry of a visualization dataset 
are more compact and of limited functionality as compared 
to the computational system. This is primarily due to the 
emphasis that visualization places on interactive display and 
successful management of large data. 


2.2 Spatial representations 


The dataset structure is classified according to whether the 
structure is regular or irregular, and similarly, whether 
it is implicitly or explicitly represented. These classifications
apply to both the geometry and topology of the
dataset. A dataset with regular geometry has point coordinates
$x_i$ that are specified by a function $x_i = f(i)$. A
dataset with regular topology is one where the number and
type of cells in the dataset, and the neighbors to each cell
are known implicitly from a specification of the dataset.
Practically speaking, the difference between regular and 
irregular datasets is that regular topology and geometry 
can be implicitly defined whereas irregular data requires an 
explicit representation of geometry and/or topology. These 
considerations are particularly important in implementation 
because implicit representations require far fewer compu- 
tational resources (both in memory and CPU) than explicit 
representations. On the other hand, irregular data tends to 
have greater flexibility in its ability to adaptively represent 
the domain, and in some cases, may require fewer computational
resources than regular data of equivalent resolution.


2.2.1 Regular data 


The most common type of regular data (regular both in 
topology and geometry) is images (2-D) and volumes (3- 
D) — regular lattices of points arranged parallel to the 
coordinate axes. Image processing and volume rendering 
are just two of many disciplines devoted to the processing 
and display of this type of data. The principal advantage 
to this form is its simple representation (origin, inter- 
pixel/voxel spacing, and lattice dimensions). This manifests 
itself as compact algorithms that are easily parallelized as 
compared to irregular data forms. 

Structured grids are another form of regular data. Such 
grids are regular in topology, but irregular in geometry.
That is, the dataset is described by sample dimensions in
each coordinate axis (e.g. a 10 × 20 × 30 grid in 3-D) along
with a vector of point coordinates explicitly specifying the 
position of each point in the grid. Such grids are often 
collected together to form multiblock datasets, where each 
block of data corresponds to a different grid. 

Rectilinear grids are regular in topology, and like image/ 
volume data, axis-aligned. However, the point coordinates 
are semiregular. A single vector of coordinate values per 
coordinate axis is required. 
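
To make the storage difference between these regular forms concrete, here is a minimal sketch of how point coordinates might be recovered in each case. The function names and the point ordering (index `i` varying fastest) are assumptions for illustration, not a description of any particular system.

```python
def image_point(origin, spacing, i, j, k):
    """Point coordinates of lattice index (i, j, k) in image/volume data.

    Nothing is stored per point: the position follows from origin and spacing.
    """
    return tuple(o + s * idx for o, s, idx in zip(origin, spacing, (i, j, k)))

def rectilinear_point(xs, ys, zs, i, j, k):
    """Point coordinates in a rectilinear grid: one coordinate vector per axis."""
    return (xs[i], ys[j], zs[k])

def point_id(dims, i, j, k):
    """Flat point index for regular topology with dims = (nx, ny, nz), i fastest.

    A structured grid would use this index to look up an explicitly stored point.
    """
    nx, ny, _ = dims
    return i + nx * (j + ny * k)
```

Image data needs only three small tuples regardless of size, a rectilinear grid needs nx + ny + nz coordinate values, and a structured grid needs one stored point for each of the nx · ny · nz indices.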


2.2.2 Irregular data 


Unstructured grids represent the most general form of irregular
data. Both the points and cells must be represented
explicitly. Tetrahedral meshes, or meshes consisting of 
mixed element types are examples of unstructured grids. 
Points are typically represented via a vector of coordinate 
values; cells are represented by a type specification plus a 
connectivity array. Because visualization systems generally 
represent data as compactly as possible, the intermediate 
topological and geometric hierarchy is often not repre- 
sented explicitly; rather it is derived from the point/cell 
relationships. In some cases, the intermediate hierarchy is 
represented on an as-needed basis. 
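
The compact point/cell representation described above can be sketched as follows. The array names, the offsets array, and the local edge numbering of the tetrahedron are all assumptions chosen for this example; the point is only that intermediate topology (here, edges) can be derived on demand rather than stored.

```python
# Local edge numbering for a tetrahedron (an assumed convention for this sketch).
TETRA_EDGES = [(0, 1), (1, 2), (2, 0), (0, 3), (1, 3), (2, 3)]

# Explicit geometry: one coordinate triple per point.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]

# Explicit topology: a type per cell plus a connectivity array.
cell_types = ["tetra", "tetra"]
connectivity = [0, 1, 2, 3, 1, 2, 3, 4]   # point ids of all cells, concatenated
offsets = [0, 4, 8]                       # cell c uses connectivity[offsets[c]:offsets[c + 1]]

def cell_points(c):
    """Point ids that define cell c."""
    return connectivity[offsets[c]:offsets[c + 1]]

def unique_edges():
    """Derive the edge set on demand from the point/cell relationship."""
    edges = set()
    for c, kind in enumerate(cell_types):
        ids = cell_points(c)
        if kind == "tetra":
            for a, b in TETRA_EDGES:
                edges.add((min(ids[a], ids[b]), max(ids[a], ids[b])))
    return sorted(edges)
```

The two tetrahedra share a face, so the three edges of that face appear only once in the derived edge set.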

A subset of the general unstructured grid is often referred 
to as graphics data, or polygonal datasets. These datasets 
are composed of linear graphics primitives such as vertices, 
lines, polygons, and triangle strips. They are an important 
form because graphics systems are typically optimized for 
display of such primitives. Also, many viewing operations 
of even 3-D data require only the surface of the dataset 
since interior cells are hidden. Hence, polygonal data often 
serves as an intermediate form between the analysts’ data 
and the graphical subsystem. 

Another common form of irregular data is unorganized 
point sets. For example, laser digitizers can scan millions 
of points from the surface of an object in a few seconds to 
produce such data. It is also common to subsample a general 
n-dimensional dataset into a set of three-dimensional points, 
and then use typical visualization techniques to explore 
the subset. 


2.2.3 Other representations 


Computational scientists frequently employ adaptive
meshes such as quadtrees, octrees, and AMR grids
to represent the problem domain. Often, these are
transformed in the visualization system to similar
representations tuned for minimal memory requirements or
interactive processing. For example, the branch-on-need
octree (BONO) (Wilhelms and Van Gelder, 1992) and
interval tree (Livnat, Shen and Johnson, 1996) represent
single-valued scalar data to accelerate the generation of
isocontours. Other forms, such as AMR grids (Berger and
Oliger, 1984), may be triangulated (on the fly or as a
preprocessing step) to create unstructured grids.

Higher-order meshes such as p-basis finite elements are 
underrepresented in visualization systems. Many systems 
support quadratic — and some even cubic — isoparamet- 
ric basis functions on simplices and rectangular elements. 
However, most visualization systems require pretriangulation
of higher-order bases, or may automatically triangulate
higher-order cells into linear primitives that can be
processed by conventional visualization algorithms. It is
important that the analysts understand whether such data 
conversions take place, and if so, what the implications are 
on the accuracy, memory requirements, and computational 
resources of the visualization process. Ideally, such con- 
siderations should take place initially as part of the overall 
strategy for solving a particular problem. (See Section 1.7 
for more information about interfacing meshes to the visu- 
alization system.) 


2.3 Dataset attributes 


Dataset attributes are associated with the structure of the 
dataset, typically with the points and cells. Visualiza- 
tion researchers typically classify visualization techniques 
according to the type of attribute data they operate on, 
as well as the underlying structure of the dataset. The 
general categories of algorithm research are scalar fields, 
vector fields, and tensor fields. A whole host of additional 
algorithms use combinations of these techniques, or use 
methods from computer graphics and computational geome- 
try to create particular types of visualizations. For example, 
glyphing is a data-driven modeling operation as described 
in the following section. 


2.3.1 Scalar fields 


A set of single-valued data values is referred to as a scalar 
field, or simply scalars. Temperature, pressure, density, 
a component of displacement or stress, factor of safety, 
and so on are all examples of scalar values. There is a 
close correspondence between scalars and colors, with colors
being a vector tuple of grayscale, RGB (red-green-blue), or
RGBA (red-green-blue-alpha, with alpha a measure of transparency).
The relationship between scalars and colors is via
a lookup table (scalar value indexes into a table of colors)
or a function known as a transfer function $c_i = f(s_i)$ that
maps a scalar value $s_i$ into a unique color $c_i$. This
relationship forms the basis of many visualization techniques
including the color mapping and volume rendering methods
described shortly.


2.3.2 Vector fields 


An n-tuple of values representing a direction and magnitude
defines a vector, where n is typically 2 or 3 in two- and
three-dimensional space, respectively. Velocity, momentum,
displacement, direction, and gradients form typical vector fields.


2.3.3 Tensor fields 


Tensors are complex mathematical generalizations of vec- 
tors and matrices. A tensor of rank k can be considered a 
k-dimensional table. A tensor of rank 0 is a scalar, rank 1 
is a vector, and rank 3 is a three-dimensional rectangular 
array. Existing visualization algorithms focus on rank 2 tensors,
that is, 3 × 3 matrices such as strain and stress tensors.


2.3.4 Graphics attributes 


Because visualization is inherently tied to computer graph- 
ics, attributes related to the rendering of data are also 
important. This includes surface normals and texture coor- 
dinates. Surface normals (a normalized direction vector) are 
used in the shading process to show the effects of lighting as 
we saw previously. Texture mapping is used to apply detail 
to the surface of objects (the texture is generated from the 
data) or to model objects (transparent or translucent textures 
can be used to cut away or window into data). 


2.3.5 Time 


Time is generally used to organize the previously described 
attributes into separate time steps. Visualizations are performed
for each step, and a sequence of steps is arranged
to form animations. Some algorithms, such as streakline
generation and time domain-based image processing algorithms,
may use time to produce specialized visualizations.


2.3.6 General fields 


A general collection of data may be referred to as a 
field. For example, the collection of stresses, strains, and 
displacements at nodal points form a field of results from a 
materials analysis. Other information, such as run identifier, 
optimization values, and material properties can be thrown 
into the mix. The point is that many visualization algorithms
can represent general fields. However, in order to visualize
the data, the field is winnowed down into scalars, vectors,
tensors, or one of the other attributes described previously
and then visualized. In some cases, fields may be combined
to form irregular point sets and visualized using techniques
appropriate to that class of datasets.


2.4 Cells and data interpolation 


As previously indicated, the current abstraction for visu- 
alization systems assumes that the dataset is composed of 
cells. These cells are typically linear (lines, triangles, poly- 
gons, tetrahedra) or products of linear functions (quadri- 
lateral, hexahedral). The primary purpose of the cell is to 
provide an interpolating function within the spatial extent 
that the cell encompasses. That is, the cell provides a con- 
tinuous field of values from the discrete set of points at 
which the data field is computed or sampled. Typically, the 
interpolating functions are the standard isoparametric func- 
tions found in finite element analysis. The cell also serves to 
partition the domain into topologically regular subdomains 
over which geometric computation and searching can be 
accelerated. 

Current visualization systems provide limited support for 
cell types. Computational scientists using advanced higher- 
order p-type finite element formulations, or those using 
adaptive meshes with complex types (e.g. octants arbitrarily 
subdivided in an octree) will certainly encounter these limitations.
The standard recourse in such cases is to subdivide
the cells into the appropriate linear primitives supported by 
the visualization system. Of course, this subdivision must 
be performed carefully to avoid losing information. Note, 
however, that excessive subdivision may result in excessive 
numbers of primitives resulting in slow interaction rates. 
(Section 1.7 contains more information about interfacing 
meshes to the visualization system.) 

The cell data abstraction is an outcome of several factors.
First, graphics systems are optimized for linear interpolation
of data values across simple primitives such as polygons;
ultimately, all data must be transformed into this form if
interactive visualization is to be used. Second, data tends
to be discrete (that is, data values are known at fixed
positions) while rendering techniques are designed for
continuous surfaces. It is particularly important to understand
the effects of interpolation, and to be aware of the
differences in interpolation function found between the 
computational system and the visualization system. 


3 VISUALIZATION ALGORITHMS 


Visualization algorithms are classified according to the 
attribute and dataset type they treat. The dataset type gen- 
erally impacts the speed and complexity of the algorithms, 
although some algorithms are structure-specific. For example,
three-dimensional regular lattice data (e.g. volumes)
can be readily subsampled into planes and lines of data.
Such techniques do not exist for irregular data.


3.1 Scalar fields 


The most used visualization algorithms are those for 
scalar fields. 


3.1.1 Color mapping 


Color mapping transforms scalar values into colors, and 
applies the colors to a surface geometry that is then rendered 
(Figure 3). The colors may be shades of gray (grayscale), 
variation across a particular color hue, an arbitrary color 
function or any of these combined with alpha transparency 
values. The geometry may range from spatial extractions 
of cells and points to surfaces created by extracting the 
boundary of the domain or a particular subset of cells. 
The surface may be arbitrary such as a probe surface or 
cut plane. 

Color mapping is often implemented using a lookup
table. Given a particular scalar data range $(s_{\min}, s_{\max})$ and
a scalar value $s_i$, a linear lookup table computes an index
into the table using the function

$$\text{index} = \frac{s_i - s_{\min}}{s_{\max} - s_{\min}}$$

where $s_i$ is assumed to lie between $(s_{\min}, s_{\max})$, and if not, is
clamped to these values. The table color values are defined
by the user. In many systems, predefined tables or simple
functions are used to define the color table.

Figure 3. Color mapping can produce dramatically different
results depending on the choice of lookup table (or transfer
function). Visualization must necessarily consider the human
perceptual system. As a result, the computational scientist must
carefully consider the effect of perceptual factors on the target
audience. (Courtesy of Kitware, Inc. Taken from the book The
Visualization Toolkit: An Object-Oriented Approach to 3-D Graphics,
Third Edition, ISBN 1-930934-07-6.) A color version of this
image is available at http://www.mrw.interscience.wiley.com/ecm

More complex mappings are also available. Transfer 
functions are continuous versions of lookup tables that map 
a scalar value into a color. Separate transfer functions may 
be available for each component of an RGB color and alpha 
transparency. Volume rendering (see Section 1.4) typically 
uses transfer functions in implementation. 
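
The linear lookup with clamping described above might be sketched as follows. Scaling the normalized scalar by the number of table entries is an assumption of this sketch, as are the function name and the small example table.

```python
def lookup_color(s, s_min, s_max, table):
    """Map scalar s to a color via a linear lookup table, clamping out-of-range values."""
    s = min(max(s, s_min), s_max)               # clamp to the scalar range
    t = (s - s_min) / (s_max - s_min)           # normalized position in [0, 1]
    # Assumed convention: scale by the table size, keeping t == 1 in the last entry.
    index = min(int(t * len(table)), len(table) - 1)
    return table[index]

# A tiny hypothetical blue-to-red table of RGB tuples.
TABLE = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
```

A transfer function would replace the table lookup with a continuous function of `t`, possibly one per color component.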


3.1.2 Isocontours 


One of the most common visualization techniques generates 
isocontours from scalar fields. An isocontour is the line 
(2-D) or surface (3-D) defined by all points taking on a 
particular scalar value (the contour value C) 


$$F(x, y, z) = C \qquad (1)$$


While there are a multitude of methods for computing iso- 
contours, in general, they are based on edge interpolation. 
An edge intersects an isosurface when the data values asso- 
ciated with its two end points are simultaneously above and 
below a specified contour value. Linear interpolation is then 
used to produce the isosurface intersection point. Various 
methods are then used to connect the intersection points 
into edges, faces, and ultimately, $(n-1)$-dimensional cells
(assuming that the dataset is n-dimensional). 
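
The edge interpolation step can be illustrated with a short sketch. The function name is hypothetical, and treating a value equal to the contour value as "above" is an assumed tie-breaking rule.

```python
def edge_intersection(p0, p1, s0, s1, c):
    """Return the point where contour value c crosses edge p0-p1, or None.

    s0 and s1 are the scalar values at the end points p0 and p1.
    A value counts as 'above' when it is >= c (assumed tie-breaking rule).
    """
    if (s0 >= c) == (s1 >= c):
        return None                      # both ends on the same side: no crossing
    t = (c - s0) / (s1 - s0)             # linear interpolation parameter in [0, 1]
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))
```

Because the two end values lie on opposite sides of `c`, the denominator `s1 - s0` is never zero when the interpolation is evaluated.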

A widely used isocontouring technique is the so-called 
marching cubes algorithm (Lorensen and Cline, 1987). 
Marching cubes is simple to implement, fast, and lends 
itself to parallel implementation. Furthermore, while the 
original algorithm was described on voxels (the cells of a 
regular volume dataset), it is easily extended to other cells 
of varying topology, for example, triangles, quadrilaterals, 
and tetrahedra. The essence of the algorithm is that given a 
particular isocontour value C, the data values on the points 
forming the cell produce a finite number of combinations 
with respect to C. That is, each point may be classified in 
one of two states: (1) its data value may be greater than or 
equal to C, or (2) its data value may be less than C. For a
hexahedron, 256 ($2^8$) finite combinations exist. As a result,
it is possible to create a case table that explicitly represents 
the topology of the resulting triangulation of the cell; that is, 
the number of points, edges, and triangles that approximate 
the isosurface. The algorithm proceeds by determining
the case of each cell, indexing into the case table, and
producing the prescribed intersection points and triangles.
Figure 4 shows a reduced case table that combines the
various cases (related by symmetry or rotation) into 15
distinct configurations. Note, however, that in practice the
full 256-element table is used to ensure that the resulting
isosurface is manifold (i.e. contains no holes). The case
table includes arbitrary decisions regarding the triangulation
of hexahedral faces with ambiguous configurations (see
Figure 5). The full case table ensures that the decisions
are made consistently. Methods are available to better
estimate what cases to use (Nielson and Hamann, 1991);
but ultimately the data is sampled at discrete points and the
ambiguity cannot be rigorously resolved.
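
The classification step that selects an entry in the case table can be sketched as a bit mask over the eight cell points. The point ordering within the cell is an assumption here; a real case table must be built for the same ordering.

```python
def case_index(values, c):
    """Build the marching cubes case index for one hexahedral cell.

    values: the 8 scalar values at the cell's points, in a fixed (assumed) order.
    Bit i is set when the value at point i is greater than or equal to c.
    """
    index = 0
    for i, v in enumerate(values):
        if v >= c:
            index |= 1 << i
    return index   # in 0..255; used to look up the triangulation in the case table
```

Indices 0 and 255 correspond to cells entirely below or entirely at or above the contour value, which generate no triangles.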

Duplicate points generated along the same edge from 
neighboring voxels are typically combined using a point 
merging operation. Normals may also be computed from 
interpolation of point gradient values. The point gradient 
values in turn are computed using a central difference 
scheme easily computed from the 3-D lattice structure of 
the volume. 
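
A central difference gradient on a regular lattice might look like the following minimal sketch. The nested-list layout `volume[k][j][i]` is an assumption, and boundary points, which need one-sided differences, are omitted for brevity.

```python
def central_gradient(volume, i, j, k, spacing=(1.0, 1.0, 1.0)):
    """Central-difference gradient at interior lattice point (i, j, k).

    volume[k][j][i] is a nested list of scalars (an assumed layout).
    """
    gx = (volume[k][j][i + 1] - volume[k][j][i - 1]) / (2.0 * spacing[0])
    gy = (volume[k][j + 1][i] - volume[k][j - 1][i]) / (2.0 * spacing[1])
    gz = (volume[k + 1][j][i] - volume[k - 1][j][i]) / (2.0 * spacing[2])
    return (gx, gy, gz)
```

Normalizing the interpolated gradient at an isosurface point then gives a surface normal for shading.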

Because marching cubes is case-table-based, it is both 
fast and robust. However, the algorithm must visit all vox- 
els, most of which do not contain the isosurface. A variety 
of acceleration techniques such as span space (Livnat, Shen 
and Johnson, 1996) can be used to ensure that only those
cells containing the isosurface are visited. The interval tree
keeps track of cells according to min/max scalar values. If a
particular isocontour value falls within the min/max
range of a cell, then that cell must be processed. Empty cells
are ignored.
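
The min/max test itself is simple, as the sketch below suggests. It uses a plain linear scan only to show the query; it is not an interval tree or span-space structure, which would answer the same query without visiting every cell.

```python
def candidate_cells(cell_ranges, c):
    """Return ids of cells whose scalar range [s_min, s_max] contains c.

    cell_ranges: one (s_min, s_max) pair per cell, precomputed from point values.
    This linear scan is for illustration; an accelerated structure would
    return the same set of cells faster.
    """
    return [cid for cid, (s_min, s_max) in enumerate(cell_ranges)
            if s_min <= c <= s_max]
```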

Marching cubes is easily extended to other cell types, 
including irregular cell types such as tetrahedra. The pri- 
mary difference is the manner in which point normals 
are computed. In irregular data forms, the computation of 
point normals is usually deferred until after the isosurface 
is created. Normals can then be computed from the sur- 
face directly. 


3.1.3 Carpet plots 


Displacement of a surface as a function of scalar value is 
known as a carpet plot. For example, topographic maps 
displaced in the z-direction as a function of elevation 
create two and a half dimensional terrain maps. A similar 
approach is used to create carpet plots of pressure across an 
airplane wing, for example. The magnitude of displacement 
is further controlled using a scale factor to prevent excessive 
or imperceptible displacement. The same technique can be 
used for an arbitrary surface. Note that the displacement
direction vector (which is typically taken as the surface
normal) varies rapidly on highly curved surfaces. The
resulting carpet plot can be confusing in such situations.
The plot surface may even self-intersect in some cases.

Figure 4. Marching cubes case table. In practice, all 256 possible cases are used to ensure that the surface remains manifold.

Figure 5. Choosing a particular contour case will break (a) or
join (b) the current contour. The case table must be designed to
ensure that this decision is made consistently across all cases.


3.2 Vector fields 


Vector fields are also widely found in the computational sciences.
Vector displays show vector magnitude or direction,
or both.


3.2.1 Vector glyphs 


Probably, the most common vector displays are those
based on glyphs. Typically, a glyph (such as a small
arrow) is oriented in the direction of the local vector,
and may be colored or scaled to indicate magnitude
of the vector. Glyphed vector fields are also referred
to as hedgehogs because of their resemblance to the
small spiked animals. Care must be used when scaling
glyphs since uniform scaling of 3-D objects by a factor $n$ yields
an $O(n^2)$ change in surface area and an $O(n^3)$ change in volume.
These effects may mislead the consumer of the
visualization.


3.2.2 Displacement maps 


Another approach converts vectors to scalar values, and 
then uses methods of scalar visualization to view the 
result. For example, surface vectors can be converted to 
scalars by computing the dot product between the vector 
and the local surface normal. Color mapping can then 
be used to shade the surface to produce a displacement 
map. Figure 6 shows the result of this technique applied 
to a beam in vibration. The visualization clearly indicates 
the regions of positive and negative displacement, and 
the nodal lines (lines of zero displacement) between the 
regions. 
Figure 6. A displacement map of a beam in vibration. (Courtesy
of Kitware, Inc. Taken from the book The Visualization Toolkit: An
Object-Oriented Approach to 3-D Graphics, Third Edition, ISBN
1-930934-07-6.) A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm
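
The vector-to-scalar conversion behind a displacement map reduces to one dot product per point, as in this minimal sketch. The function name is hypothetical, and the normals are assumed to be unit length.

```python
def displacement_scalars(vectors, normals):
    """Signed scalar per point: the component of each vector along the surface normal.

    Positive values move the surface outward, negative values inward, and
    zero marks the nodal lines. Normals are assumed to be unit length.
    """
    return [sum(v * n for v, n in zip(vec, nrm))
            for vec, nrm in zip(vectors, normals)]
```

The resulting scalars can then be passed through any color mapping to shade the surface.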


3.2.3 Vector displacement plots 


Similar to scalar carpet plots, vector displacement plots are 
used to illustrate vector displacement. The original dataset 
geometry is deformed using the vector field, usually in 
combination with a scale factor to control the amount of distortion.
These plots are particularly effective when rapidly
animated by varying the scale factor using a sinusoidal or
sawtooth function.


3.2.4 Particle advection 


The structure of the vector field can be explored using 
interactive techniques such as particle advection. This is 
an extension of the hedgehog technique, where a point is 
moved a small distance over a time period $dt$. In other
words, if velocity $V = dx/dt$, then the displacement of a
point is


$$dx = V\,dt \qquad (2)$$



This suggests an extension to our previous techniques: 
repeatedly displace points over many time steps. Figure 7 
shows such an approach. Beginning with a sphere S cen- 
tered about some point C, S is repeatedly moved to generate 
the bubbles shown. The eye tends to trace out a path by 
connecting the bubbles, giving the observer a qualitative 
understanding of the fluid flow in that area. The bubbles 
may be displayed as an animation over time (giving the 
illusion of motion) or as a multiple exposure sequence 
(giving the appearance of a path). This is referred to as 
particle advection. 

Such an approach can be misused. For one thing, the 
velocity at a point is instantaneous. Once we move away 
from the point, the velocity is likely to change. Equation (2) 
above assumes that the velocity is constant over the entire 
step. By taking large steps, we are likely to jump over 
changes in the velocity. Using smaller steps, we will end 
in a different position. Thus, the choice of step size is a 
critical parameter in constructing accurate visualization of 
particle paths in a vector field. 

To evaluate equation (2), we can express it as an integral: 


$$x(t) = \int_t V\,dt \qquad (3)$$


Although this form cannot be solved analytically for most 
real world data, its solution can be approximated using 
numerical integration techniques. Accurate numerical inte- 
gration is a topic beyond the scope of this book, but it is 
known that the accuracy of the integration is a function of 
the step size $dt$. The simplest form of numerical integration
is Euler’s method,


$$x_{i+1} = x_i + V_i\,\Delta t \qquad (4)$$


where the position $x_{i+1}$ at the next time step is the vector sum of the
previous position plus the instantaneous velocity times the
incremental time step $\Delta t$.

Euler’s method has error on the order of $O(\Delta t^2)$, which
is not accurate enough for many applications. One such
example is shown in Figure 8. The velocity field describes
perfect rotation about a central point. Using Euler’s method,
we find that we will always diverge and, instead of generating
circles, will generate spirals.
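
The outward spiral is easy to reproduce. The sketch below applies Euler steps, position plus velocity times the time step, to a two-dimensional rigid rotation; the field V(x, y) = (-y, x), the start point, and the function names are assumptions chosen for illustration.

```python
import math

def rotation_velocity(p):
    """Velocity of a rigid rotation about the origin: V(x, y) = (-y, x)."""
    x, y = p
    return (-y, x)

def euler_step(p, dt, velocity=rotation_velocity):
    """One Euler step: new position = old position + velocity * dt."""
    vx, vy = velocity(p)
    return (p[0] + vx * dt, p[1] + vy * dt)

def euler_radius_after(n_steps, dt, start=(1.0, 0.0)):
    """Distance from the center of rotation after n_steps Euler steps."""
    p = start
    for _ in range(n_steps):
        p = euler_step(p, dt)
    return math.hypot(p[0], p[1])
```

For this field each step moves the point along the tangent, so the radius grows by a factor of $\sqrt{1 + \Delta t^2}$ per step regardless of how small the step is; a smaller step only slows the drift.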

A better approach is the Runge-Kutta technique of
second order (Conte and de Boor, 1972). This is given by
the expression


$$x_{i+1} = x_i + \frac{\Delta t}{2}\,(V_i + V_{i+1}) \qquad (5)$$


where the velocity $V_{i+1}$ is computed using Euler’s method.
The error of this method is $O(\Delta t^3)$. Compared to
Euler’s method, the Runge-Kutta technique enables a larger
integration step at the expense of one additional function
evaluation.


Figure 7. Time animation of a point C. Although the spacing between points varies, the time increment between each point is constant.


Figure 8. Euler’s integration (b) and Runge-Kutta integration of order 2 (c) applied to a uniform rotational vector field (a). Euler’s
method will always diverge.
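
As a hedged sketch of the second-order Runge-Kutta update, written here as $x_{i+1} = x_i + \frac{\Delta t}{2}(V_i + V_{i+1})$ with $V_{i+1}$ evaluated at an Euler-predicted point, the code below integrates the same kind of rotational field used to illustrate Euler's method. The field V(x, y) = (-y, x) and the function names are assumptions for this example.

```python
import math

def rotation_velocity(p):
    """Velocity of a rigid rotation about the origin: V(x, y) = (-y, x)."""
    x, y = p
    return (-y, x)

def rk2_step(p, dt, velocity=rotation_velocity):
    """One second-order Runge-Kutta step using an Euler predictor."""
    v0 = velocity(p)
    predicted = (p[0] + v0[0] * dt, p[1] + v0[1] * dt)   # Euler predictor
    v1 = velocity(predicted)                             # velocity at predicted point
    return (p[0] + 0.5 * dt * (v0[0] + v1[0]),
            p[1] + 0.5 * dt * (v0[1] + v1[1]))

def rk2_radius_after(n_steps, dt, start=(1.0, 0.0)):
    """Distance from the center of rotation after n_steps Runge-Kutta steps."""
    p = start
    for _ in range(n_steps):
        p = rk2_step(p, dt)
    return math.hypot(p[0], p[1])
```

For this field the radius still drifts slightly, but far more slowly than with Euler's method at the same step size, which is what permits the larger step.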


3.2.5 Numerical integration 


Integration formulas require repeated transformation from 
world coordinates to local cell coordinates and as a result 
are computationally demanding. The local vector field 
is computed via cell interpolation (in local coordinates) 
whereas the spatial displacement is computed in global 
coordinates. There are two important steps we can take to 
improve performance. 


1. Improving search procedures. There are two distinct 
types of searches. Initially, the starting location of 
the particle (i.e. to find what cell contains the point) 
must be established using a global search procedure. 
Once the initial location of the point is determined, an 
incremental search procedure can be used. Incremental 
searching is efficient because the motion of the point 
is limited within a single cell, or at most across a cell 
boundary. Thus, the search space is greatly reduced, 
and the incremental search is faster relative to the 
global search. 

2. Coordinate transformation. The cost of a coordinate 
transformation from global to local coordinates can be 
reduced if either of the following conditions is true: the 
local and global coordinate systems are identical with
one another (or vary by a simple rigid-body transform),
or if the vector field can be transformed from global
space to local coordinate space. The image data coordinate
system is an example of local coordinates that
are parallel to global coordinates, hence global to local
coordinate transformation can be greatly accelerated.
If the vector field is transformed into local coordi- 
nates (either as a preprocessing step or on a cell-by-cell 
basis), then the integration can proceed completely in 
local space. Once the integration path is computed, 
selected points along the path can be transformed into 
global space for the sake of visualization. 
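
For axis-aligned image data, the global-to-local transformation mentioned in step 2 reduces to a subtraction and a division per axis, with no search at all. The following sketch illustrates this; the function name and the return convention are assumptions.

```python
import math

def locate_in_image(x, origin, spacing, dims):
    """Find the voxel containing global point x in axis-aligned image data.

    dims is the number of points per axis (at least 2 on each axis).
    Returns (cell_ijk, pcoords) with parametric coordinates in [0, 1],
    or None if x lies outside the lattice.
    """
    ijk, pcoords = [], []
    for xc, o, s, n in zip(x, origin, spacing, dims):
        u = (xc - o) / s                      # continuous lattice coordinate
        if u < 0.0 or u > n - 1:
            return None
        i = min(int(math.floor(u)), n - 2)    # points on the far face go in the last cell
        ijk.append(i)
        pcoords.append(u - i)
    return tuple(ijk), tuple(pcoords)
```

The parametric coordinates can be fed directly to the cell's interpolation functions to evaluate the local vector field.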


3.2.6 Particle traces, streamlines, and streaklines 


A natural extension to the methods of the previous section 
is to connect the point position $x(t)$ over many time steps.
The result is a numerical approximation to a particle trace 
represented as a line. Depending on whether the flow is 
steady or time varying, we can define three related line 
representation schemes for vector fields. 


• Particle traces are trajectories traced by fluid particles
over time.

• Streaklines are the set of particle traces at a particular
time $t_i$ that have previously passed through a specified
point $x_i$.

• Streamlines are integral curves along a curve $s$ satisfying
the equation


$$s = \int_t V\,ds, \quad \text{with } s = s(x, \bar{t}) \qquad (6)$$


for a particular time $\bar{t}$.


Streamlines, streaklines, and particle traces are equiva- 
lent to one another if the flow is steady. In time-varying 
flow, a given streamline exists only at one moment in time. 
Visualization systems generally provide facilities to com- 
pute particle traces. However, if time is fixed, the same 
facility can be used to compute streamlines. In general, this 
visualization algorithm is referred to as streamline genera- 
tion, but it is important to understand the differences when 
the vector field is time varying. 
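
The distinction between a streamline and a particle trace can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a fourth-order Runge-Kutta step (any integration scheme could be substituted), and `rotating_field` is a hypothetical analytic test field, not data from the text. Freezing time yields a streamline; letting time advance yields a particle trace.

```python
import math

def integrate(velocity, start, t0, dt, steps, freeze_time=False):
    """Trace a point through velocity(x, y, t) with fourth-order Runge-Kutta.

    With freeze_time=True the field is always sampled at t0 (a streamline);
    otherwise time advances with the point (a particle trace).
    """
    def v(x, y, t):
        return velocity(x, y, t0 if freeze_time else t)

    x, y = start
    t = t0
    path = [(x, y)]
    for _ in range(steps):
        k1 = v(x, y, t)
        k2 = v(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], t + 0.5 * dt)
        k3 = v(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], t + 0.5 * dt)
        k4 = v(x + dt * k3[0], y + dt * k3[1], t + dt)
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        t += dt
        path.append((x, y))
    return path

def rotating_field(x, y, t):
    # Hypothetical test field: rigid rotation whose angular speed grows with time.
    return (-(1.0 + t) * y, (1.0 + t) * x)

streamline = integrate(rotating_field, (1.0, 0.0), 0.0, 0.01, 100, freeze_time=True)
trace = integrate(rotating_field, (1.0, 0.0), 0.0, 0.01, 100)
```

In this test field the streamline sweeps through 1 radian while the particle trace sweeps through 1.5 radians over the same interval, so the two curves end at different points even though they start together.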


3.2.7 Advanced methods


The vector field integration techniques described previously 
lend themselves to a variety of related methods. A simple 
extension to the streamline is to wrap the line with a 
tube. The tube radius may vary according to mass flow 
(Schroeder, Volpe and Lorensen, 1991). That is, assuming 
incompressible flow with no shear, the radius of the tube 
can vary according to the scalar function vector magnitude. 
Then the equation 


$$ r(\vec{v}) = r_{\max} \sqrt{\frac{|\vec{v}_{\min}|}{|\vec{v}|}} \qquad (7) $$


relates an area of constant mass flow, where the radius of
the tube at any point $r(\vec{v})$ is a function of the maximum
radius $r_{\max}$ and minimum velocity along the tube $\vec{v}_{\min}$.
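
As a small worked illustration, the constant-mass-flow relation can be evaluated along a streamline, taking it to read r = r_max * sqrt(|v_min| / |v|). The sample speeds and the function name `tube_radius` below are assumptions made for the sketch.

```python
import math

def tube_radius(speed, r_max, min_speed):
    """Tube radius r = r_max * sqrt(|v_min| / |v|) for a given local speed."""
    if speed <= 0.0:
        raise ValueError("speed must be positive")
    return r_max * math.sqrt(min_speed / speed)

speeds = [1.0, 4.0, 16.0]      # hypothetical speeds sampled along one streamline
v_min = min(speeds)
radii = [tube_radius(s, r_max=2.0, min_speed=v_min) for s in speeds]
print(radii)  # [2.0, 1.0, 0.5]
```

The tube is widest where the flow is slowest and narrows as the speed increases, so the cross-sectional area times the speed stays constant.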

Another common streamline technique widens the line to 
create a ribbon or surface. One method to create a stream- 
surface generates adjacent streamlines and then bridges the 
lines with a ruled surface. This technique works well as long 
as the streamlines remain relatively close to one another. If 
separation occurs, so that the streamlines diverge, the result- 
ing surface will not accurately represent the flow, because 
we expect the surface to be everywhere tangential to the 
vector field (i.e. the definition of a streamline). The ruled sur-
face connecting two widely separated streamlines does not
generally satisfy this requirement; thus the surface must 
adaptively adjust to local flow conditions. 

A streamribbon can also be calculated by attaching a 
ribbon to the streamline and rotating it with the local 
streamwise vorticity. Vorticity $\vec{\omega}$ is the measure of rota-
tion of the vector field, expressed as a vector quantity: a
direction (axis of rotation) and magnitude (amount of rota-
tion). Streamwise vorticity $\Omega$ is the projection of $\vec{\omega}$ along
the instantaneous velocity vector, $\vec{v}$. Said in another way,
streamwise vorticity is the rotation of the vector field around
the streamline, defined as follows.


$$ \Omega = \frac{\vec{v} \cdot \vec{\omega}}{|\vec{v}|\,|\vec{\omega}|} \qquad (8) $$


The amount of twisting of the streamribbon approximates 
the streamwise vorticity. 

A streamsurface is a collection of an infinite number of 
streamlines passing through a base curve. The base curve, 
or rake, defines the starting points for the streamlines. If the 
base curve is closed (e.g. a circle), the surface is closed and 
a streamtube results. Thus, streamribbons are specialized 
types of streamsurfaces with a narrow width compared 
to length. 

Compared to vector icons or streamlines, streamsurfaces 
provide additional information about the structure of the
vector field. At every point, the streamsurface is tangent to the
velocity vector. Consequently, taking an example from fluid
flow, no fluid can pass through the surface. Streamtubes are 
then representations of constant mass flux. Streamsurfaces 
show vector field structure better than streamlines or vector 
glyphs because they do not require visual interpolation 
across icons. 

Streamsurfaces can be computed by generating a set of 
streamlines from a user-specified rake. A polygonal mesh 
is then constructed by connecting adjacent streamlines. 
One difficulty with this approach is that local vector field 
divergence can cause streamlines to separate. Separation 
can introduce large errors into the surface, or possibly cause 
self-intersection, which is not physically possible. 


3.2.8 Vector field topology 


Vector fields have a complex structure characterized by 
special features called critical points (Globus, Levit and 
Lasinski, 1991; Helman and Hesselink, 1991). Critical 
points are locations in the vector field where the local vector 
magnitude goes to zero and the vector direction becomes 
undefined. At these points, the vector field either converges 
or diverges, and/or local circulation around the point occurs 
(Figure 9). 
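
The behavior near a critical point in two dimensions can be read from the eigenvalues of the matrix of first derivatives (the Jacobian). The sketch below is an illustration only: the function name, the tolerance, and the label "degenerate" are assumptions, while the remaining labels follow the usual node/focus/saddle/center terminology.

```python
import cmath

def classify_critical_point(j11, j12, j21, j22, tol=1e-12):
    """Classify a 2-D critical point from the Jacobian [[j11, j12], [j21, j22]]."""
    trace = j11 + j22
    det = j11 * j22 - j12 * j21
    root = cmath.sqrt(trace * trace - 4.0 * det)
    e1, e2 = (trace + root) / 2.0, (trace - root) / 2.0   # the two eigenvalues
    r1, r2 = e1.real, e2.real
    if abs(e1.imag) > tol:                  # nonzero imaginary part: rotation
        if abs(r1) <= tol:
            return "center"
        return "repelling focus" if r1 > 0.0 else "attracting focus"
    if abs(r1) <= tol or abs(r2) <= tol:    # a zero eigenvalue: not a simple point
        return "degenerate"
    if r1 * r2 < 0.0:                       # real parts of opposite sign
        return "saddle"
    return "repelling node" if r1 > 0.0 else "attracting node"

print(classify_critical_point(0.0, -1.0, 1.0, 0.0))    # center
print(classify_critical_point(-1.0, 0.0, 0.0, -2.0))   # attracting node
```

Positive real parts repel, negative real parts attract, and a nonzero imaginary part adds circulation around the point.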

A number of visualization techniques have been devel- 
oped to construct vector field topology from an analysis 
of critical points. These techniques provide a global under- 
standing of the field, including points of attachment and 
detachment and field vortices. Using a fluid flow analogy,
points of attachment and detachment occur on the surface of 
an object where the tangential component of the vector field 
goes to zero, and the flow is perpendicular to the surface. 
Thus, streamlines will begin or end at these points. There is 
no common definition for a vortex, but generally speaking, 
vortices are regions of relatively concentrated vorticity (e.g. 
flow rotation). The study of vortices is important because 
they represent areas of energy loss, or can have significant 
impact on downstream flow conditions (e.g. trailing vortices 
behind large aircraft). 

One useful visualization technique creates vector field 
skeletons that divide the vector field into separate regions. 
Within each region, the vector field is topologically equiva- 
lent to uniform flow. These skeletons are created by locating 
critical points, and then connecting the critical points with 
streamlines. In 3-D vector field analysis, this technique can 
be applied to the surface of objects to locate lines of flow 
separation and attachment and other important flow fea- 
tures. Also, in general 3-D flow, the regions of uniform 
flow are separated by surfaces, and creation of 3-D flow 
skeletons is a current research topic. 

Figure 9. Critical points in two dimensions. The real parts of the eigenvalues (R1, R2) of the matrix of first derivatives control the
attraction or repulsion of the vector field. The imaginary parts of the eigenvalues (I1, I2) control the rotation.


3.3 Tensor fields 


Tensor visualization is an active area of research. There
are few techniques for visualizing tensors other than 3 x 3
real symmetric tensors. Such tensors are used to describe
the state of strain or stress in a 3-D material. The well- 
known stress and strain tensors for an elastic material are 
shown in Figure 10. 

Recall that a 3 x 3 real symmetric matrix can be charac- 
terized by the three eigenvectors and three eigenvalues of 
the matrix. The eigenvectors form a 3-D coordinate system 
whose axes are mutually perpendicular. In some applica- 
tions, particularly the study of materials, these axes also 
are referred to as the principal axes of the tensor and are 

Figure 10. Stress and strain tensors: (a) stress tensor; (b) strain tensor. Normal stresses in the x-
y-z coordinate directions are indicated as $\sigma_x$, $\sigma_y$, $\sigma_z$; shear stresses
are indicated as $\tau_{ij}$. Material displacement is represented by the u, v, w
components.


physically significant (e.g. directions of normal stress and 
no shear stress). Eigenvalues are physically significant as 
well. In the study of vibration, eigenvalues correspond to 
the resonant frequencies of a structure, and the eigenvectors 
are the associated mode shapes. 
We can express the eigenvectors of the 3 x 3 system as

$$ \vec{v}_i = \lambda_i \vec{e}_i, \quad \text{with } i = 1, 2, 3 \qquad (9) $$

with $\vec{e}_i$ a unit vector in the direction of the $i$th eigenvector, and
$\lambda_i$ the eigenvalues of the system. If we order eigenvalues
such that


$$ \lambda_1 \geq \lambda_2 \geq \lambda_3 \qquad (10) $$


then we refer to the corresponding eigenvectors $\vec{v}_1$, $\vec{v}_2$, and
$\vec{v}_3$ as the major, medium, and minor eigenvectors.
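
To make the ordering concrete, the eigenvalues of a real symmetric 3 x 3 tensor can be computed and sorted from largest to smallest. The sketch below uses a closed-form trigonometric solution, which is an assumption of this example rather than a method described in the text; in practice a library eigen-solver would normally be used and would also return the eigenvectors.

```python
import math

def symmetric_eigenvalues(a):
    """Eigenvalues of a real symmetric 3x3 matrix, ordered l1 >= l2 >= l3."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:                                   # already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0         # mean of the eigenvalues
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    # b = (a - q*I) / p has eigenvalues 2*cos(angle), so det(b)/2 = cos(3*phi)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))            # guard against round-off
    phi = math.acos(r) / 3.0
    l1 = q + 2.0 * p * math.cos(phi)
    l3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    l2 = 3.0 * q - l1 - l3                          # the trace is preserved
    return [l1, l2, l3]

stress = [[2.0, 1.0, 0.0],
          [1.0, 2.0, 0.0],
          [0.0, 0.0, 5.0]]      # hypothetical symmetric tensor
values = symmetric_eigenvalues(stress)
```

For this hypothetical tensor the major, medium, and minor eigenvalues are 5, 3, and 1.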


3.3.1 Tensor ellipsoids 


This leads us to the tensor ellipsoid technique for the 
visualization of real, symmetric 3 x 3 matrices. The first
step is to extract eigenvalues and eigenvectors as described 
in the previous section. Since eigenvectors are known to be 
orthogonal, the eigenvectors form a local coordinate system. 
These axes can be taken as the minor, medium, and major 
axes of an ellipsoid. Thus, the shape and orientation of 
the ellipsoid represent the relative size of the eigenvalues 
and the orientation of the eigenvectors. In Figure 11 we 

Figure 11. Tensor ellipsoids. (Courtesy of Kitware, Inc. Taken
from the book The Visualization Toolkit An Object-Oriented
Approach to 3-D Graphics Third Edition ISBN-1-930934-07-6.)
A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm


visualize the analytical results of Boussinesq’s problem 
from Saada. Note that tensor ellipsoids and tensor axes are 
a form of glyph specialized to tensor visualization. 


3.3.2 Hyperstreamlines 


Hyperstreamlines are constructed by creating a streamline 
through one of the three eigenfields, and then sweeping 
a geometric primitive along the streamline (Delmarcelle 
and Hesselink, 1993). Typically, an ellipse is used as the 
geometric primitive, where the remaining two eigenvectors 
define the major and minor axes of the ellipse (Figure 12). 
Sweeping the ellipse along the eigenfield streamline results
in a tubular shape. Another useful generating geometric
primitive is a cross. The length and orientation of the arms 
of the cross are controlled by two of the eigenvectors. 
Sweeping the cross results in a helical shape since the 
eigenvectors (and therefore the arms of the cross) will 
typically rotate in the tensor field. Figure 13 shows an 
example of hyperstreamlines. The data is from a point load
applied to a semi-infinite domain. Compare this figure to
Figure 11, which used tensor ellipsoids to visualize the same
data. Notice that there is less clutter and more information
available from the hyperstreamline visualization. 


3.4 Extraction and modelling 


Visualization inherently involves modeling operations and 
the extraction of subregions of data. For example, an analyst 
may wish to extract a portion of the data whose scalar 
values lie within a particular scalar range (i.e. threshold 
the data). Or, operations that clip and cut may provide a 
view into the interior of a complex 3-D dataset. Extraction 
and clipping are generally performed to limit the data to a
region of interest, either to gain interactive performance 
from large data sets, or to eliminate visual distraction from 
less important regions of the data. 

Data extraction and modeling operations often transform 
the structure of the data on which they operate. For exam-
ple, if all (voxel) cells of a regular volumetric dataset
that are contained within an ellipsoidal region are extracted,
the resulting structure is not regular and, therefore, not a
volume. Alternatively, extracting a region of interest (ROI)
from a volume will result in another regular volume. Oper- 
ations that modify the structure of data may have a dramatic 
impact on the performance and computational requirements 
of a visualization process. 


3.4.1 Geometric extraction and implicit functions 


Geometric extraction produces cells and/or points that lie 
within a domain — typically a spatial domain. The test for 


Figure 12. Creation of hyperstreamlines: (a) ellipse definition; (b) ellipse swept to create tube. An ellipse is swept along a streamline of the eigenfield. Major/minor axes of the ellipse are
controlled by the other two eigenvectors.


Figure 13. Example of hyperstreamlines. The four hyperstream- 
lines shown are integrated along the minor principal stress axis. 
A plane (colored with a different lookup table) is also shown. 
(Courtesy of Kitware, Inc. Taken from the book The Visualiza- 
tion Toolkit An Object-Oriented Approach to 3-D Graphics Third 
Edition ISBN-1-930934-07-6.) A color version of this image is 
available at http://www.mrw.interscience.wiley.com/ecm 


inclusion may include cells that are completely within the 
domain, or cells with one or more points lying within the 
domain (i.e. partial inclusion). The domain can be defined 
in a number of ways, including using bounding boxes 
or combinations of clipping planes. In particular, the use 
of implicit functions — including boolean combinations of 
implicit functions — is a powerful, simple way to define 
complex domains. An implicit function has the form 


$$ F(\vec{x}) = 0 \qquad (11) $$


where $F(\vec{x}) < 0$ is inside and $F(\vec{x}) > 0$ is outside the
implicit function. The family of implicit functions includes 
planes, spheres, cones, ellipsoids, and a variety of other 
simple shapes. Using boolean combinations — union, inter- 
section, and difference — it is possible to create complex 
domain definitions using these simple shapes. 

Implicit functions have the property that they convert a 
position $\vec{x}$ into a scalar value $s$ via equation (11). Thus,
any scalar technique described previously (e.g. isocon- 
touring) can be used in combination with implicit func- 
tions. This forms the basis of several techniques such as 
thresholding, cutting, and clipping described in the follow- 
ing sections. 
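
A short sketch may help show how implicit functions and their boolean combinations define a domain. The min/max formulation of union, intersection, and difference used here is a common convention and is an assumption of this example; all function names are hypothetical.

```python
def sphere(center, radius):
    """Implicit sphere: negative inside, zero on the surface, positive outside."""
    cx, cy, cz = center
    return lambda x, y, z: (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 - radius ** 2

def plane(origin, normal):
    """Implicit plane: negative on the side opposite to the normal."""
    ox, oy, oz = origin
    nx, ny, nz = normal
    return lambda x, y, z: (x - ox) * nx + (y - oy) * ny + (z - oz) * nz

def union(f, g):
    return lambda x, y, z: min(f(x, y, z), g(x, y, z))

def intersection(f, g):
    return lambda x, y, z: max(f(x, y, z), g(x, y, z))

def difference(f, g):
    return lambda x, y, z: max(f(x, y, z), -g(x, y, z))

unit_ball = sphere((0.0, 0.0, 0.0), 1.0)
# Upper half of the unit ball: inside the sphere and on the z > 0 side.
half_ball = intersection(unit_ball, plane((0.0, 0.0, 0.0), (0.0, 0.0, -1.0)))
# Spherical shell: the unit ball with a smaller ball removed.
shell = difference(unit_ball, sphere((0.0, 0.0, 0.0), 0.5))

print(half_ball(0.0, 0.0, 0.5) < 0.0)   # True: the point is inside
print(shell(0.0, 0.0, 0.25) < 0.0)      # False: the point lies in the removed core
```

Because each combination is itself a function returning a scalar, the result can be fed to any scalar technique, such as isocontouring at the value zero.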


3.4.2 Thresholding


Thresholding is a technique to extract subregions of data 
based on attribute values. For example, we may wish to 
extract all cells whose scalar values fall within a specified 
range. Generally, thresholding is performed using scalar 
values because any attribute type can be converted into a 
single value using the appropriate evaluation function (e.g. 
vector dot product, etc.). In the previous section, we saw 
how implicit functions can be used to convert a position $\vec{x}$
into a scalar value.

There are several ways in which to implement the thresh- 
olding operation. One is to geometrically extract cells sat- 
isfying the threshold into a new, output dataset. This has 
the benefit of reducing data size but may convert a reg- 
ular dataset into an irregular one as described previously. 
Another approach is to create a ‘blanking’ array to indicate 
which cells and/or points are visible. During the render- 
ing process, only the visible entities are actually drawn. 
The benefit to this approach is that the regularity of the 
data is preserved at the expense of the additional memory. 
Finally, another interesting approach derived from com- 
puter graphics is to use transparent textures to eliminate 
invisible data. That is, a special texture map consisting
of a transparent region and an opaque region is used in 
combination with texture coordinates computed from the 
thresholding operation. The benefit of the texture approach 
is that the structure of the dataset is not modified and the 
thresholding function can be changed rapidly by modifying 
the relatively small texture map. Furthermore, since mod- 
ern graphics hardware supports texture maps, it is possible 
to produce interactive thresholding on large datasets with 
this approach. 
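
The first two approaches can be contrasted in a few lines. This is a minimal sketch with hypothetical function names and per-cell scalar data: extraction builds a new, smaller list of cell ids, whereas blanking keeps the dataset intact and stores one visibility flag per cell.

```python
def threshold_extract(cell_scalars, lo, hi):
    """Return the ids of cells whose scalar lies in [lo, hi] (a new, smaller dataset)."""
    return [i for i, s in enumerate(cell_scalars) if lo <= s <= hi]

def threshold_blank(cell_scalars, lo, hi):
    """Return one visibility flag per cell; the dataset itself is left untouched."""
    return [lo <= s <= hi for s in cell_scalars]

scalars = [0.1, 0.5, 0.9, 0.4]               # hypothetical per-cell values
kept = threshold_extract(scalars, 0.3, 0.6)
visible = threshold_blank(scalars, 0.3, 0.6)
print(kept)      # [1, 3]
print(visible)   # [False, True, False, True]
```

The blanking array always has as many entries as there are cells, which is the additional memory cost mentioned above.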


3.4.3 Topological extraction 


Topological extraction selects cells and points based on 
topological criteria such as id or adjacency to a par-
ticular feature such as an edge or face. Dataset bound-
aries of manifold 3-D datasets can be determined by
extracting all faces that are used by only one cell (in the
sense that a face is used when it forms the bound- 
ary of a 3-D cell). Often, topological and geometric 
operations are combined — sharp ‘feature’ edges (deter- 
mined by a geometric operation comparing surface nor- 
mals on faces using an edge) and the faces connected 
to these edges (a topological adjacency operation) can 
be displayed to examine the qualities of a computational 
mesh. Other common operations include selecting a point 
and displaying all edges, faces, and/or cells using that 
point. 
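
Boundary extraction by face use can be sketched as follows for a tetrahedral mesh. The representation of cells as tuples of point ids and the function name are assumptions of this illustration.

```python
from collections import Counter
from itertools import combinations

def boundary_faces(tetrahedra):
    """Return the triangular faces that are used by exactly one tetrahedron."""
    use_count = Counter()
    for tet in tetrahedra:
        # Each tetrahedron has four faces; sorting the ids makes a face
        # shared by two cells map to the same key.
        for face in combinations(sorted(tet), 3):
            use_count[face] += 1
    return sorted(face for face, n in use_count.items() if n == 1)

tets = [(0, 1, 2, 3), (1, 2, 3, 4)]    # two tetrahedra sharing face (1, 2, 3)
faces = boundary_faces(tets)
print(len(faces))   # 6
```

The shared face (1, 2, 3) is used twice and is therefore interior; the remaining six faces form the boundary of the mesh.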


3.4.4 Probing and spatial location


Probing is used to transform structure, reduce data size, 
reduce topological dimension, and focus on particular fea-
tures of a dataset. The fundamental probing operation deter-
mines the data values (e.g. attribute values) at a particular
point in a dataset. Hence, this sampling operation involves 
two distinct steps: locating the point within the structure 
of the dataset (e.g. determining which cell contains the 
probe point) and interpolating data values from the contain- 
ing cell. This operation inherently reduces the topological 
dimension of a dataset down to that of a zero-dimensional 
point. However, by arranging a set of points in a particular 
structure, it is possible to produce new datasets of vary- 
ing dimension and structure. Furthermore, if the number of 
points in the probe is less than that of the original dataset, 
data reduction is performed. 

Typical examples of probing include determining data 
values at a point of interest (i.e. producing a numerical 
value), producing x-y plots along a line or curve, producing 
surface plots, color maps, or images from a regular array 
of points, and converting irregular unstructured data into 
regular volumetric data (for the purposes of volume ren- 
dering). Like any sampling function, probing must be used 
carefully to avoid undersampling and oversampling data. 
Undersampling may produce incorrect visualizations that 
miss important data features; oversampling may produce 
the illusion of accuracy and require excessive computa- 
tional resources. 
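
The two steps of probing, locating the containing cell and interpolating within it, can be sketched for a 2-D regular grid with unit spacing. The grid layout, function names, and sample data are assumptions of this example; sampling along a line produces the data for an x-y plot.

```python
import math

def probe(grid, x, y):
    """Bilinearly interpolate grid[j][i] (unit spacing, origin at 0, 0) at (x, y)."""
    ny, nx = len(grid), len(grid[0])
    # Step 1: locate the containing cell, clamped so border points stay valid.
    i = min(max(math.floor(x), 0), nx - 2)
    j = min(max(math.floor(y), 0), ny - 2)
    r, s = x - i, y - j                  # local coordinates inside the cell
    # Step 2: interpolate from the four corner values of that cell.
    return ((1 - r) * (1 - s) * grid[j][i] + r * (1 - s) * grid[j][i + 1]
            + (1 - r) * s * grid[j + 1][i] + r * s * grid[j + 1][i + 1])

def probe_line(grid, p0, p1, n):
    """Sample n >= 2 points evenly along the segment from p0 to p1."""
    return [probe(grid,
                  p0[0] + (p1[0] - p0[0]) * k / (n - 1),
                  p0[1] + (p1[1] - p0[1]) * k / (n - 1)) for k in range(n)]

grid = [[0.0, 1.0, 2.0],
        [1.0, 2.0, 3.0],
        [2.0, 3.0, 4.0]]          # hypothetical scalar field f = x + y
samples = probe_line(grid, (0.0, 0.0), (2.0, 2.0), 5)
print(samples)
```

Choosing `n` is the sampling decision discussed above: too few points may miss features between samples, while many more points than the grid resolution add cost without adding information.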


3.4.5 Cutting 


The cutting operation reduces an n-dimensional dataset into 
an (n - 1)-dimensional surface (the cutting surface). For
example, a 3-D dataset may be cut by a plane to produce a 
2-D surface on which the original data attributes have been 
interpolated. 

Cutting operations can be conveniently implemented 
using isocontouring methods in conjunction with implicit 
functions. The implicit function is used to describe the 
cutting surface. During the cutting operation, each point 
of each cell is evaluated through the implicit function to 
produce values that are above, below, and equal to zero. 
An isosurface of value $F(\vec{x}) = 0$ is then extracted to
produce the cutting surface. Isocontouring and cutting are 
equivalent in the sense that both are controlled by scalar 
values. The difference is that the cutting operation pro- 
duces scalar values by evaluating point locations through 
an implicit function. 

One of the advantages of the cutting operation as com- 
pared to probing is that the resolution of the cut surface 
is directly related to the resolution of the underlying data 
through which the cut surface passes. 
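
The procedure can be sketched one dimension lower, cutting a 2-D triangle with a line to obtain a 1-D segment. This is an illustration under stated assumptions: the function names and attribute values are hypothetical, and the zero crossing on each edge is estimated by linear interpolation.

```python
def cut_triangle(points, scalars, implicit):
    """Return (point, scalar) pairs where implicit = 0 crosses the triangle's edges."""
    f = [implicit(*p) for p in points]          # evaluate each point of the cell
    crossings = []
    for a, b in ((0, 1), (1, 2), (2, 0)):
        if (f[a] < 0.0) != (f[b] < 0.0):        # sign change along this edge
            t = f[a] / (f[a] - f[b])            # linear estimate of the zero
            pt = tuple(pa + t * (pb - pa) for pa, pb in zip(points[a], points[b]))
            crossings.append((pt, scalars[a] + t * (scalars[b] - scalars[a])))
    return crossings

def cutting_line(x, y):
    return x - 1.0                              # the line x = 1

triangle = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
temperature = [10.0, 30.0, 50.0]                # hypothetical point attribute
segment = cut_triangle(triangle, temperature, cutting_line)
print(segment)   # [((1.0, 0.0), 20.0), ((1.0, 1.0), 40.0)]
```

The end points of the resulting segment lie on the cell edges, which is why the resolution of a cut follows the resolution of the underlying cells.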


3.4.6 Clipping 


The clipping operation produces an n-dimensional dataset 
from an (n - 1)-dimensional clipping surface applied to an
n-dimensional input dataset. The structure of the data is 
often transformed by this operation. For example, clipping 
a 3-D regular volume with a plane produces an irregular 
output that will consist of tetrahedra. Similar to cutting 
operations, clipping surfaces are often described by implicit 
functions. However, data values (i.e. scalar values) can also 
be used to define the clipping surface. 

Clipping is relatively easy to produce in cells of dimen- 
sion two or less. Case tables similar to that of the marching 
cubes isocontouring algorithm are designed to produce lin- 
ear cell types, such as lines, triangles, and quadrilaterals. 
In dimensions three and higher, it is difficult to design case 
tables that produce consistent meshes with cell types closed 
under the set tetrahedron, hexahedron, pyramid, and wedge. 
For example, while tetrahedra can be clipped to produce 
new tetrahedra and wedges, hexahedra require careful prete- 
trahedrization to produce compatible meshes. (Consistent 
meshes are those that satisfy the compatibility conditions 
described by Luebke et al. (2003) — no T-junctions or 
free faces.) 
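
For a single 2-D cell, clipping can be sketched as follows. Note that this example walks the edges of the triangle in the manner of Sutherland-Hodgman polygon clipping rather than using an explicit case table; the function name and data are assumptions, and degenerate inputs (a vertex lying exactly on the clipping surface) may yield duplicate points.

```python
def clip_triangle(points, f):
    """Keep the part of a triangle where the scalar f <= 0.

    points: three (x, y) tuples; f: the scalar values at those points.
    Returns a polygon with 0, 3, or 4 vertices (nothing, triangle, quadrilateral).
    """
    out = []
    for a in range(3):
        b = (a + 1) % 3
        if f[a] <= 0.0:
            out.append(points[a])                    # vertex is inside: keep it
        if (f[a] <= 0.0) != (f[b] <= 0.0):           # edge crosses the clip surface
            t = f[a] / (f[a] - f[b])
            out.append(tuple(pa + t * (pb - pa)
                             for pa, pb in zip(points[a], points[b])))
    return out

triangle = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
left_part = clip_triangle(triangle, [-1.0, 1.0, -1.0])    # keep x <= 1
right_part = clip_triangle(triangle, [1.0, -1.0, 1.0])    # keep x >= 1
print(left_part)    # quadrilateral
print(right_part)   # triangle
```

Unlike cutting, the output has the same dimension as the input cell, but its type may change: here one triangle yields a quadrilateral on one side and a triangle on the other.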


3.4.7 Glyphing 


Glyphing is a general-purpose visualization technique that 
can be used in endless variety. We have already seen how 
glyphs are used to produce vector hedgehogs and tensor 
ellipsoids. Glyphing is essentially a data-driven modeling 
operation. That is, a canonical geometric/topological struc- 
ture is defined, and the structure is modified according to 
data values at a particular location. The location is typ- 
ically a point in the dataset, but glyphs may be used to 
represent cells, datasets, or even the relationships between 
data. Figure 14 shows superquadric glyphs used to indicate 
position in the x-y plane. In a famous technique, Chernoff 
(1973) used the human face as a glyph, and linked data 
value to the features of the face. The key to glyph design is 
to use a representation that naturally conveys information 
to the viewer. This is inherently problem and application 
specific. Another issue with glyphs is that the use of too
many clutters the display. 


3.4.8 Other operations 


A variety of other modeling operations are used in visual- 
ization. Many of these techniques are used to highlight data 
or provide various forms of visual annotation. Points and 
line primitives (streamlines, edges) are often thickened by 
placing spheres at points (another simple use of glyphing) 
or wrapping lines with tubes. Data may be framed or set 


Figure 14. Glyphs used to indicate x-y location in the plane. 
Superquadrics are modified according to coordinate value. (Cour- 
tesy of Kitware, Inc. Taken from the book The Visualization 
Toolkit An Object-Oriented Approach to 3-D Graphics Third Edi- 
tion ISBN-1-930934-07-6.) A color version of this image is avail- 
able at http://www.mrw.interscience.wiley.com/ecm 


off by employing frames or base planes. Shadows are often 
used to provide depth cues. In some cases, extrusion is used 
to turn lower-dimensional data into more visually pleasing 
3-D forms (carpet plots are one example of this). 

Several computational geometry techniques are in com- 
mon usage. Decimation is the process of reducing a mesh 
to a smaller size, preserving as much as possible the accu- 
racy of the original mesh (Schroeder, Zarge and Lorensen, 
1992). This is particularly important where large data size 
prevents interactive performance (see Section 1.5). Mesh
smoothing using Laplacian vertex placement (Taubin, 1995)
or windowed sinc functions (Taubin, Zhang and Golub,
1996) is used to smooth out surface noise. For example,
3-D isocontours produced from medical data often reflect 
aliasing due to sampling or noise inherent to the imag- 
ing modality. Smoothing can effectively reduce the high- 
frequency noise characteristic of these conditions. Another 
common class of operations is that based on connectiv- 
ity. Region growing in images — especially when coupled 
with statistical measures such as mean and variance — can 
effectively segment data of interest. Geometric connectiv- 
ity is useful for extracting meshes and leaving behind small 
artifacts due to noise. 
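
Laplacian vertex placement can be illustrated on a polyline, where each interior point is moved part of the way toward the average of its two neighbors. The relaxation factor, iteration count, and data below are assumptions of this sketch; the same idea applies to surface meshes using each vertex's connected neighbors.

```python
def laplacian_smooth(points, iterations=10, relaxation=0.5):
    """Smooth an open polyline; the two end points are held fixed."""
    pts = [tuple(p) for p in points]
    for _ in range(iterations):
        new = list(pts)
        for i in range(1, len(pts) - 1):
            # Average position of the two neighbors of point i.
            avg = tuple((a + b) / 2.0 for a, b in zip(pts[i - 1], pts[i + 1]))
            # Move the point a fraction of the way toward that average.
            new[i] = tuple(p + relaxation * (m - p) for p, m in zip(pts[i], avg))
        pts = new
    return pts

noisy = [(0.0, 0.0), (1.0, 0.4), (2.0, -0.4), (3.0, 0.4), (4.0, 0.0)]
smooth = laplacian_smooth(noisy)
print(max(abs(y) for _, y in smooth))
```

Repeated application damps the high-frequency zigzag first; it also shrinks the shape gradually, which is one motivation for the windowed sinc alternative cited above.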


4 VOLUME RENDERING 


Early computer graphics focused on surface rendering tech- 
niques. Ray tracing (Whitted, 1980) was and still is a 
popular technique based on tracking the interaction of 
light rays with the surfaces in a scene (the surface typ-
ically being defined with implicit functions, collections
of linear primitives, or splines). Indeed, current graphics 
techniques remain surface oriented — polygons are used 
to represent surfaces (even splines such as NURBS are 
tessellated into triangles) that are then processed by graph- 
ics hardware. However, visualization datasets are typically 
three-dimensional in nature and carry important informa- 
tion interior to the boundary. This has given rise to volume 
rendering techniques (Figure 15). 


4.1 Overview 


The basic idea behind volume rendering is simple: for each 
pixel in the image, information interior to the data set 
(lying under the pixel) is composited in an ordered fashion 
by applying a transfer function to the data. The transfer 
function — similar to a lookup table — is a mapping of 
data value to color and transparency (i.e. alpha value). The
transfer function may consist of a simple set of piecewise 
continuous linear ramps for each of the red, green, blue, 
and alpha (RGBA) components, or it may be a complex 
segmentation and include the effect of gradients or other 
features of the data. Most often volume rendering is applied 
to 3-D image datasets (i.e. volumes), but recent techniques
have extended volume rendering to irregular data. In some 
cases, a resampling or probing operation is applied to 
irregular data to produce a regular volumetric dataset. This 
has the advantage of rendering speed at the potential cost 
of missing important features in the data. 
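
A transfer function built from piecewise-linear ramps can be sketched as follows. The control points are hypothetical values chosen for 8-bit data, and the helper name `make_ramp` is an assumption; data values in each list must be strictly increasing.

```python
def make_ramp(control_points):
    """Piecewise-linear ramp through sorted (data_value, output) control points."""
    def ramp(v):
        if v <= control_points[0][0]:
            return control_points[0][1]
        for (v0, y0), (v1, y1) in zip(control_points, control_points[1:]):
            if v <= v1:
                return y0 + (y1 - y0) * (v - v0) / (v1 - v0)
        return control_points[-1][1]             # clamp above the last point
    return ramp

# Hypothetical ramps for the red, green, blue, and alpha components.
red = make_ramp([(0, 0.0), (255, 1.0)])
green = make_ramp([(0, 0.0), (128, 1.0), (255, 0.0)])
blue = make_ramp([(0, 1.0), (255, 0.0)])
alpha = make_ramp([(0, 0.0), (100, 0.0), (200, 0.8), (255, 0.8)])

def transfer(v):
    """Map a data value to an RGBA tuple."""
    return (red(v), green(v), blue(v), alpha(v))

print(transfer(0))
print(alpha(150))
```

In this example, values below 100 are fully transparent, so only the upper part of the data range contributes to the image.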

Early volume-rendering techniques were based on pure 
software implementations and were slow (several minutes 
per image). Using parallel methods and hardware accelera- 
tion, rendering speeds have been improved to tens of frames 
per second. Two basic approaches to volume rendering are 
common: image-order and object-order methods. 


Figure 15. Volume rendering. (image courtesy of VolView vol- 
ume rendering system from Kitware, Inc.) A color version of this 
image is available at http://www.mrw.interscience.wiley.com/ecm 


4.2 Image-order methods


Early versions of volume rendering used a ray-casting 
method. In this approach, rays are directed from the 
observer through each pixel in the output image (hence 
the name image-order) and into the 3-D data. Each ray is 
sampled in the order it propagates through the data, and 
at each sample point the data value is mapped through the 
transfer function to generate an RGBA color tuple that is 
blended with the previous color. Once the opacity reaches 
1.0 (opaque), the ray is terminated; or if the ray reaches
the far clipping plane, it is composited with the back- 
ground color. 
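
The compositing loop for one ray can be sketched as follows. This is a minimal illustration assuming front-to-back accumulation with non-premultiplied colors; the function name and the sample values are hypothetical.

```python
def cast_ray(samples, transfer, background=(0.0, 0.0, 0.0)):
    """Composite scalar samples front to back along one ray; returns the final RGB."""
    color = [0.0, 0.0, 0.0]
    opacity = 0.0
    for value in samples:
        r, g, b, a = transfer(value)
        weight = (1.0 - opacity) * a          # the share not yet hidden by earlier samples
        color = [c + weight * s for c, s in zip(color, (r, g, b))]
        opacity += weight
        if opacity >= 1.0:                    # opaque: terminate the ray early
            return tuple(color)
    # The ray left the data: fill the remaining transparency with the background.
    return tuple(c + (1.0 - opacity) * bg for c, bg in zip(color, background))

def grey_half_opaque(v):
    return (v, v, v, 0.5)                     # hypothetical transfer function

def red_opaque(v):
    return (1.0, 0.0, 0.0, 1.0)               # hypothetical fully opaque mapping

print(cast_ray([1.0, 0.0], grey_half_opaque))           # (0.5, 0.5, 0.5)
print(cast_ray([0.2, 0.9, 0.4], red_opaque))            # (1.0, 0.0, 0.0)
```

With the opaque mapping, the first sample saturates the opacity and the remaining samples are never evaluated, which is the early termination described above.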

Ray-casting methods tend to be slower than the object- 
order methods described in the next section. However, 
ray-casting offers great flexibility and accuracy since the 
sampling rate and transfer function are easily adjusted based 
on characteristics of the data. The method is easily paral- 
lelized by either assigning different processors to different 
regions of the output image, or by dividing the data into 
pieces, rendering each piece separately, and compositing to 
form the output image. One important issue is the way in 
which sample points are interpolated from the surrounding 
data. The best results occur using tri-linear interpolation 
from the eight voxel values surrounding any given sam-
ple. A faster approach uses nearest-neighbor interpolation,
which tends to produce heavily aliased results. 
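
The two interpolation choices can be compared directly for a single voxel. In this sketch the corner values are stored as `c[k][j][i]` and the local coordinates `r`, `s`, `t` lie in [0, 1]; these conventions and the sample data are assumptions of the example.

```python
def trilinear(c, r, s, t):
    """Interpolate inside a voxel from its eight corner values c[k][j][i]."""
    def lerp(a, b, w):
        return a + w * (b - a)
    # Interpolate along x on the four edges, then along y, then along z.
    x00 = lerp(c[0][0][0], c[0][0][1], r)
    x10 = lerp(c[0][1][0], c[0][1][1], r)
    x01 = lerp(c[1][0][0], c[1][0][1], r)
    x11 = lerp(c[1][1][0], c[1][1][1], r)
    return lerp(lerp(x00, x10, s), lerp(x01, x11, s), t)

def nearest(c, r, s, t):
    """Return the value of the closest corner (faster, but prone to aliasing)."""
    return c[int(t >= 0.5)][int(s >= 0.5)][int(r >= 0.5)]

corners = [[[0.0, 1.0], [2.0, 3.0]],
           [[4.0, 5.0], [6.0, 7.0]]]      # hypothetical corner values
print(trilinear(corners, 0.5, 0.5, 0.5))   # 3.5
print(nearest(corners, 0.9, 0.1, 0.1))     # 1.0
```

Trilinear interpolation varies continuously across the voxel, whereas the nearest-neighbor result jumps at the voxel midplanes.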


4.3 Object-order methods 


While image-order methods start from the pixels in the out- 
put image, object-order methods start with the data (the 
object) and project it onto the view plane. Typically, data 
samples are projected onto the image plane in the form 
of an extended splat. In parallel (orthographic) projection, 
the splat is taken as a Gaussian kernel. In perspective
projection, the splat is approximated with an ellipsoidal 
distribution. The size of the splat must be carefully con- 
trolled — too large a splat and the image is blurry, too small 
and gaps between adjacent splats appear. 

Object-order methods are becoming popular because of 
the speed advantage they offer. Advances in graphics hard- 
ware — especially those relating to 2-D and 3-D texture — 
provide inexpensive, fast volume rendering solutions. Two- 
dimensional texture-based methods sample the volume with 
a series of planes, each plane having a texture mapped 
onto it (the texture includes transparency). If the planes 
are orthogonal to the x-y-z axes, artifacts are apparent in 
off-axis view directions. A better approach is to generate a 
series of planes perpendicular to the view direction. To pro-
duce the final image, the planes are rendered in order (e.g.
back to front) and each plane is blended into the current
image. Emerging 3-D texture mapping provides a similar 
capability except that the interpolation of the texture onto 
the planes is performed in the graphics hardware. 


5 METHODS IN LARGE DATA 
VISUALIZATION 


A major goal of visualization is to assist the user in 
understanding large and complex data sets. However, as 
the size of analyses grows, it becomes more difficult to 
realize this goal, especially if what one desires is interactive 
data exploration. Several approaches are in use today as 
described in the following. However, this topic remains an 
area of active research and new methods are constantly 
being developed. 


5.1 Culling 


Probably the simplest approach to treating large data is 
to avoid processing/rendering data that is not visible. For 
example, a zoomed view of a scene may contain only a 
small fraction of the total data. Often, data is occluded 
by other surfaces. In these and other situations, culling 
algorithms can be used to render only those data that are 
visible. Such algorithms are typically based on bounding 
box, oriented bounding box, or bounding sphere culling. In 
some applications, the z-buffer may be used to eliminate 
data that is not visible. Visibility preprocessing can be 
used when the observer’s viewpoint is known a priori. For 
example, views inside of buildings or other structures where 
the traversal path is known, and where regions are naturally 
separated by rooms or similar features may lend themselves 
to culling. 
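
Bounding-sphere culling against the planes of a view volume can be sketched as follows. The plane representation (unit normal pointing into the visible half-space, plus offset) and all names and data are assumptions of this illustration.

```python
def sphere_outside_plane(center, radius, plane):
    """plane = (nx, ny, nz, d) with a unit normal pointing into the visible side."""
    nx, ny, nz, d = plane
    signed_distance = nx * center[0] + ny * center[1] + nz * center[2] + d
    return signed_distance < -radius          # entirely on the hidden side

def cull(objects, view_planes):
    """Return the names of objects whose bounding sphere may be visible."""
    return [name for name, center, radius in objects
            if not any(sphere_outside_plane(center, radius, p) for p in view_planes)]

# Hypothetical view volume: the slab 0 <= z <= 10 between a near and a far plane.
planes = [(0.0, 0.0, 1.0, 0.0), (0.0, 0.0, -1.0, 10.0)]
objects = [("a", (0.0, 0.0, 5.0), 1.0),
           ("b", (0.0, 0.0, -3.0), 1.0),
           ("c", (0.0, 0.0, 10.5), 1.0)]
print(cull(objects, planes))   # ['a', 'c']
```

The test is conservative: object "c" straddles the far plane and is kept, while object "b" lies wholly behind the near plane and is never sent to the renderer.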


5.2 LOD 


Level-of-detail methods are common in computer graphics. 
The idea is to replace a detailed representation with one 
or more coarser representations. Depending on the distance 
from the viewpoint, the coarsening scheme used, and the 
desired interaction rate, one of these levels is selected 
and used in place of the original data. In ideal cases, 
the resulting image fidelity remains essentially unchanged 
with the benefit of greatly improved interactivity. Even in 
situations where the coarsening is obvious, the ability to
navigate (or position a camera) in an interactive setting is 
greatly enhanced. 
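
Level selection by distance alone can be sketched in a few lines. The switch distances and the function name are hypothetical; a practical system would also take the coarsening scheme and the desired interaction rate into account.

```python
import bisect

def select_lod(distance, switch_distances):
    """Return the index of the level to draw; 0 is the full-resolution model.

    switch_distances is a sorted list of hypothetical viewer distances at
    which the next coarser representation takes over.
    """
    return bisect.bisect_right(switch_distances, distance)

switches = [10.0, 50.0, 200.0]
print([select_lod(d, switches) for d in (5.0, 10.0, 75.0, 1000.0)])   # [0, 1, 2, 3]
```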


5.3 Multiresolution 


A natural extension of discrete level-of-detail methods is to 
adjust the resolution of the representation in a continuous, 
or nearly continuous fashion. Examples include octree- 
based methods that adjust resolution based on dynamic 
error measures (i.e. typically based on screen error or global 
geometric error). Several isocontouring methods based on 
octree decompositions have been described in the literature 
(Wilhelms and Van Gelder, 1992). Progressive meshes — 
based on a series of ordered edge-collapses — are used to 
simplify triangular (Hoppe, 1996) and tetrahedral meshes. 
Such meshes can be reconstituted from coarser to finer 
levels by a sequence of edge-based geomorph operations. 
This is one of a large number of algorithms used to reduce
large meshes or terrain height fields (Luebke et al., 2003). 
Structured data has the advantage that techniques in signal 
processing can be readily adapted to compress and transmit 
data including wavelet decompositions and feature detec- 
tion (Machiraju et al., 2000). Visualization systems may 
also take advantage of inherent data representations found 
in adaptive multiresolution grids (Berger and Oliger, 1984). 


5.4 Out-of-core methods 


A simple but highly effective approach to large data visu- 
alization is to process data out of main memory in a 
preprocessing step to produce output data that can be 
readily visualized using standard techniques. Thus, a three-
dimensional data set of memory size $O(n^3)$ can be reduced
in size to $O(n^2)$ or $O(n)$ as the data is transformed
from its raw form to a visual representation. For exam-
ple, a 3-D dataset can be isosurfaced to produce a sur-
face, or streamlines may be computed to produce 1-D
polylines. Isocontouring (Chiang, Silva and Schroeder,
1998), streamlines (Ueng, Sikorski and Ma, 1997), vortex 
cores (Kenwright and Haimes, 1997), and flow separation/attachment
lines (Kenwright, 1998) are examples of such
methods. 
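
The reduction in size can be sketched as follows. The example is not one of the cited algorithms; it assumes only that the volume can be read one slice at a time, and it uses a synthetic field in place of data on disk. Only two adjacent slices are held in memory while the cells crossed by the isosurface are identified.

```python
def slices_of(n):
    """Yield n slices of an n x n x n scalar field, one at a time.

    The synthetic field f(i, j, k) = k stands in for data read from disk.
    """
    for k in range(n):
        yield [[float(k) for _ in range(n)] for _ in range(n)]


def active_cells(slices, iso):
    """Yield (i, j, k) for every cell whose corner values straddle iso.

    Only two adjacent slices are kept in memory at any time.
    """
    previous = None
    for k, current in enumerate(slices):
        if previous is not None:
            rows = len(current) - 1
            cols = len(current[0]) - 1
            for j in range(rows):
                for i in range(cols):
                    corners = [s[j + dj][i + di]
                               for s in (previous, current)
                               for dj in (0, 1) for di in (0, 1)]
                    if min(corners) <= iso <= max(corners):
                        yield (i, j, k - 1)
        previous = current


cells = list(active_cells(slices_of(4), iso=1.5))
```

For this planar field the number of active cells is $(n-1)^2$, while the input holds $n^3$ samples, which is the $O(n^3)$ to $O(n^2)$ reduction described above.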


5.5 Parallel methods and data streaming 


In conjunction with algorithmic approaches, system archi- 
tecture can be designed to accommodate the demands of 
large data visualization. Certainly, parallel methods are 
important, utilizing approaches based on shared memory 
or distributed processing. Similar to those used for system solution,
parallel methods for visualization must be carefully designed to
minimize communication between processes. Visualization systems
introduce additional constraints into the mix due to requirements on
rendering and interaction. For example, a large tiled display driven by
multiple processors must be synchronized, and data must be carefully
distributed to each processor. Depending on the parallel rendering
technique used, data may be processed on an image-order or object-order
basis (with meaning similar to the previous discussion of volume
rendering). In object-order techniques, the appropriate rendering
primitives are dispersed to the processor responsible for a given tile. In
image-order techniques, each processor produces an inter- 
mediate image, which is then composited to form the 
final image. 
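
The compositing of intermediate images can be sketched for opaque geometry as a per-pixel depth comparison. This is an assumption about the compositing rule (nearest fragment wins); the data layout and names are hypothetical, and translucent data would require ordered blending instead.

```python
def composite(images):
    """Combine per-processor images pixel by pixel, keeping the nearest fragment.

    Each image is a list of (depth, color) pixels; depth is None where the
    processor drew nothing.
    """
    final = []
    for fragments in zip(*images):
        drawn = [f for f in fragments if f[0] is not None]
        if drawn:
            final.append(min(drawn, key=lambda f: f[0])[1])
        else:
            final.append("background")
    return final


# Two hypothetical intermediate images of three pixels each.
image_a = [(1.0, "red"), (None, None), (5.0, "red")]
image_b = [(2.0, "blue"), (None, None), (3.0, "blue")]
final_image = composite([image_a, image_b])
```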

Another important technique is to stream data through 
the visualization pipeline in several pieces (Law et al.,
1999). Each piece is processed (with appropriate bound- 
ary conditions) and the data is assembled at the con- 
clusion of processing. The assembly may occur during 
rendering (e.g. in a tiled display device) or using a com- 
positing or append operation. Another advantage of this 
approach is that the user does not need to depend on 
system paging and may operate in pieces that fit in 
main memory for a significant improvement in elapsed 
time. 
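
The role of the boundary conditions in streaming can be illustrated with a one-dimensional sketch. It assumes a simple three-point smoothing filter as the pipeline stage; each piece is padded with one ghost value per side so that the appended result matches the result of processing the whole array at once. The function names are hypothetical.

```python
def smooth(values):
    """Three-point average; the two end points are copied unchanged."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3
    return out


def smooth_streamed(values, piece_size):
    """Apply smooth() piece by piece and append the results.

    Each piece carries one ghost value per side (where available) so that
    values at piece boundaries see the same neighbors as in the full array.
    """
    n = len(values)
    result = []
    for start in range(0, n, piece_size):
        stop = min(start + piece_size, n)
        lo = max(start - 1, 0)      # left ghost value, if any
        hi = min(stop + 1, n)       # right ghost value, if any
        piece = smooth(values[lo:hi])
        first = start - lo
        result.extend(piece[first:first + (stop - start)])  # drop ghosts
    return result


data = [float(x * x % 7) for x in range(10)]
streamed = smooth_streamed(data, piece_size=3)
```

Filters with a wider stencil need correspondingly more ghost layers.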


6 TAXONOMY FOR DATA 
VISUALIZATION SYSTEMS 


Visualization systems can be categorized into one of three 
types of software systems: toolkits, development envi- 
ronments, and applications. This section provides a brief 
overview of their capabilities. Note that some of these 
systems provide multiple levels of functionality and may 
provide toolkit capabilities along with the features of an 
application or development environment. Furthermore, soft- 
ware is evolving rapidly and the systems described here 
represent only a small number of potential systems. 

A distinguishing feature of many visualization systems is that they
are architected around the concept of data flow. Data flow can be
viewed as a series of operations on, or transformations of, data.
Ultimately, the results of analysis must be processed into graphics
primitives that are then displayed by the graphics system. Furthermore,
since visualization often involves interactive exploration,
visualization systems must be flexible enough to map data from one data
form to another. The data flow pipeline (or visualization pipeline)
provides a simple abstraction that supports plugging in new process
objects (algorithms), often in complex combination, to produce new
visualizations or transformations of existing data. This abstraction is
natural to most users, is flexible, and can be readily codified into
environments that support the construction of complex data processing
networks from interlocking components.
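
The pipeline abstraction can be sketched in a few lines. The classes below are hypothetical and do not reproduce the interface of any particular toolkit; they show only how process objects are chained and how a request at the end of the network pulls data through it.

```python
class Source:
    """Process object with no input: produces a data object on demand."""

    def __init__(self, data):
        self.data = data

    def output(self):
        return self.data


class Filter:
    """Process object that pulls from an upstream object and transforms it."""

    def __init__(self, upstream, function):
        self.upstream = upstream
        self.function = function

    def output(self):
        return self.function(self.upstream.output())


# Hypothetical three-stage network: samples -> threshold -> count.
samples = Source([0.2, 0.9, 0.5, 0.7])
threshold = Filter(samples, lambda values: [v for v in values if v >= 0.5])
count = Filter(threshold, len)
result = count.output()
```

New algorithms are added by wrapping them in another `Filter`; production systems typically add caching and update-on-demand so that unchanged stages are not re-executed.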


6.1 Toolkits


Toolkits are collections of software modules that are used 
by developers to build development environments or appli- 
cations. Software libraries — usually FORTRAN or C- 
based — provide procedural interfaces to visualization and 
graphics functionality. Software libraries have been gener- 
ally replaced with object-oriented toolkits supporting multi- 
ple data representations (i.e. data objects) that are operated 
on by algorithms (i.e. process objects). The libraries in gen- 
eral include callback mechanisms and interfaces to popular 
GUI-building packages such as X, Motif, Windows, Tk, Qt, 
FLTK, and wxWindows. Toolkits include a wide variety of 
functionality ranging from IO to multithreading and parallel 
computing to memory management. The strength of toolkits is that they
enable low-level control over algorithms and data representations and
are designed to work alongside other toolkits such as the GUI systems
mentioned previously. However, while toolkits are inherently flexible,
they require programming expertise to use. Two popular toolkits
are VTK (Schroeder, Martin and Lorensen, 2003) and Open 
Inventor (http://oss.sgi.com/projects/inventor/). In addition, 
OpenGL is a popular graphics standard upon which most 
current visualization systems are built. 


6.1.1 OpenGL 


OpenGL is a graphics library providing low-level access to 
graphics functionality. While it is feasible to use OpenGL 
to create visualization applications, it does not provide stan- 
dard visualization functionality such as isocontouring. Thus, 
significant effort is required to produce useful applications 
in practice. Instead, the systems mentioned in the following
subsections use OpenGL's rendering and interaction capabilities and
add higher-level functionality that greatly simplifies the task
of building useful software tools.


6.1.2 VTK 


VTK is an open-source, object-oriented toolkit support- 
ing 3-D graphics, visualization, volume rendering, and 
image processing (www.vtk.org) (Schroeder, Martin and 
Lorensen, 2003). VTK provides hundreds of algorithms 
in an integrated system framework from which advanced 
applications can be created. While VTK does not define a 
GUI, it provides an interface to OpenGL and other popular 
tools such as X11, Windows, Qt, wxWindows, and Tk. Furthermore, VTK
provides a set of powerful 3-D widgets that support operations on data
such as clipping, cutting, transformation, and geometry creation. VTK
is portable across all popular operating systems including Windows,
Unix, Linux, and Mac OS X. Implemented in C++, VTK supports language
bindings to several interpreted languages such as Python, Tcl, and Java.


6.1.3 Open Inventor


Open Inventor is an object-oriented 3-D toolkit offering 
a comprehensive solution to interactive graphics program- 
ming problems. It presents a programming model based on 
a 3-D scene database that simplifies graphics programming. 
It includes a rich set of objects such as cubes, polygons, 
text, materials, cameras, lights, trackballs, handle boxes, 
3-D viewers, and editors. It is built on top of OpenGL 
and defines a popular data exchange file format. However, 
unlike VTK, it does not include any visualization algo- 
rithms such as isocontouring or vector visualization. These 
must be added by the user. Another solution is to use VTK’s 
data processing pipeline in conjunction with the function- 
ality of Open Inventor. 


6.2 Development environments 


Development environments differ from toolkits in that they 
provide a GUI-based framework for creating visualization 
applications — typically by assembling data flow networks 
(so-called visual programming). Figure 16 is an example 
of such a network from the OpenDX software system. 
In addition, these systems provide tools for building GUIs and
packaging functionality into applications. These tools are superb for
crafting niche applications addressing a particular visualization
need. However, as the application becomes more complex, the
development environment tends to become more of a hindrance than a
help, since the complexity of the visual programming mechanisms often
interferes with the low-level control required in a professional
software tool. Development environments
require programming skills, even if the skills are as simple 
as visually connecting modules into data flow networks. 


6.2.1 AVS 


One of the first true visualization development environ- 
ments, AVS, brought significant attention to the emerging 
visualization field in the late 1980s and early 1990s. AVS provides a
dataflow editing environment and a GUI creation/packaging utility so
that applications can be created
and deployed rapidly. AVS is portable across a variety of 
platforms and maintains an open repository of contributed 
modules. See www.avs.com for more information. 


6.2.2 OpenDX 


OpenDX began as a commercial product offered by IBM and known as IBM
Visualization Data Explorer. OpenDX appeared in May 1999 when IBM
converted DX into open-source software. OpenDX is known for its
superlative visual programming environment and powerful data model.
Development on OpenDX continues via a vital open-source community.
Commercial support is also available for OpenDX from Visualization and
Imagery Solutions (http://www.vizsolutions.com/). See
http://www.opendx.org for more information about OpenDX.

Figure 16. Typical data flow network as implemented in the OpenDX
system. Courtesy of Kitware, Inc. Taken from the book The Visualization
Toolkit: An Object-Oriented Approach to 3-D Graphics, Third Edition,
ISBN 1-930934-07-6. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


6.2.3 TGS Amira 


TGS provides Amira, a commercial development environment similar to
OpenDX and AVS. Unlike these systems, Amira 
is available in application-specific bundles such as 
molecular visualization or virtual reality applications. See 
http://www.tgs.com for more information. 


6.2.4 SCIRun 


SCIRun is an extensive computational problem-solving environment that
includes powerful visualization capabilities. Unlike the previous
systems, SCIRun integrates preprocessing, analysis, and visualization
into an interactive environment for the solution of PDEs. Like the other
development environments, visual programming is used to 
create data flow networks that link input, analysis, and 
visualization. SCIRun is designed to integrate with exter- 
nal packages and is extensible. SCIRun is available free 
for noncommercial use. Learn more about SCIRun from 
http://software.sci.utah.edu/scirun.html. 


6.3 Applications 


Turnkey applications require little or no programming. They 
are ideal for the engineer or scientist using visualization as 
a tool to interact, manage, understand, and communicate
about the computational process. Applications are generally 
the easiest to use if they support the capabilities required 
by the analyst. However, most applications are difficult 
to extend and become difficult to use if complex data 
exploration is required. 


6.3.1 Intelligent Light FieldView


FieldView is a commercial CFD postprocessor application 
supporting a variety of file formats, streamline genera- 
tion, geometry/surface viewing, CFD calculator, probing, 
and presentation tools. See http://www.ilight.com for more
information. 


6.3.2 CEI EnSight Gold 


CEI EnSight Gold is a general-purpose visualization appli- 
cation for analyzing, visualizing, and communicating high- 
end scientific and engineering datasets. EnSight Gold 
takes full advantage of parallel processing and render- 
ing, provides support for an array of VR devices, and 
enables real-time collaboration. EnSight is used in automo- 
tive, biomedical, chemical processing, defense, and many 
other applications. See http://www.ceintl.com/ for more 
information. 


6.3.3 Kitware ParaView 


ParaView is an open-source, turnkey application designed 
specifically for large data visualization employing scalable, 
distributed parallel processing (although it functions well 
on single processor systems). ParaView is built on top 
of VTK and is capable of exporting and importing VTK 
scripts. Designed as a general-purpose visualization tool, 
ParaView can be customized at run-time via Tcl scripts, 
or via plug-in XML modules defining a GUI associated 
with a data processing filter. ParaView supports several 
types of rendering, including hardware acceleration on 
a single processor when the data is small enough, or 
sort-last compositing on tiled displays if requested. See 
http://www.paraview.org for more information. 


7 INTERFACING THE 
COMPUTATIONAL SYSTEM WITH 
THE VISUALIZATION SYSTEM 


Often, the greatest obstacle facing the computational scientist
wishing to use visualization is interfacing analysis solution data
with the visualization system. Regular data is generally easy to work
with because the forms are simple: an image is an array of values in
row- or column-major order. However, more complex analyses often use
unstructured forms such as those found in finite element meshes. In
terms of the data forms discussed in Section 1.2, the finite 
element mesh is irregular data that is attributed with various 
geometric, scalar, vector, and tensor information, and which 
can be visualized using the techniques discussed in the pre- 
vious sections. The goal of this section is to address the 
technical issues when integrating irregular data in the form 
of finite element data with the visualization system. The 
specific topics covered include the mesh topological rep- 
resentation, determining the appropriate representation of 
information to be visualized on the mesh, and the transfer of 
mesh-based information to independent graphics structures. 


7.1 Mesh topological representation 


The classic finite element mesh representation defines an 
element in terms of an ordered list of nodes — with node 
point coordinates — that implicitly define both the topology 
and geometry of an element (e.g. a three-noded 2-D element 
defines a triangle whose nodes are at the vertices and the 
edges are straight lines; an eight-noded 2-D element defines 
a quadrilateral where four of the nodes define the ver- 
tices of the element and four are ‘midside-nodes’ for each 
of the four curved, quadratic edges). Such structures can 
be used to construct the information needed by visualiza- 
tion procedures by the proper interpretation of the ordered 
nodes, and when needed, construction of other topological 
information for the mesh. The need to support mesh representations
that are adapted during simulation and the use of a broader set of
finite element topology and shape combinations are leading to the use
of richer and more formal representations. These representations of
the mesh topology are in terms of the adjacency relationships between
mesh regions, faces, edges, and vertices (see Beall and Shephard (1997)
and Remacle et al. (2002) for more information on such mesh data
structures). Under the assumption that each topological mesh entity of
dimension $d$, $M_i^d$, is bounded by a set of topological mesh
entities of dimension $d-1$, $M_i^d\{M^{d-1}\}$, the full set of mesh
topological entities is


$$T_M = \{ M\{M^0\},\ M\{M^1\},\ M\{M^2\},\ M\{M^3\} \} \qquad (12)$$


where $M\{M^d\}$, $d = 0, 1, 2, 3$, are respectively the sets of
vertices, edges, faces, and regions that define the primary topological
elements of the mesh domain. It is possible to limit the mesh
representation to just these entities under the following set of
restrictions (Beall and Shephard, 1997):


1. Regions and faces have no interior holes.

2. Each entity of order $d_i$ in a mesh, $M^{d_i}$, may use a
particular entity of lower order, $M^{d_j}$, $d_j < d_i$, at most
once.

3. For any entity $M_i^{d_i}$ there is a unique set of entities of
order $d_i - 1$, $M_i^{d_i}\{M^{d_i - 1}\}$, that are on the boundary
of $M_i^{d_i}$.

The first restriction means that regions may be directly represented
by the faces that bound them, and faces may be represented by the
edges that bound them. The second restriction allows the orientation
of an entity to be defined in terms of its boundary entities (without
the introduction of entity uses). For example, the orientation of an
edge $M_i^1$ bounded by vertices $M_j^0$ and $M_k^0$ is uniquely
defined as going from $M_j^0$ to $M_k^0$ only if $j \neq k$. The third
restriction means that a mesh entity is uniquely specified by its
bounding entities.

As discussed in Beall and Shephard (1997), the use of 
more complete mesh topologies effectively supports general 
mesh modification operations and the ability to indepen- 
dently associate shape and other attribute information with 
the individual mesh entities. It also provides a convenient 
modeling abstraction when converting data from irregular 
form to the visualization system (see Chapter 17, this Volume).
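
A minimal sketch of such a representation for a 2-D triangular mesh follows. It starts from the classic element-node lists and derives edges and faces, identifying each entity by the set of entities that bound it, in the spirit of the third restriction. The function names and the dictionary-based storage are assumptions made for illustration and are not the data structure of the cited references.

```python
from itertools import combinations


def build_topology(triangles):
    """Derive edges and faces from triangles given as vertex-id triples.

    Each edge is keyed by the set of its two bounding vertices and each
    face by the set of its three bounding edges, so that an entity is
    identified uniquely by its boundary.
    """
    edges = {}   # frozenset of two vertex ids -> edge id
    faces = {}   # frozenset of three edge ids -> face id
    for tri in triangles:
        edge_ids = []
        for a, b in combinations(tri, 2):
            key = frozenset((a, b))
            edge_ids.append(edges.setdefault(key, len(edges)))
        faces.setdefault(frozenset(edge_ids), len(faces))
    return edges, faces


def faces_using_edge(edges, faces, a, b):
    """Upward adjacency: ids of the faces bounded by the edge between a and b."""
    edge_id = edges[frozenset((a, b))]
    return sorted(fid for key, fid in faces.items() if edge_id in key)


# Two triangles sharing the edge between vertices 1 and 2.
edges, faces = build_topology([(0, 1, 2), (1, 3, 2)])
shared = faces_using_edge(edges, faces, 1, 2)
```

A production structure would store the upward adjacencies explicitly rather than searching for them, and would add regions for 3-D meshes.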


7.2 Appropriate representation of information 
on the mesh 


Often, the mesh-based geometric, scalar, vector, and tensor 
attribute information is not in a form suitable for processing by the
visualization system. This mismatch is due to differences in the
relative position at which information is stored (integration points
versus element centers versus nodal points: positional mismatch).
Furthermore, most visualization systems are limited in the element
shape functions that they support (interpolation mismatch). As
we saw previously, visualization systems typically support 
piecewise linear distributions defined in terms of the vertex 
values on simple 2-D or 3-D polyhedra. Interfacing with 
the visualization system means addressing these two forms 
of mismatch. 

When the mismatch is due to interpolation, the simplest 
approach is to tessellate the element and produce a common 
form, typically linear simplices. Note that this mismatch 
can occur due to differences in either geometric interpola- 
tion (e.g. curvature of the mesh) or in solution interpolation 
(e.g. variation in solution unknowns). In principle, using the 
topological mesh hierarchy described previously, the mesh 
is tessellated first along edges, then faces, then regions, con- 
trolled by error measures due to shape and/or interpolation 
variation. For example, an edge would be tessellated into
piecewise linear line segments to minimize variation from 
the curved edge and to reduce solution error assuming lin- 
ear interpolation along each line segment. A face using that 
edge as a boundary would begin with the edge tessellation 
as a boundary condition, and then tessellate the interior 
to satisfy the appropriate face error metrics. This process 
would continue through the element regions. The output is 
then a linear mesh. 
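
The edge step of this process can be sketched for a quadratic edge with one midside node. The sketch controls geometric error only, measured at the parametric midpoint of each chord, and uses recursive bisection; both choices are assumptions made for brevity, and a solution-error measure could be tested in the same place.

```python
import math


def edge_point(p0, pm, p1, t):
    """Point on a quadratic edge at parameter t in [-1, 1]; pm is the midside node."""
    n0, nm, n1 = t * (t - 1) / 2, 1 - t * t, t * (t + 1) / 2
    return tuple(n0 * a + nm * b + n1 * c for a, b, c in zip(p0, pm, p1))


def tessellate_edge(p0, pm, p1, tol, ta=-1.0, tb=1.0):
    """Return parameter values whose connecting chords deviate from the
    curved edge by at most tol at each chord's parametric midpoint."""
    a = edge_point(p0, pm, p1, ta)
    b = edge_point(p0, pm, p1, tb)
    tm = (ta + tb) / 2
    on_curve = edge_point(p0, pm, p1, tm)
    on_chord = tuple((u + v) / 2 for u, v in zip(a, b))
    if math.dist(on_curve, on_chord) <= tol:
        return [ta, tb]
    left = tessellate_edge(p0, pm, p1, tol, ta, tm)
    right = tessellate_edge(p0, pm, p1, tol, tm, tb)
    return left[:-1] + right   # the shared midpoint appears once


curved = tessellate_edge((0, 0), (1, 1), (2, 0), tol=0.1)
straight = tessellate_edge((0, 0), (1, 0), (2, 0), tol=1e-9)
```

The resulting parameter values would then serve as the boundary condition for tessellating any face that uses the edge.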

When the mismatch is due to variation in the position of stored data,
interpolation methods must be used. Direct and least squares projection
procedures (Hinton and Campbell, 1974; Oden and Brauchli, 1971) are
commonly applied for this purpose. Another simple approach is based on
averaging. For example, data stored at element centers may be
represented at vertices by averaging the data contained in all elements
connected to that vertex. Note, however, that when using piecewise
linear graphics primitives the quantities being displayed are $C^0$,
and interpolation methods typically also assume that the data is $C^0$
on the mesh. There are situations in which the finite element
information to be displayed is in fact $C^{-1}$ (e.g. all variables when
discontinuous Galerkin methods are used, or derivative quantities in
the case of $C^0$ Galerkin finite elements). Projecting such data onto a
continuous field smooths it; however, there is no assurance that the
smoothed fields are superior to the discontinuous ones. One way to address the
basic accuracy of the field is to consider projection-based 
error estimation procedures (Blacker and Belytschko, 1994; 
Zienkiewicz and Zhu, 1992). As pointed out by Babuska 
et al. (1994), specific care must be taken with these pro- 
cedures to ensure that they provide superior results at the 
evaluation points used. Another alternative is to apply var- 
ious weak form (Remacle et al., 2002) or variational (de 
Miranda and Ubertini, 2002) constructs for this process. 
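
The averaging approach mentioned above can be sketched as follows, assuming unweighted averaging over the elements connected to each vertex. The function name and the data layout are hypothetical; weighting by element size is a common variation.

```python
from collections import defaultdict


def cell_to_point(elements, cell_values):
    """Average element-center values onto vertices.

    elements: list of vertex-id tuples; cell_values: one value per element.
    Returns {vertex_id: mean value over all elements using that vertex}.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for vertices, value in zip(elements, cell_values):
        for v in vertices:
            total[v] += value
            count[v] += 1
    return {v: total[v] / count[v] for v in total}


# Two triangles sharing vertices 1 and 2, with one value per element.
point_values = cell_to_point([(0, 1, 2), (1, 3, 2)], [10.0, 20.0])
```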


7.3 Transfer of solution data between 
computational grids 


Several of the graphics techniques discussed previously 
operate most effectively on specific spatial structures such 
as regular samplings of data (e.g. volume rendering). Data 
may be sampled into another form to take advantage of 
these capabilities. In previous sections, this was called 
probing. 

Any process that transforms information from one grid 
to another must determine the relationship of the cells in 
the sending grid to those in the receiving grid. Owing to the 
discrete nature of procedures employed, this means deter- 
mining what cell in one grid contains a given point in the 
other grid and the parametric location relative to the inter- 
polation of the solution in that element. Once the element is 
known, determination of the parametric location requires a 
parametric inversion. Parametric inversion is trivial for linear
and bilinear quadrilateral elements. However, for other
element types whose shape functions are of higher order, 
it is not easily done in closed form. Methods for the itera- 
tive solution of nonlinear parametric inversions are given in 
Cheng (1988) and Crawford et al. (1989). Crawford et al. 
(1989) used elimination theory to reduce the system of 
equations to a single polynomial equation of higher order. 
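
An iterative parametric inversion can be sketched with Newton's method. The example below is not the algorithm of the cited references; for brevity it uses the four-node bilinear quadrilateral, and the same iteration applies to higher-order elements once `shape` is replaced by the corresponding shape functions. The function names are hypothetical.

```python
def shape(xi, eta):
    """Bilinear shape functions and their derivatives at (xi, eta)."""
    signs = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
    n = [(1 + sx * xi) * (1 + sy * eta) / 4 for sx, sy in signs]
    dn_dxi = [sx * (1 + sy * eta) / 4 for sx, sy in signs]
    dn_deta = [sy * (1 + sx * xi) / 4 for sx, sy in signs]
    return n, dn_dxi, dn_deta


def invert(nodes, target, tol=1e-12, max_iter=20):
    """Newton iteration for the (xi, eta) that maps to target in a 4-node quad."""
    xi = eta = 0.0
    for _ in range(max_iter):
        n, dxi, deta = shape(xi, eta)
        # Residual between the mapped point and the target.
        rx = sum(w * p[0] for w, p in zip(n, nodes)) - target[0]
        ry = sum(w * p[1] for w, p in zip(n, nodes)) - target[1]
        if abs(rx) + abs(ry) < tol:
            break
        # Jacobian of the mapping from (xi, eta) to (x, y).
        j11 = sum(d * p[0] for d, p in zip(dxi, nodes))
        j12 = sum(d * p[0] for d, p in zip(deta, nodes))
        j21 = sum(d * p[1] for d, p in zip(dxi, nodes))
        j22 = sum(d * p[1] for d, p in zip(deta, nodes))
        det = j11 * j22 - j12 * j21
        # Newton update: subtract the inverse Jacobian times the residual.
        xi -= (j22 * rx - j12 * ry) / det
        eta -= (j11 * ry - j21 * rx) / det
    return xi, eta


# Map a known parametric point forward, then recover it by inversion.
nodes = [(0.0, 0.0), (2.0, 0.0), (3.0, 2.0), (0.0, 1.0)]
weights, _, _ = shape(0.3, -0.4)
target = (sum(w * p[0] for w, p in zip(weights, nodes)),
          sum(w * p[1] for w, p in zip(weights, nodes)))
xi, eta = invert(nodes, target)
```

A point lies inside the element when the recovered coordinates both fall in $[-1, 1]$, so the inversion doubles as the containment test.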

Naive methods that determine the target element by performing the
parametric inversion on every element give a computational growth rate
of $O(n^2)$, which is clearly unacceptable. This can be addressed by the
use of searching structures, typically based on various types of data
trees, which yield a total searching time of $O(n \log n)$. These
procedures provide a set of candidate elements to search that
are in the local neighborhood of the point of interest. If the 
evaluation of these candidates determines that the point is 
not within any of those elements, procedures can be used 
to determine which element is the closest one to the point 
to use in the evaluation process (Niu and Shephard, 1990). 

Data structures that can be used (Dannelongue and Tanguy, 1990;
Zienkiewicz and Zhu, 1992) to determine a list of elements close to a
given location include quadtrees (Samet, 1990), k-d trees (Overmars
and van Leeuwen, 1982), range trees (Samet, 1990), and alternating
digital trees (Bonet and Peraire, 1991). Two interesting alternatives
for general mesh-to-mesh transfers are range trees and alternating
digital trees (ADT). The problem with these methods
is the construction time (for balanced trees) and the inabil- 
ity to perform simple modifications to account for adaptive 
mesh modifications. In those cases in which the informa- 
tion is being evaluated onto a more uniform structure (e.g. 
uniform set of voxels or an octree), search methods that 
already employ such structures will be most advantageous. 
In fact, once a first point of interest is determined, the effec- 
tive traversal of the mesh based on topological adjacency 
combined with traversal of the regular data structure can be 
made very efficient. 
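
The use of a searching structure to obtain candidate elements can be sketched with a uniform grid of bins over a 2-D triangular mesh. A uniform grid is used here in place of the tree structures cited above because it is the shortest to write down; the function names and the bin size are assumptions.

```python
from collections import defaultdict


def build_bins(points, triangles, cell):
    """Map each grid cell (ix, iy) to the triangles whose bounding box overlaps it."""
    bins = defaultdict(list)
    for t, tri in enumerate(triangles):
        xs = [points[v][0] for v in tri]
        ys = [points[v][1] for v in tri]
        for ix in range(int(min(xs) // cell), int(max(xs) // cell) + 1):
            for iy in range(int(min(ys) // cell), int(max(ys) // cell) + 1):
                bins[(ix, iy)].append(t)
    return bins


def barycentric(p, a, b, c):
    """Parametric (area) coordinates of p in triangle abc."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    l1 = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    l2 = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return 1 - l1 - l2, l1, l2


def locate(p, points, triangles, bins, cell, eps=1e-12):
    """Return (triangle index, parametric coordinates), or None if p is outside."""
    key = (int(p[0] // cell), int(p[1] // cell))
    for t in bins.get(key, []):
        coords = barycentric(p, *(points[v] for v in triangles[t]))
        if min(coords) >= -eps:
            return t, coords
    return None


# Unit square split into two triangles, searched with bins of size 0.5.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
bins = build_bins(points, triangles, cell=0.5)
found = locate((0.75, 0.25), points, triangles, bins, cell=0.5)
```

Only the triangles registered in the bin containing the point are tested, so the cost per query depends on the local element density rather than on the total number of elements.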


REFERENCES 


Babuska I, Strouboulis T, Upadhyay CS and Gangaraj SK. Superconvergence
in the finite element method by computer proof. USACM Bull. 1994;
7(3):10-25.


Beall MW and Shephard MS. A general topology-based mesh data 
structure. Int. J. Numer. Methods Eng. 1997; 40(9):1573-1596.


Berger M and Oliger J. Adaptive mesh refinement for hyper- 
bolic partial differential equations. J. Comput. Phys. 1984; 
53:484-512.


Blacker TD and Belytschko T. Superconvergent patch recovery 
with equilibrium and conjoint interpolant enhancements. Int. J. 
Numer. Methods Eng. 1994; 37(3):517-536. 


Bonet J and Peraire J. An alternating digital tree (ADT) algorithm 
for 3d geometric searching and intersection problems. Int. J.
Numer. Methods Eng. 1991; 31:1-17. 


Cheng JH. Automatic adaptive remeshing for finite element simu- 
lation of metal forming processes. Int. J. Numer. Methods Eng. 
1988; 26:1-18. 


Chernoff H. Using faces to represent points in k-dimensional space
graphically. J. Am. Stat. Assoc. 1973; 68:361-368.


Chiang Y-J, Silva CT and Schroeder WJ. Interactive out-of-core 
isosurface extraction. Proceedings of IEEE Visualization. IEEE 
Computer Society Press: Los Alamitos, 1998. 


Conte SD and de Boor C. Elementary Numerical Analysis.
McGraw-Hill, 1972.


Crawford RH, Anderson DC and Waggenspack WN. Mesh rezon- 
ing of 2-d isoparametric elements by inversion. Int. J. Numer. 
Methods Eng. 1989; 28:523-531.


Dannelongue HH and Tanguy PA. Efficient data structures 
for adaptive remeshing with fem. J. Comput. Phys. 1990; 
91:94-109.


Delmarcelle T and Hesselink L. Visualizing second-order tensor
fields with hyperstreamlines. IEEE Comput. Graph. Appl. 1993;
13(4):25-33.


de Miranda S and Ubertini F. Recovery of consistent stresses for
compatible finite elements. Comput. Methods Appl. Mech. Eng.
2002; 191:1595-1609.


The First Information Visualization Symposium. IEEE Computer 
Society Press: Los Alamitos, 1995. 


Foley JD, van Dam A, Feiner SK and Hughes JF. Computer 
Graphics Principles and Practice (2nd edn). Addison-Wesley: 
Reading, 1990. 


Globus A, Levit C and Lasinski T. A tool for visualizing the
topology of three-dimensional vector fields. Proceedings of
Visualization ’91. IEEE Computer Society Press: Los Alamitos,
1991; 33-40.


Helman JL and Hesselink L. Visualization of vector field topology 
in fluid flows. IEEE Comput. Graph. Appl. 1991; 11(3):36-46.


Hinton E and Campbell JS. Local and global smoothing of discontinuous
finite element functions using a least squares method.
Int. J. Numer. Methods Eng. 1974; 8:461-480.


http://oss.sgi.com/projects/inventor/. 


Hoppe H. Progressive meshes. Comput. Graph. (Proc. SIGGRAPH ’96).

Kenwright DN. Automatic detection of open and closed sepa- 
ration and attachment lines. Proceedings of Visualization ’98. 
IEEE Computer Society Press: Los Alamitos, 1998; 151-158. 


Kenwright DN and Haimes R. Vortex identification - applications 
in aerodynamics: a case study. Proceedings of Visualization '97. 
IEEE Computer Society Press: Los Alamitos, 1997; 413-416.


Law C, Martin KM, Schroeder WJ and Temkin JE. A multi-
threaded streaming pipeline architecture for large structured 
data sets. In Proceedings of IEEE Visualization ’99, October 
1999; 225-232. 


Livnat Y, Shen Han-Wei and Johnson CR. A near optimal iso- 
surface extraction algorithm using the span space. IEEE Trans. 
Visualizat. Comput. Graph. 1996; 2(1):73-84. 


Lorensen WE and Cline HE. Marching cubes: A high resolution 
3D surface construction algorithm. Comput. Graph. (Proc. SIG- 
GRAPH) 1987; 21(4):163-169. 


Luebke D, Reddy M, Cohen JD, Varshney A, Watson B and Huebner R.
Level of Detail for 3D Graphics. Morgan Kaufmann,
ISBN 1-55860-838-9, 2003.


Machiraju R, Fowler JE, Thompson D, Schroeder WJ and Soni B. 
EVITA: A Prototype System for Efficient Visualization and Inter- 
rogation of Terascale Datasets. Technical Report MSSU-COE- 
ERC-01-02, Engineering Research Center, Mississippi State 
University, 2000. 


McCormick BH, DeFanti TA and Brown MD. Visualization in 
Scientific Computing. Report of the NSF Advisory Panel on 
Graphics, Image Processing and Workstations, 1987. 


Nielson GM and Hamann B. The asymptotic decider: resolving
the ambiguity in marching cubes. In Proceedings of IEEE 
Visualization ’91, San Diego, 1991. 


Niu Q and Shephard MS. Transfer of Solution Variables Between 
Finite Element Meshes. SCOREC Report 4-1990, Scientific 
Computation Research Center, RPI: Troy, New York, 1990. 


Oden JT and Brauchli HJ. On calculation of consistent stress
distributions in finite element approximations. Int. J. Numer.
Methods Eng. 1971; 4:337-357.


Overmars M and van Leeuwen J. Dynamic multi-dimensional
data structures based on quad and k-d trees. Acta Inf. 1982;
17(3):267-285.

Remacle J-F, Klaas O, Flaherty JE and Shephard MS. A parallel
algorithm oriented mesh database. Eng. Comput. 2002;
18(3):274-284.


Remacle J-F, Li X, Shephard MS and Chevaugeon N. Transient
mesh adaptation using conforming and non-conforming mesh
modifications. 11th International Meshing Roundtable. Sandia
National Laboratories, 2002; 261-272.


Rosenblum L, Earnshaw RA, Encarnacao J and Hagen H. Scien- 
tific Visualization Advances and Challenges. Harcourt Brace & 
Company: London, 1994.


Samet H. The Design and Analysis of Spatial Data Structures.
Addison-Wesley, 1990.


Schroeder WJ, Martin KM and Lorensen WE. The Visualization 
Toolkit: An Object-Oriented Approach to 3D Graphics (3rd edn).
Kitware, Inc., ISBN-1-930934-07-6, 2003. 


Schroeder WJ, Volpe C and Lorensen WE. The stream polygon: 
a technique for 3D vector field visualization. Proceedings of 
Visualization ’91. IEEE Computer Society Press: Los Alamitos,
1991; 126-132. 


Schroeder WJ, Zarge J and Lorensen WE. Decimation of triangle 
meshes. Comput. Graph. (SIGGRAPH ’92) 1992; 26(2):65-70.


Taubin G. A signal processing approach to fair surface design. 
Comput. Graph. (Proc. SIGGRAPH) 1995. 


Taubin G, Zhang T and Golub G. Optimal surface smoothing as 
filter design. Fourth European Conference on Computer Vision 
(ECCV ’96), Cambridge, UK, April 14-18, 1996, Proceedings, 
Volume I. Springer Verlag, 1996.

Ueng SK, Sikorski K and Ma KL. Out-of-core streamline visualization
on large unstructured meshes. IEEE Trans. Visualizat.
Comput. Graph. 1997; 3(4):370-380.


Watt A. 3D Computer Graphics (2nd edn). Addison-Wesley: 
Reading, 1993. 


Whitted T. An improved illumination model for shaded display. 
CACM 1980; 23(6):343-349.


Wilhelms J and Van Gelder A. Octrees for faster isosurface gen- 
eration. ACM Trans. Graph. 1992; 11(3):201-227. 


Zienkiewicz OC and Zhu JZ. Superconvergent patch recovery and 
a posteriori error estimates, part 1: The recovery technique. Int. 
J. Numer. Methods Eng. 1992; 33(7):1331-1364. 


Chapter 19 


Linear Algebraic Solvers and Eigenvalue Analysis 


Henk A. van der Vorst 


Utrecht University, Utrecht, The Netherlands 


1 Introduction
2 Mathematical Preliminaries
3 Direct Methods for Linear Systems
4 Preconditioning
5 Incomplete LU Factorizations
6 Methods for the Complete Eigenproblem
7 Iterative Methods for the Eigenproblem
Notes
References


1 INTRODUCTION 


In this chapter, an overview of the most widely used 
numerical methods for the solution of linear systems of 
equations and for eigenproblems is presented. 

For linear systems $Ax = b$, with $A$ a real square nonsingular
$n \times n$ matrix, direct solution methods and iterative methods are
discussed. The direct methods are variations on Gaussian elimination.
The iterative methods are the so-called Krylov projection-type methods,
and they include popular methods such as Conjugate Gradients, MINRES,
Bi-Conjugate Gradients, QMR, Bi-CGSTAB, and GMRES.

Iterative methods are often used in combination with 
the so-called preconditioning operators (easily invertible 
approximations for the operator of the system to be solved). 
We will give a brief overview of the various preconditioners 
that exist. 

For eigenproblems of the type $Ax = \lambda x$, the QR method, which is
often considered to be a direct method because of its very fast
convergence, is discussed. Strictly speaking, there are no direct
methods for the eigenproblem; all methods are necessarily iterative.
The QR method is expensive for larger values of $n$, and for these
larger values, a number of iterative methods, including the Lanczos
method, Arnoldi's method, and the Jacobi-Davidson method, are presented.

For a general background on linear algebra for numerical 
applications, see Golub and Van Loan (1996) and Stewart 
(1998). Modern iterative methods for linear systems are 
discussed in van der Vorst (2003). A basic introduction 
with simple software is presented in Barrett et al. (1994). 
A complete overview of algorithms for eigenproblems, 
including pointers to software, is given in Bai et al. (2000). 
Implementation aspects for high-performance computers 
are discussed in detail in Dongarra et al. (1998). 

Some useful state-of-the-art papers have appeared; we 
mention papers on the history of iterative methods by Golub 
and van der Vorst (2000) and Saad and van der Vorst 
(2000). An overview on parallelizable aspects of sparse 
matrix techniques is presented in Duff and van der Vorst 
(1999). A state-of-the-art overview for preconditioners is 
presented in Benzi (2002). 

The purpose of this chapter is to make the reader familiar
with the ideas and the usage of iterative methods. We expect
that, equipped with sufficient background knowledge of
iterative methods, the reader will be able to make a proper
choice for a particular class of problems. The chapter also
provides guidance on how to tune these methods, in particular
for the selection or construction of effective preconditioners.


2 MATHEMATICAL PRELIMINARIES 


In this section, some basic notions and notations on linear
systems and eigenproblems are collected.


2.1 Matrices and vectors


We will be concerned with linear systems $Ax = b$, where
$A$ is usually an $n \times n$ matrix:


$$A \in \mathbb{R}^{n \times n}$$


The elements of $A$ will be denoted as $a_{i,j}$. The vectors
$x = (x_1, x_2, \ldots, x_n)^T$ and $b$ belong to the linear
space $\mathbb{R}^n$. Sometimes we will admit complex matrices
$A \in \mathbb{C}^{n \times n}$ and vectors $x, b \in \mathbb{C}^n$,
but that will be explicitly mentioned.

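Over the space $\mathbb{R}^n$, we will use the Euclidean inner
product between two vectors $x$ and $y$:


$$x^T y = \sum_{i=1}^{n} x_i y_i$$


and for $v, w \in \mathbb{C}^n$, we use the standard complex inner
product


$$v^H w = \sum_{i=1}^{n} \bar{v}_i w_i$$


These inner products lead to the 2-norm or Euclidean length
of a vector


$$\|x\|_2 = \sqrt{x^T x} \quad \text{for } x \in \mathbb{R}^n$$
$$\|v\|_2 = \sqrt{v^H v} \quad \text{for } v \in \mathbb{C}^n$$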


With these norms, we can associate a 2-norm for matrices:
for $A \in \mathbb{R}^{n \times n}$, its associated 2-norm $\|A\|_2$
is defined as


$$\|A\|_2 = \sup_{y \in \mathbb{R}^n,\ y \neq 0} \frac{\|Ay\|_2}{\|y\|_2}$$


and in the complex case, similarly, using the complex 
inner product. This matrix norm gives the maximal length 
multiplication effect of A on a vector (where the length is 
defined by the given norm). 

The associated matrix norms are convenient because they
can be used to bound products. For $A \in \mathbb{R}^{n \times k}$,
$B \in \mathbb{R}^{k \times m}$, we have that


$$\|AB\|_2 \le \|A\|_2 \|B\|_2$$


in particular,


$$\|Ax\|_2 \le \|A\|_2 \|x\|_2$$


The inverse of a nonsingular matrix $A$ is denoted as $A^{-1}$.
Particularly useful is the condition number of a square
nonsingular matrix $A$ defined as


$$\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2$$


The condition number is used to characterize the sensitivity 
of the solution x of Ax = b with respect to perturbations 
in b and A. For perturbed systems, we have the following 
theorem. 


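Theorem 1 (Golub and Van Loan, 1996; Th. 2.7.2)
Suppose


$$Ax = b, \quad A \in \mathbb{R}^{n \times n}, \quad 0 \neq b \in \mathbb{R}^n$$
$$(A + \Delta A)y = b + \Delta b, \quad \Delta A \in \mathbb{R}^{n \times n}, \quad \Delta b \in \mathbb{R}^n$$


with $\|\Delta A\|_2 \le \epsilon \|A\|_2$ and $\|\Delta b\|_2 \le \epsilon \|b\|_2$.
If $\epsilon \kappa_2(A) = r < 1$, then $A + \Delta A$ is nonsingular and


$$\frac{\|y - x\|_2}{\|x\|_2} \le \frac{2\epsilon}{1 - r}\, \kappa_2(A)$$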
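
The bound can be checked numerically. The following is a minimal sketch, assuming the bound in the form $2\epsilon\kappa_2(A)/(1-r)$ of Golub and Van Loan; the function name `perturbation_check` and the random test data are our own choices for illustration, not part of any library.

```python
import numpy as np

def perturbation_check(A, b, dA, db):
    """Return (relative error, bound of Theorem 1) for a perturbed system."""
    x = np.linalg.solve(A, b)
    y = np.linalg.solve(A + dA, b + db)
    kappa = np.linalg.cond(A, 2)
    # Smallest eps with ||dA||_2 <= eps ||A||_2 and ||db||_2 <= eps ||b||_2.
    eps = max(np.linalg.norm(dA, 2) / np.linalg.norm(A, 2),
              np.linalg.norm(db) / np.linalg.norm(b))
    r = eps * kappa
    if r >= 1.0:
        raise ValueError("Theorem 1 requires eps * kappa_2(A) < 1")
    rel_err = np.linalg.norm(y - x) / np.linalg.norm(x)
    return rel_err, 2.0 * eps * kappa / (1.0 - r)

# Hypothetical test problem: a shifted random matrix with small perturbations.
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
dA = 1e-8 * rng.standard_normal((n, n))
db = 1e-8 * rng.standard_normal(n)
rel_err, bound = perturbation_check(A, b, dA, db)
```

The observed relative error is usually well below the bound; the bound describes the worst case over all perturbations of the given size.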


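With the superscript $T$, we denote the transpose of a
matrix (or vector): for $A \in \mathbb{R}^{n \times k}$, the matrix
$B = A^T \in \mathbb{R}^{k \times n}$ is defined by


$$b_{i,j} = a_{j,i}$$


If $E \in \mathbb{C}^{n \times k}$, then the superscript $H$ is used to
denote its complex conjugate transpose $F = E^H$, defined as


$$f_{i,j} = \bar{e}_{j,i}$$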


Sometimes, the superscript T is used for complex matrices 
in order to denote the transpose of a complex matrix. 

The matrix $A$ is symmetric if $A = A^T$, and $B \in \mathbb{C}^{n \times n}$
is Hermitian if $B = B^H$. Hermitian matrices have the attractive
property that their spectrum is real. In particular,
Hermitian (or symmetric real) matrices that are positive-definite
are attractive because linear systems with these matrices can be
solved rather easily by proper iterative methods (the CG method).

A Hermitian matrix $A \in \mathbb{C}^{n \times n}$ is positive-definite if
$x^H A x > 0$ for all $0 \neq x \in \mathbb{C}^n$. A positive-definite
Hermitian matrix has only positive real eigenvalues.

We will encounter some special matrix forms, in particular
tridiagonal matrices and (upper) Hessenberg matrices.
The matrix $T = (t_{i,j}) \in \mathbb{R}^{n \times m}$ will be called
tridiagonal if all elements for which $|i - j| > 1$ are zero. It is
called upper Hessenberg if all elements for which $i > j + 1$ are
zero. In the context of Krylov subspaces, these matrices are
often $(k + 1) \times k$ and they will then be denoted as $T_{k+1,k}$.


2.2 Eigenvalues and eigenvectors 


For purposes of analysis, it is often helpful or instructive 
to transform a given matrix to an easier form, for instance, 
diagonal or upper triangular form. 


The easiest situation is the symmetric case: for a real
symmetric matrix, there exists an orthogonal matrix
$Q \in \mathbb{R}^{n \times n}$, so that $Q^T A Q = D$, where
$D \in \mathbb{R}^{n \times n}$ is a diagonal matrix. The diagonal
elements of $D$ are the eigenvalues of $A$, and the columns of $Q$
are the corresponding eigenvectors of $A$. Note that the
eigenvalues and eigenvectors of $A$ are all real.

If $A \in \mathbb{C}^{n \times n}$ is Hermitian ($A = A^H$), then there
exist $Q \in \mathbb{C}^{n \times n}$ and a diagonal matrix
$D \in \mathbb{R}^{n \times n}$, so that $Q^H Q = I$ and $Q^H A Q = D$.
This means that the eigenvalues of a Hermitian matrix are all
real, but its eigenvectors may be complex.

Unsymmetric matrices do not, in general, have an orthonormal
set of eigenvectors and may not have a complete
set of eigenvectors, but they can be transformed unitarily
to Schur form:


$$Q^H A Q = R$$


in which $R$ is upper triangular.

If the matrix $A$ is complex, then the matrices $Q$ and $R$
may be complex as well. However, they may be complex
even when $A$ is real unsymmetric. It may then be advantageous
to work in real arithmetic. This can be realized
because of the existence of the real Schur decomposition. If
$A \in \mathbb{R}^{n \times n}$, then it can be transformed with an
orthonormal $Q \in \mathbb{R}^{n \times n}$ as


$$Q^T A Q = \tilde{R}$$

with

$$\tilde{R} = \begin{pmatrix} R_{1,1} & R_{1,2} & \cdots & R_{1,k} \\ 0 & R_{2,2} & \cdots & R_{2,k} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & R_{k,k} \end{pmatrix} \in \mathbb{R}^{n \times n}$$


Each $R_{i,i}$ is either $1 \times 1$ or a $2 \times 2$ (real) matrix having
complex conjugate eigenvalues. For a proof of this, see
Golub and Van Loan (1996, Chapter 7.4.1). This form of
$\tilde{R}$ is referred to as an upper quasi-triangular matrix.

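If all eigenvalues are distinct, then there exists a nonsingular
matrix $X$ (in general not orthogonal) that transforms
$A$ to diagonal form:


$$X^{-1} A X = D$$


A general matrix can be transformed to Jordan form with
a nonsingular $X$:


$$X^{-1} A X = \mathrm{diag}(J_1, J_2, \ldots, J_k)$$

where

$$J_j = \begin{pmatrix} \lambda_j & 1 & 0 & \cdots & 0 \\ 0 & \lambda_j & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & \lambda_j & 1 \\ 0 & \cdots & \cdots & 0 & \lambda_j \end{pmatrix}$$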

If there is a $J_j$ with dimension greater than 1, then the
matrix A is defective. In this case, A does not have a 
complete set of independent eigenvectors. In numerical 
computations, one may argue that small perturbations lead 
to different eigenvalues, and hence that it will be unlikely 
that A has a true Jordan form in actual computation. 
However, if A is close to a matrix with a nontrivial Jordan 
block, then this is reflected by a (severely) ill-conditioned 
eigenvector matrix X. 

We will also encounter eigenvalues that are called Ritz
values. For simplicity, we will introduce them here for
the real case. The subspace methods that are collected in
this chapter are based on the approach to identify good
solutions from certain low-dimensional subspaces
$\mathcal{V}^k \subset \mathbb{R}^n$, where $k \ll n$ denotes the dimension
of the subspace. If $V_k \in \mathbb{R}^{n \times k}$ denotes an orthonormal
basis of $\mathcal{V}^k$, then the operator
$H_k = V_k^T A V_k \in \mathbb{R}^{k \times k}$ represents the projection of
$A$ onto $\mathcal{V}^k$. Assume that the eigenvalues and eigenvectors
of $H_k$ are represented as


$$H_k s_j^{(k)} = \theta_j^{(k)} s_j^{(k)}$$


then $\theta_j^{(k)}$ is called a Ritz value of $A$ with respect to
$\mathcal{V}^k$ and $V_k s_j^{(k)}$ is its corresponding Ritz vector. For a
thorough discussion of Ritz values and Ritz vectors, see, for instance,
Parlett (1980), Stewart (2001), and van der Vorst (2002).
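
A small sketch of this definition, assuming a symmetric test matrix and a random orthonormal basis (both are our own illustrative choices; the name `ritz_pairs` is hypothetical):

```python
import numpy as np

def ritz_pairs(A, V):
    """Ritz values and Ritz vectors of symmetric A w.r.t. span(V), V orthonormal."""
    H = V.T @ A @ V                    # projected k x k matrix H_k
    H = (H + H.T) / 2                  # remove rounding-level asymmetry
    theta, S = np.linalg.eigh(H)       # H_k s_j = theta_j s_j
    return theta, V @ S                # Ritz vectors V_k s_j

rng = np.random.default_rng(1)
n, k = 50, 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                                   # symmetric test matrix (assumption)
V, _ = np.linalg.qr(rng.standard_normal((n, k)))    # orthonormal basis of a subspace
theta, Y = ritz_pairs(A, V)

# Residuals A y_j - theta_j y_j; they are orthogonal to the subspace.
R = A @ Y - Y * theta
lam = np.linalg.eigvalsh(A)
```

For symmetric $A$, the Ritz values lie between the smallest and the largest eigenvalue of $A$, which is one reason why they serve as eigenvalue approximations.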

For some methods, we will see that Harmonic Ritz values
play a role. Let $W_k$ denote an orthonormal basis for the
subspace $A\mathcal{V}^k$, then the Harmonic Ritz values of $A$ with
respect to that subspace are the inverses of the eigenvalues
of the projection $Z_k$ of $A^{-1}$:


$$Z_k = W_k^T A^{-1} W_k$$


3 DIRECT METHODS FOR LINEAR SYSTEMS


We will first consider the case that we have to solve

$$Ax = b$$


with A a square n x n matrix. The standard approaches are 
based upon Gaussian elimination. This works as follows. 
Assuming that $a_{1,1} \neq 0$, one can subtract multiples of the
first row of $A$ from the other rows, so that the coefficients
$a_{i,1}$ for $i > 1$ become zero. Of course, the same multiples of
$b_1$ have to be subtracted from the corresponding $b_i$. This
process can be repeated for the remaining $(n - 1) \times (n - 1)$
submatrix, in order to eliminate the coefficients for $x_2$
in the second column. After completion of the process,
the remaining matrix has zeros below the diagonal and
the linear system can now easily be solved. For a dense
linear system, this way of computing the solution $x$ requires
roughly $(2/3)n^3$ arithmetic operations.

In order to make the process numerically stable, the rows 
of A are permuted so that the largest element (in absolute 
value) in the first column appears in the first position. This 
process is known as partial pivoting and it is repeated for 
the submatrices. 

The process of Gaussian elimination is equivalent to
the decomposition of $A$ as


$$A = LU$$


with L a lower triangular matrix and U an upper triangular 
matrix and this is what is done in modern software. After 
the decomposition, one has to solve LUx = b and this is 
done in two steps: 


1. First solve y from Ly = b. 
2. Then x is obtained from solving Ux = y. 


The computational costs for one single linear system are 
exactly the same as for Gaussian elimination, and partial 
pivoting is included without noticeable costs. The permuta- 
tions associated with the pivoting process are represented by 
an index array and this index array is used for rearranging 
b, before the solving of Ly = b. 

If one has a number of linear systems with the same
matrix $A$, but with different right-hand sides, then one can
use the LU decomposition for all these right-hand sides.
The solution for each new right-hand side then takes only
$O(n^2)$ operations, which is much cheaper than repeating the
Gaussian elimination procedure afresh for each right-hand
side.
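
The factor-once, solve-many pattern can be sketched as follows. This is an illustrative dense implementation under our own naming (`lu_factor_pp`, `lu_solve_pp`), not the LAPACK code; production software should call a library routine instead.

```python
import numpy as np

def lu_factor_pp(A):
    """LU with partial pivoting: returns packed factors and the index array."""
    LU = np.array(A, dtype=float)
    n = LU.shape[0]
    perm = np.arange(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(LU[k:, k]))     # largest entry in column k
        if LU[p, k] == 0.0:
            raise np.linalg.LinAlgError("matrix is singular")
        if p != k:                               # row interchange
            LU[[k, p]] = LU[[p, k]]
            perm[[k, p]] = perm[[p, k]]
        LU[k + 1:, k] /= LU[k, k]                # multipliers, stored as column of L
        LU[k + 1:, k + 1:] -= np.outer(LU[k + 1:, k], LU[k, k + 1:])
    return LU, perm

def lu_solve_pp(LU, perm, b):
    """Solve A x = b with the stored factors in O(n^2) operations."""
    n = LU.shape[0]
    x = np.array(b, dtype=float)[perm]           # rearrange b with the index array
    for i in range(n):                           # forward solve L y = P b
        x[i] -= LU[i, :i] @ x[:i]
    for i in range(n - 1, -1, -1):               # backward solve U x = y
        x[i] = (x[i] - LU[i, i + 1:] @ x[i + 1:]) / LU[i, i]
    return x

rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n))
b1, b2 = rng.standard_normal(n), rng.standard_normal(n)
LU, perm = lu_factor_pp(A)       # O(n^3), done once
x1 = lu_solve_pp(LU, perm, b1)   # O(n^2) per right-hand side
x2 = lu_solve_pp(LU, perm, b2)
```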

This process of LU-decomposition, with partial pivoting, 
is the recommended strategy for the solution of dense linear 
systems. Reliable software for this process is available from 
software libraries including NAG and LAPACK (Anderson 
et al., 1992) and the process is used in Matlab. It is
relatively cheap to compute a good guess for the condition 
number of A, and this shows how sensitive the linear 
systems may be for perturbations to the elements of A and b 
(see Theorem 1). It should be noted that checking whether 


the computed solution $\hat{x}$ satisfies


$$\frac{\|b - A\hat{x}\|_2}{\|b\|_2} \le \epsilon$$


for some small $\epsilon$ does not provide much information on
the validity of $\hat{x}$ without further information on $A$. If $A$ is
close to a singular matrix, then small changes in the input 
data (or even rounding errors) may lead to large errors in the 
computed solution. The condition number of A is a measure 
for how close A is to a singular matrix (cf. Theorem 1). 

The computation of the factors $L$ and $U$, with partial
pivoting, is in general rather stable (small perturbations to
$A$ lead to acceptable perturbations in $L$ and $U$). If this is a
point of concern (visible through a relatively large residual
$b - A\hat{x}$), then the effects of these perturbed $L$ and $U$ can
be largely removed with iterative refinement. The idea of
iterative refinement is to compute $r = b - A\hat{x}$ and to solve
$Az = r$, using the available factors $L$ and $U$. The computed
solution $\hat{z}$ is used to correct the approximated solution
to $\hat{x} + \hat{z}$. The procedure can be repeated if necessary.
Iterative refinement is most effective if $r$ is computed in
higher precision. Apart from this, the process is relatively
cheap because it requires only $O(n^2)$ operations (compared to
the $O(n^3)$ operations for the LU factorization). For further
details, we refer to Golub and Van Loan (1996).
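
A minimal sketch of iterative refinement, assuming that the factorization is carried out in single precision and the residual in double precision. For brevity, `np.linalg.solve` on a `float32` copy of $A$ stands in for the reuse of stored factors $L$ and $U$; it refactors at every call, which a real implementation would avoid. The function name `refine` is our own.

```python
import numpy as np

def refine(A, b, steps=3):
    """Iterative refinement: low-precision solves, double-precision residuals."""
    A32 = A.astype(np.float32)

    def low_precision_solve(rhs):
        # Stand-in for forward/backward substitution with stored factors.
        return np.linalg.solve(A32, rhs.astype(np.float32)).astype(np.float64)

    x = low_precision_solve(b)
    history = [x.copy()]
    for _ in range(steps):
        r = b - A @ x                  # residual in higher (double) precision
        z = low_precision_solve(r)     # correction from A z = r
        x = x + z
        history.append(x.copy())
    return x, history

rng = np.random.default_rng(3)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
x_true = np.ones(n)
b = A @ x_true
x, history = refine(A, b)
err0 = np.linalg.norm(history[0] - x_true)        # error before refinement
err = np.linalg.norm(x - x_true)                  # error after refinement
```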

For increasing $n$, the above sketched direct solution
method becomes increasingly expensive ($O(n^3)$ arithmetic
operations) and for that reason all sorts of alternative
algorithms have been developed to help reduce the costs
for special classes of systems.

An important subclass is the class of symmetric positive-definite
matrices (see Section 2.1). A symmetric positive-definite
matrix $A$ can be decomposed as


$$A = LL^T$$


and this is known as the Cholesky decomposition of A. The 
Cholesky decomposition can be computed in about half the 
time as an LU decomposition and pivoting is not necessary. 
It also requires half the amount of computer storage, since 
only half of A and only L need to be stored (L may even 
overwrite A if A is not necessary for other purposes). It may 
be good to note that the numerical stability of Cholesky’s 
process does not automatically lead to accurate solutions x. 
This depends, again, on the condition number of the given 
matrix. The stability of the Cholesky process means that the 
computed factor $L$ is relatively insensitive to perturbations
of $A$.
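
As a brief illustration, the Cholesky factor is available in NumPy as `np.linalg.cholesky`. In the sketch below, the test matrix is our own construction, and the general solver `np.linalg.solve` stands in for dedicated triangular solves.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)     # symmetric positive-definite by construction
b = rng.standard_normal(n)

L = np.linalg.cholesky(A)       # lower triangular factor, A = L L^T
y = np.linalg.solve(L, b)       # L y = b   (stand-in for a triangular solve)
x = np.linalg.solve(L.T, y)     # L^T x = y (stand-in for a triangular solve)
```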

There is an obvious way to transform Ax = b into a 
system with a symmetric positive-definite matrix: 


$$A^T A x = A^T b$$

but this should almost always be avoided. It is not efficient
because the construction of $B = A^T A$ requires $2n^3$
arithmetic operations for a dense $n \times n$ matrix. Moreover,
the condition number of the matrix $B$ is the square of the
condition number of $A$, which makes the solution $x$ much
more sensitive to perturbations.
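
The squaring of the condition number is easy to observe. In this sketch, we construct a matrix with prescribed singular values (an assumption made only so that $\kappa_2(A)$ is known in advance) and compare the two condition numbers.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal factors
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.logspace(0, -3, n)                      # singular values from 1 to 1e-3
A = U @ np.diag(sigma) @ V.T                       # kappa_2(A) = 1e3 by construction

kappa_A = np.linalg.cond(A, 2)
kappa_B = np.linalg.cond(A.T @ A, 2)               # condition number of B = A^T A
```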

Another important class of matrices that occur in practi- 
cal problems involves the matrices with many zero entries, 
the so-called sparse matrices. Depending on how the 
nonzero elements of A are distributed over the matrix, 
large savings can be achieved by taking account of the 
sparsity patterns. The easiest case is when the nonzero ele- 
ments are in a (narrow) band around the diagonal of A. 
The LU factorization, even with partial pivoting, preserves 
much of this band structure, and software is available for 
these systems; see, for instance, LAPACK (Anderson et al., 
1992). 

If the nonzero entries are not located in a narrow band 
along the diagonal, then it may be more problematic to 
take advantage of the given nonzero pattern (also called 
the sparsity pattern). It is often possible to permute rows
and/or columns of A during the LU factorization process so
that the factors L and U also remain satisfactorily sparse. 
It is not easy to code these algorithms, but software for the 
direct decomposition of sparse matrices is available (for 
instance in NAG). 

For matrices with a very special sparsity pattern or where 
the elements satisfy special properties, for instance, constant 
diagonals as in Toeplitz matrices, special algorithms have 
been derived, for instance, Fast Poisson solvers and Toeplitz 
solvers. For an introduction and further references, see 
Golub and Van Loan (1996). 

If the matrix A is nonsquare, that is, m x n, or singu- 
lar, then the Gaussian elimination procedures cannot be 
used. Instead, one may use QR factorizations, or even 
better (but more expensive), the singular value decom- 
position (SVD). For details on this, see Golub and Van 
Loan (1996) or the manuals of software libraries. The QR 
decomposition algorithm and the SVD are also available in 
Matlab. 

The alternative for direct solvers, if any of the previously
mentioned methods does not lead to the solution
with reasonable computer resources (CPU time and
storage), may be an iterative way of solution. Iterative
methods are usually considered for the solution of very
large sparse linear systems. Unfortunately, there is not
one given iterative procedure that solves a linear system
with a general sparse matrix, similar to the LU algorithm
for dense linear systems. Iterative methods come in
a great variety, and it requires much insight and tuning
to adapt them for classes of special problems. Therefore,
we will pay more attention to these methods. This is necessary
because many of the sparse problems, related to
finite element or finite difference discretizations of mechanical
problems, cannot be solved fast enough by direct
methods.


3.1 Iterative solution methods 


The idea behind iterative methods is to replace the given
system by some nearby system that can be more easily
solved; that is, instead of $Ax = b$, we solve the
simpler system $Kx_0 = b$ and take $x_0$ as an approximation
for $x$. Obviously, we want the correction $z$ that
satisfies


$$A(x_0 + z) = b$$

This leads to a new linear system

$$Az = b - Ax_0$$


Again, we solve this system by a nearby system, and most
often one takes $K$ again:


$$Kz_0 = b - Ax_0$$


This leads to the new approximation $x_1 = x_0 + z_0$. The
correction procedure can now be repeated for $x_1$, and so on,
which gives us an iterative method.

For the basic or Richardson iteration, introduced above,
it follows that


$$x_{k+1} = x_k + z_k = x_k + K^{-1}(b - Ax_k) = x_k + K^{-1} r_k \qquad (1)$$


with $r_k = b - Ax_k$. We use $K^{-1}$ only for notational purposes;
we (almost) never compute inverses of matrices
explicitly. When we speak of $K^{-1}b$, we mean the vector
$\tilde{b}$ that is solved from $K\tilde{b} = b$. The matrix $K$ is called the
preconditioner. In order to simplify our formulas, we will
take $K = I$ and apply the presented iteration schemes to
the preconditioned system $K^{-1}Ax = K^{-1}b$ if we have a
better preconditioner available.
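
Iteration (1) can be sketched in a few lines. The preconditioner is passed as a function `solve_K` that returns the solution of $Kz = r$, so that no inverse is formed; this interface, the function name `richardson`, and the choice $K = \mathrm{diag}(A)$ in the example are our own assumptions.

```python
import numpy as np

def richardson(A, b, solve_K, x0=None, tol=1e-10, maxit=500):
    """Basic iteration x_{k+1} = x_k + K^{-1} r_k with r_k = b - A x_k."""
    x = np.zeros_like(b, dtype=float) if x0 is None else np.array(x0, dtype=float)
    for k in range(maxit):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        x = x + solve_K(r)            # solve K z = r, never form K^{-1}
    return x, maxit

# Diagonally dominant tridiagonal test matrix with K = diag(A).
n = 20
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
d = np.diag(A).copy()
x, its = richardson(A, b, lambda r: r / d)
```

The iteration converges only if the spectral radius of $I - K^{-1}A$ is smaller than 1, which holds for this diagonally dominant example but not for general matrices.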

From now on, we will also assume that $x_0 = 0$ to simplify
future formulas. This does not mean a loss of generality,
because the situation $x_0 \neq 0$ can be transformed with a
simple shift to the system


$$Ay = b - Ax_0 = \tilde{b} \qquad (2)$$


for which obviously $y_0 = 0$.

For the simple Richardson iteration, it is easily shown
that


$$x_k \in \mathrm{span}\{r_0, Ar_0, \ldots, A^{k-1} r_0\} \qquad (3)$$
$$\phantom{x_k} \equiv \mathcal{K}^k(A; r_0) \qquad (4)$$


The $k$-dimensional space spanned by a given vector $v$, and
increasing powers of $A$ applied to $v$, up to the $(k - 1)$-th
power, is called the $k$-dimensional Krylov subspace, generated
with $A$ and $v$, denoted by $\mathcal{K}^k(A; v)$.

Apparently, the Richardson iteration, as it proceeds, 
delivers elements of Krylov subspaces of increasing dimen- 
sion. It turns out that, at the expense of relatively little 
additional work compared to the Richardson method, we 
can identify much better approximations for the solution 
from the Krylov subspaces. This has led to the class of 
Krylov subspace methods that contain very effective itera- 
tive methods: Conjugate Gradients, GMRES, Bi-CGSTAB, 
and many more. Although older methods, such as SOR, 
can still be useful in certain circumstances, it is now gener- 
ally accepted that the Krylov solvers, in combination with 
appropriate preconditioners, are the methods of choice for 
many sparse systems. For this reason, we restrict ourselves 
to a discussion of this class of methods. These methods 
are also very attractive because of their relation to iterative 
solvers for the eigenproblem. This means that in practice 
one can obtain, with relatively little extra costs, relevant 
information about the spectrum of A. This may guide the 
design of preconditioners, but it can also be used for safer 
stopping criteria and for sensitivity analysis (estimates for
the condition number).


3.2 The Krylov subspace approach 


Methods that attempt to generate better approximations
from the Krylov subspace are often referred to as Krylov
subspace methods. Because optimality usually refers to
some sort of projection, they are also called Krylov projection
methods. The Krylov subspace methods, for identifying
suitable $x_k \in \mathcal{K}^k(A; r_0)$, can be distinguished in four
different classes (we will still assume that $x_0 = 0$):


1. The Ritz-Galerkin approach: Construct the $x_k$ for
which the residual is orthogonal to the current subspace:
$b - Ax_k \perp \mathcal{K}^k(A; r_0)$.

2. The minimum norm residual approach: Identify the $x_k$
for which the Euclidean norm $\|b - Ax_k\|_2$ is minimal
over $\mathcal{K}^k(A; r_0)$.

3. The Petrov-Galerkin approach: Find an $x_k$ so that the
residual $b - Ax_k$ is orthogonal to some other suitable
$k$-dimensional subspace.

4. The minimum norm error approach: Determine $x_k$ in
$A^T \mathcal{K}^k(A^T; r_0)$ for which the Euclidean norm
$\|x_k - x\|_2$ is minimal.


The Ritz—Galerkin approach leads to well-known meth- 
ods such as Conjugate Gradients, the Lanczos method, 
the Full Orthogonalization Method (FOM), and General- 
ized Conjugate Gradients (GENCG). The minimum norm 
residual approach leads to methods like GMRES, MINRES, 
and ORTHODIR. The main disadvantage of these two 
approaches is that, for most unsymmetric systems, they 
lead to long and therefore expensive recurrence relations for 
the approximate solutions. This can be relieved by select- 
ing other subspaces for the orthogonality condition (the 
Galerkin condition). If we select the $k$-dimensional subspace
in the third approach as $\mathcal{K}^k(A^T; s_0)$, then we obtain
the Bi-CG and QMR methods, and these methods indeed 
work with short recurrences. The fourth approach is not so 
obvious, but for A = AT, it leads to the SYMMLQ method 
of Paige and Saunders (1975). 

Hybrids of these approaches have been proposed, like
CGS, Bi-CGSTAB, Bi-CGSTAB($\ell$), TFQMR, FGMRES,
and GMRESR.

The choice for a method is a delicate problem. If the 
matrix A is symmetric positive-definite, then the choice is 
easy: Conjugate Gradients. For other types of matrices, the 
situation is very diffuse. GMRES, proposed by Saad and
Schultz (1986), is the most robust method, but in terms
of work per iteration step, it is also relatively expensive.
Bi-CG, which was suggested by Fletcher (1976), is a relatively
inexpensive alternative. The main disadvantage of Bi-CG
is that it involves per iteration an operation with $A$ and
one with $A^T$. Bi-CGSTAB, proposed by van der Vorst
(1992), is an efficient combination of Bi-CG and repeated
1-step GMRES, avoiding operations with $A^T$. Bi-CGSTAB
requires about as many operations per iteration as Bi-CG.
A more thorough discussion on Krylov methods is given in 
van der Vorst (2003). Other useful sources of information 
on iterative Krylov subspace methods include Axelsson 
(1994), Brezinski (1997), Bruaset (1995), Fischer (1996), 
Greenbaum (1997), Hackbusch (1994), Meurant (1999), 
and Saad (1996). 


3.3 The Krylov subspace 


In order to identify the approximations corresponding to the 
four different approaches, we need a suitable basis for the 
Krylov subspace. 

Arnoldi (1951) proposed to compute an orthogonal basis
as follows. Start with $v_1 = r_0/\|r_0\|_2$. Then compute $Av_1$,
make it orthogonal to $v_1$ and normalize the result, which
gives $v_2$. The general procedure is as follows: Assuming
we already have an orthonormal basis $v_1, \ldots, v_j$ for
$\mathcal{K}^j(A; r_0)$, this basis is expanded by computing $t = Av_j$
and by orthonormalizing this vector $t$ with respect to
$v_1, \ldots, v_j$.


    v_1 = r_0 / ||r_0||_2
    for j = 1, ..., m-1
        t = A v_j
        for i = 1, ..., j
            h_{i,j} = v_i^T t
            t = t - h_{i,j} v_i
        end
        h_{j+1,j} = ||t||_2
        v_{j+1} = t / h_{j+1,j}
    end


Figure 1. Arnoldi's method with modified Gram-Schmidt
orthogonalization.


This leads to an algorithm for the creation of an orthonormal
basis for $\mathcal{K}^m(A; r_0)$, as in Figure 1. The orthogonalization
can be conveniently expressed in matrix terms. Let $V_j$
denote the matrix with columns $v_1$ up to $v_j$, then it follows
that


$$AV_{m-1} = V_m H_{m,m-1} \qquad (5)$$


The $m \times (m - 1)$ matrix $H_{m,m-1}$ is upper Hessenberg, and
its elements $h_{i,j}$ are defined by the Arnoldi algorithm.
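
A compact sketch of Arnoldi's method with modified Gram-Schmidt, written to mirror Figure 1 and to check relation (5) numerically. The function name `arnoldi` and the random test matrix are our own choices.

```python
import numpy as np

def arnoldi(A, r0, m):
    """Return V (n x m, orthonormal columns) and H (m x (m-1), upper Hessenberg)."""
    n = r0.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m - 1))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m - 1):
        t = A @ V[:, j]
        for i in range(j + 1):              # modified Gram-Schmidt
            H[i, j] = V[:, i] @ t
            t = t - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(t)
        if H[j + 1, j] == 0.0:
            raise ValueError("breakdown: the Krylov subspace is invariant")
        V[:, j + 1] = t / H[j + 1, j]
    return V, H

rng = np.random.default_rng(6)
n, m = 30, 6
A = rng.standard_normal((n, n))
r0 = rng.standard_normal(n)
V, H = arnoldi(A, r0, m)
```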

From a computational point of view, this construction is
composed of three basic elements: a matrix-vector product
with $A$, inner products, and vector updates. We see that
this orthogonalization becomes increasingly expensive for
increasing dimension of the subspace, since the computation
of each $h_{i,j}$ requires an inner product and a vector
update.

Note that if $A$ is symmetric, then so is
$H_{m-1,m-1} = V_{m-1}^T A V_{m-1}$, so that in this situation
$H_{m-1,m-1}$ is tridiagonal. This means that in the
orthogonalization process, each
new vector has to be orthogonalized with respect to the
previous two vectors only, since all other inner products
vanish. The resulting three-term recurrence relation for the
basis vectors of $\mathcal{K}^m(A; r_0)$ is known as the Lanczos method
(Lanczos, 1950) and some very elegant methods are derived
from it. In this symmetric case, the orthogonalization process
involves constant arithmetical costs per iteration step:
one matrix-vector product, two inner products, and two
vector updates.
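
The three-term recurrence can be sketched as follows, for a symmetric test matrix of our own choosing. The comments mark the operations counted in the text; the name `lanczos` is hypothetical, and no reorthogonalization is applied, so orthogonality will degrade for larger numbers of steps.

```python
import numpy as np

def lanczos(A, r0, m):
    """Three-term recurrence for symmetric A: returns V, diagonal, off-diagonal."""
    n = r0.shape[0]
    V = np.zeros((n, m))
    alpha = np.zeros(m - 1)
    beta = np.zeros(m - 1)
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m - 1):
        t = A @ V[:, j]                        # one matrix-vector product
        if j > 0:
            t = t - beta[j - 1] * V[:, j - 1]  # first vector update
        alpha[j] = V[:, j] @ t                 # first inner product
        t = t - alpha[j] * V[:, j]             # second vector update
        beta[j] = np.linalg.norm(t)            # second inner product
        V[:, j + 1] = t / beta[j]
    return V, alpha, beta

rng = np.random.default_rng(7)
n, m = 40, 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                              # symmetric test matrix
V, alpha, beta = lanczos(A, rng.standard_normal(n), m)
# Tridiagonal matrix H_{m-1,m-1} assembled from the recurrence coefficients.
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
```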

We have discussed the construction of an orthonormal 
basis for the Krylov subspace because this also plays an 
important role for iterative eigenvalue solvers. Furthermore, 
the construction defines some matrices that play a role in the 
description of the iterative solvers. In practice, the user will 
never be concerned with the construction, because this is 
done automatically in all algorithms that will be presented. 


3.4 The Ritz-Galerkin approach


The Ritz-Galerkin conditions imply that $r_k \perp \mathcal{K}^k(A; r_0)$,
and this is equivalent to


$$V_k^T (b - Ax_k) = 0$$


Since $b = r_0 = \|r_0\|_2 v_1$, it follows that $V_k^T b = \|r_0\|_2 e_1$, with
$e_1$ the first canonical unit vector in $\mathbb{R}^k$. With $x_k = V_k y$ we
obtain


$$V_k^T A V_k y = \|r_0\|_2 e_1$$


This system can be interpreted as the system $Ax = b$
projected onto the subspace $\mathcal{K}^k(A; r_0)$.

Obviously, we have to construct the $k \times k$ matrix $V_k^T A V_k$,
but this is immediately available from the orthogonalization
process:


$$V_k^T A V_k = H_{k,k}$$


so that the $x_k$ for which $r_k \perp \mathcal{K}^k(A; r_0)$ can be easily
computed by first solving $H_{k,k} y = \|r_0\|_2 e_1$, and then forming
$x_k = V_k y$. This algorithm is known as FOM or GENCG
(see Saad and Schultz, 1986).
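
A minimal sketch of this procedure with $x_0 = 0$: Arnoldi builds $V_k$ and $H_{k,k}$, after which the small projected system is solved. The function name `fom` and the shifted random test matrix are our own assumptions; breakdown checks are omitted for brevity.

```python
import numpy as np

def fom(A, b, k):
    """k steps of FOM with x_0 = 0; returns x_k and the basis V_k."""
    n = b.shape[0]
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    beta = np.linalg.norm(b)              # ||r_0||_2, since x_0 = 0
    V[:, 0] = b / beta
    for j in range(k):                    # Arnoldi, modified Gram-Schmidt
        t = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ t
            t = t - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(t)
        V[:, j + 1] = t / H[j + 1, j]
    rhs = np.zeros(k)
    rhs[0] = beta                         # ||r_0||_2 e_1
    y = np.linalg.solve(H[:k, :k], rhs)   # projected system H_{k,k} y = ||r_0||_2 e_1
    return V[:, :k] @ y, V[:, :k]

rng = np.random.default_rng(8)
n, k = 30, 8
A = 5.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x_k, V_k = fom(A, b, k)
r_k = b - A @ x_k                         # orthogonal to the Krylov subspace
```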

When $A$ is symmetric, then $H_{k,k}$ reduces to a tridiagonal
matrix $T_{k,k}$, and the resulting method is known as
the Lanczos method (Lanczos, 1952). When $A$ is in addition
positive-definite, then we obtain, at least formally,
the Conjugate Gradient method. In commonly used implementations
of this method, one implicitly forms an LU
factorization for $T_{k,k}$, without generating $T_{k,k}$ itself, and this
leads to very elegant short recurrences for the $x_k$ and the
corresponding $r_k$; see the algorithm presented in Figure 2.
This algorithm includes preconditioning with an operator
$K$, which should be a fixed approximation for $A$ throughout
the entire iteration process.


    x_0 is an initial guess, r_0 = b - A x_0
    for i = 1, 2, ...
        Solve K w_{i-1} = r_{i-1}
        rho_{i-1} = r_{i-1}^T w_{i-1}
        if i = 1
            p_i = w_{i-1}
        else
            beta_{i-1} = rho_{i-1} / rho_{i-2}
            p_i = w_{i-1} + beta_{i-1} p_{i-1}
        endif
        q_i = A p_i
        alpha_i = rho_{i-1} / (p_i^T q_i)
        x_i = x_{i-1} + alpha_i p_i
        r_i = r_{i-1} - alpha_i q_i
        if x_i accurate enough then quit
    end


Figure 2. Conjugate gradients with preconditioning K.

The positive definiteness is necessary to guarantee the
existence of the LU factorization, but it also guarantees
that $\|x_k - x\|_A$ is minimal [1] over all possible $x_k$ from the
Krylov subspace of dimension $k$.
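
The scheme of Figure 2 translates almost line by line into code. The sketch below is an illustration under our own assumptions: the preconditioner is supplied as a function `solve_K` that solves $Kw = r$, the stopping test uses the relative residual norm, and the test problem uses $K = \mathrm{diag}(A)$.

```python
import numpy as np

def pcg(A, b, solve_K, x0=None, tol=1e-10, maxit=1000):
    """Preconditioned conjugate gradients for symmetric positive-definite A."""
    x = np.zeros_like(b, dtype=float) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x
    p = None
    rho_old = 1.0
    for i in range(1, maxit + 1):
        w = solve_K(r)                    # solve K w = r
        rho = r @ w
        if i == 1:
            p = w.copy()
        else:
            p = w + (rho / rho_old) * p   # beta = rho_{i-1} / rho_{i-2}
        q = A @ p
        alpha = rho / (p @ q)
        x = x + alpha * p
        r = r - alpha * q
        rho_old = rho
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, i
    return x, maxit

# Symmetric positive-definite tridiagonal test matrix, Jacobi preconditioner.
n = 50
A = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
d = np.diag(A).copy()
x, its = pcg(A, b, lambda r: r / d)
```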


3.5 The minimum norm residual approach 


We look for an $x_k \in \mathcal{K}^k(A; r_0)$, that is, $x_k = V_k y$, for which
$\|b - Ax_k\|_2$ is minimal. This norm can be rewritten, with
$\rho \equiv \|r_0\|_2$, as


$$\|b - Ax_k\|_2 = \|b - AV_k y\|_2 = \|\rho V_{k+1} e_1 - V_{k+1} H_{k+1,k}\, y\|_2$$


using the Krylov relation (5). Now we exploit the fact that
$V_{k+1}$ is an orthonormal transformation with respect to the
Krylov subspace $\mathcal{K}^{k+1}(A; r_0)$:


$$\|b - Ax_k\|_2 = \|\rho e_1 - H_{k+1,k}\, y\|_2$$


and this final norm can simply be minimized by solving the
minimum norm least squares problem for the $(k + 1) \times k$
matrix $H_{k+1,k}$ and right-hand side $\|r_0\|_2 e_1$. The least squares
problem is solved by constructing a QR factorization of
$H_{k+1,k}$, and because of the upper Hessenberg structure, that
can conveniently be done with Givens transformations (see
Golub and Van Loan, 1996).
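
The reduction to a small least squares problem can be sketched as follows, again with $x_0 = 0$. For brevity, the sketch calls the general routine `np.linalg.lstsq` instead of updating a QR factorization with Givens rotations, so it illustrates the projection and not an efficient implementation; the name `gmres_ls` is our own.

```python
import numpy as np

def gmres_ls(A, b, k):
    """Minimize ||b - A x||_2 over K^k(A; b); returns x_k and the small residual norm."""
    n = b.shape[0]
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    rho = np.linalg.norm(b)
    V[:, 0] = b / rho
    for j in range(k):                         # Arnoldi, modified Gram-Schmidt
        t = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ t
            t = t - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(t)
        V[:, j + 1] = t / H[j + 1, j]
    rhs = np.zeros(k + 1)
    rhs[0] = rho                               # rho e_1
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return V[:, :k] @ y, np.linalg.norm(rhs - H @ y)

rng = np.random.default_rng(9)
n = 30
A = 5.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x4, res4 = gmres_ls(A, b, 4)
x8, res8 = gmres_ls(A, b, 8)
true_res8 = np.linalg.norm(b - A @ x8)
```

The norm of the small $(k+1)$-dimensional residual equals the norm of the true residual, so convergence can be monitored without forming $x_k$.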

The GMRES method is based upon this approach. In 
order to avoid excessive storage requirements and compu- 
tational costs for the orthogonalization, GMRES is usually 
restarted after each cycle of $m$ iteration steps. This algorithm
is referred to as GMRES($m$); the not-restarted version
is often called 'full' GMRES. There is no simple rule to
determine a suitable value for m; the speed of convergence 
over cycles of GMRES(m) may drastically vary for nearby 
values of m. It may be the case that GMRES(m + 1) is 
much more expensive than GMRES(m), even in terms of 
numbers of iterations. 

We present in Figure 3 the modified Gram-Schmidt
version of GMRES($m$) for the solution of the linear system
$Ax = b$. The application to preconditioned systems, for
instance, $K^{-1}Ax = K^{-1}b$, is straightforward.

For an excellent overview of GMRES and related variants,
such as FGMRES, see Saad (1996).


    r = b - A x_0, for a given initial guess x_0
    x = x_0
    for j = 1, 2, ...
        beta = ||r||_2, v_1 = r / beta, bhat = beta e_1
        for i = 1, 2, ..., m
            w = A v_i
            for k = 1, ..., i
                h_{k,i} = v_k^T w, w = w - h_{k,i} v_k
            h_{i+1,i} = ||w||_2, v_{i+1} = w / h_{i+1,i}
            r_{1,i} = h_{1,i}
            for k = 2, ..., i
                gamma = c_{k-1} r_{k-1,i} + s_{k-1} h_{k,i}
                r_{k,i} = -s_{k-1} r_{k-1,i} + c_{k-1} h_{k,i}
                r_{k-1,i} = gamma
            delta = sqrt(r_{i,i}^2 + h_{i+1,i}^2), c_i = r_{i,i} / delta, s_i = h_{i+1,i} / delta
            r_{i,i} = c_i r_{i,i} + s_i h_{i+1,i}
            bhat_{i+1} = -s_i bhat_i, bhat_i = c_i bhat_i
            rho = |bhat_{i+1}| (= ||b - A x_{(j-1)m+i}||_2)
            if rho is small enough then
                (n_r = i, goto SOL)
        n_r = m
        SOL: y_{n_r} = bhat_{n_r} / r_{n_r,n_r}
        for k = n_r - 1, ..., 1
            y_k = (bhat_k - sum_{i=k+1}^{n_r} r_{k,i} y_i) / r_{k,k}
        x = x + sum_{i=1}^{n_r} y_i v_i, if rho small enough quit
        r = b - A x


Figure 3. Unpreconditioned GMRES(m).


3.6 GMRESR


There also exist variants of GMRES that permit variable
preconditioning. This is particularly convenient
in combination with domain decomposition methods, if
the problem per domain is solved by an iterative solver
itself.

The two most well-known variants are FGMRES (Saad, 
1993) and GMRESR (van der Vorst and Vuik, 1994). 
Because GMRESR is, in the author’s view, the most robust 
of the two, it has been presented here. The GMRESR algo- 
rithm can be described by the computational scheme in 
Figure 4. 


    x_0 is an initial guess; r_0 = b - A x_0;
    for i = 0, 1, 2, 3, ...
        Let z^(m) be the approximate solution of A z = r_i
            obtained after m steps of an iterative method.
        c = A z^(m) (often available from the iterative method)
        for k = 0, ..., i-1
            alpha = (c_k, c)
            c = c - alpha c_k
            z^(m) = z^(m) - alpha u_k
        end
        c_i = c / ||c||_2; u_i = z^(m) / ||c||_2
        x_{i+1} = x_i + (c_i, r_i) u_i
        r_{i+1} = r_i - (c_i, r_i) c_i
        if x_{i+1} is accurate enough then quit
    end


Figure 4. The GMRESR algorithm.


A sufficient condition to avoid breakdown (when $\|c\|_2 = 0$)
is that the norm of the residual at the end of an inner
iteration is smaller than the right-hand residual:
$\|Az^{(m)} - r_i\|_2 < \|r_i\|_2$. This can easily be controlled during
the inner iteration process. If stagnation occurs, that is, no progress
at all is made in the inner iteration, then van der Vorst
and Vuik (1994) suggest doing one (or more) steps of the
LSQR method, which guarantees a reduction (although this
reduction is often only small).
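
A sketch of the GMRESR scheme of Figure 4. The inner solver is an assumption on our part: a few minimal residual steps (`mr_inner`), chosen because they are short to write and reduce the inner residual whenever the symmetric part of $A$ is positive-definite. In practice one would use GMRES($m$) or another method as the inner iteration.

```python
import numpy as np

def mr_inner(A, r, m):
    """m minimal residual steps for A z = r (simple stand-in inner solver)."""
    z = np.zeros_like(r)
    res = r.copy()
    for _ in range(m):
        q = A @ res
        a = (q @ res) / (q @ q)       # minimizes ||res - a A res||_2
        z += a * res
        res -= a * q
    return z

def gmresr(A, b, m=5, tol=1e-10, maxit=200):
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    C, U = [], []                     # stored vectors with c_k = A u_k
    for i in range(maxit):
        z = mr_inner(A, r, m)
        c = A @ z
        for ck, uk in zip(C, U):      # orthogonalize c against earlier c_k
            a = ck @ c
            c = c - a * ck
            z = z - a * uk
        nc = np.linalg.norm(c)
        if nc == 0.0:
            raise ValueError("breakdown: ||c||_2 = 0")
        c, u = c / nc, z / nc
        C.append(c)
        U.append(u)
        g = c @ r
        x = x + g * u
        r = r - g * c
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, i + 1
    return x, maxit

# Unsymmetric tridiagonal test matrix with positive-definite symmetric part.
n = 40
A = 4.0 * np.eye(n) - np.eye(n, k=1) - 2.0 * np.eye(n, k=-1)
b = np.ones(n)
x, its = gmresr(A, b)
```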

When memory space is a limiting factor or when the 
computational costs per iteration become too high, we can 
simply truncate the algorithm (instead of restarting as in 
GMRES($m$)). If we wish only to retain the last $m$ vectors
$c_i$ and $u_i$, the truncation is effected by replacing the for k
loop in Figure 4 by


    for k = max(0, i-m), ..., i-1


and of course, we have to adapt the remaining part of 
the algorithm so that only the last m vectors are kept in 
memory. 

For a full discussion on GMRESR, see van der Vorst 
(2003). 


3.7 The Petrov—Galerkin approach 


For unsymmetric systems, we cannot, in general, reduce the 
matrix A to a tridiagonal system in a lower-dimensional 
subspace by orthogonal projections. The reason is that we 
cannot create an orthogonal basis for the Krylov subspace 
by a three-term recurrence relation (Faber and Manteuffel, 
1984). We can, however, obtain a suitable nonorthogonal 
basis with a three-term recurrence, by requiring that this 
basis is orthogonal with respect to some other basis. 

For this other basis, we select a convenient basis for the
Krylov subspace generated with A^T and starting vector w_1.
It can be shown that a basis v_1, ..., v_i for K^i(A; v_1) can
be created with a three-term recurrence relation, so that the
v_j are orthogonal with respect to the w_k, for k ≠ j. The w_i
are generated with the same recurrence relation as for the
v_i, but with A replaced by A^T.

In matrix notation, this leads to W_i^T A V_i = D_i T_{i,i}, and
also to V_i^T A^T W_i = D_i T_{i,i}, with D_i = W_i^T V_i a diagonal
matrix and T_{i,i} a tridiagonal matrix. These bi-orthogonal
sets of vectors form the basis for methods such as Bi-CG and
QMR.

Bi-CG is not as robust as CG. It may happen, for
instance, that w_i^T v_i = 0 and then the method breaks down.
Bi-CG is based on an LU decomposition of T_{i,i}, but since
T_{i,i} is not necessarily positive-definite, a flawless
LU decomposition in bidiagonal L and U may not exist,
which gives another breakdown of the method. Fortunately,
these circumstances do not occur frequently, but one has
to be aware of them and carry out the required checks.
There exist techniques to repair these breakdowns, but they
require complicated coding. It may be convenient just to
restart when a breakdown occurs, giving up some of the
efficiency of the method and, of course, with a chance that
breakdown occurs again. For a full treatment of Bi-CG, see
van der Vorst (2003).


3.8 Bi-CGSTAB 


Sonneveld showed that the two operations with A and A^T
per iteration of Bi-CG can be replaced by two operations
with A, with the effect that i iterations with Bi-CG are
applied twice: once with the starting residual r_0 and then
again with the same iteration constants on r_i. Surprisingly,
this can be done for virtually the same computational costs
as for Bi-CG, but the result is a method that often converges
about twice as fast as Bi-CG. This method is known as CGS
(Sonneveld, 1989).

Sonneveld’s principle was further perfected in van der 
Vorst (1992) for the construction of Bi-CGSTAB in which 
the Bi-CG operations with A^T are replaced by operations
with A and they are used to carry out GMRES(1) reductions 
on top of each Bi-CG iteration. 

The preconditioned Bi-CGSTAB algorithm for solving 
the linear system Ax = b, with preconditioning K, reads 
as in Figure 5. 

The matrix K in this scheme represents the preconditioning
matrix and the way of preconditioning (van der
Vorst, 1992). The above scheme, in fact, carries out the
Bi-CGSTAB procedure for the explicitly postconditioned
linear system


A K^{-1} y = b


but the vectors y_i and the residual have been back-transformed
to the vectors x_i and r_i corresponding to the
original system Ax = b.

The computational costs for Bi-CGSTAB are, per iteration,
about the same as for Bi-CG. However, because of
the additional GMRES(1) steps after each Bi-CG step,
Bi-CGSTAB often converges considerably faster. Of course,
Bi-CGSTAB may suffer from the same breakdown problems
as Bi-CG. In an actual code, we should test for
such situations and take appropriate measures, for example,
restart with a different \tilde{r} (= w_1) or switch to another
method (for example GMRES).

The method has been further generalized to
Bi-CGSTAB(ℓ), which generates iterates that can be interpreted
as the product of Bi-CG and repeated GMRES(ℓ).


x_0 is an initial guess, r_0 = b - A x_0
Choose \tilde{r}, for example, \tilde{r} = r_0
for i = 1, 2, ....
    ρ_{i-1} = \tilde{r}^T r_{i-1}
    if ρ_{i-1} = 0 method fails
    if i = 1
        p_i = r_{i-1}
    else
        β_{i-1} = (ρ_{i-1}/ρ_{i-2})(α_{i-1}/ω_{i-1})
        p_i = r_{i-1} + β_{i-1}(p_{i-1} - ω_{i-1} v_{i-1})
    endif
    Solve \hat{p} from K \hat{p} = p_i
    v_i = A \hat{p}
    α_i = ρ_{i-1}/(\tilde{r}^T v_i)
    s = r_{i-1} - α_i v_i
    if ||s|| small enough then
        x_i = x_{i-1} + α_i \hat{p}, quit
    Solve \hat{s} from K \hat{s} = s
    t = A \hat{s}
    ω_i = t^T s / t^T t
    x_i = x_{i-1} + α_i \hat{p} + ω_i \hat{s}
    if x_i is accurate enough then quit
    r_i = s - ω_i t
    for continuation it is necessary that ω_i ≠ 0
end


Figure 5. The Bi-CGSTAB algorithm with preconditioning. 


For more details, see van der Vorst (2003). Software for 
this method is available from NAG. 
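
As an illustration of Figure 5, the following plain-Python sketch follows the preconditioned Bi-CGSTAB recurrences step by step. It is a minimal sketch, not a production code: the function name bicgstab, the callback solve_K (which returns the solution of K p̂ = p), and the 3 by 3 test system with K = 4I are assumptions made for the example.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def bicgstab(A, b, solve_K, x0, tol=1e-10, max_iter=100):
    """Preconditioned Bi-CGSTAB; solve_K(y) must return the solution of K z = y."""
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    r_tilde = list(r)                       # shadow vector, here chosen as r_0
    rho_old = alpha = omega = 1.0
    p = v = [0.0] * len(b)
    for i in range(1, max_iter + 1):
        if math.sqrt(dot(r, r)) <= tol:
            break
        rho = dot(r_tilde, r)
        if rho == 0.0:
            raise ArithmeticError("method fails: rho = 0")
        if i == 1:
            p = list(r)
        else:
            beta = (rho / rho_old) * (alpha / omega)
            p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        p_hat = solve_K(p)
        v = matvec(A, p_hat)
        alpha = rho / dot(r_tilde, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol:     # converged after the Bi-CG half step
            return [xi + alpha * ph for xi, ph in zip(x, p_hat)]
        s_hat = solve_K(s)
        t = matvec(A, s_hat)
        omega = dot(t, s) / dot(t, t)       # GMRES(1) minimization step
        x = [xi + alpha * ph + omega * sh for xi, ph, sh in zip(x, p_hat, s_hat)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if omega == 0.0:
            raise ArithmeticError("cannot continue: omega = 0")
        rho_old = rho
    return x

A = [[4.0, 1.0, 0.0], [-1.0, 4.0, 1.0], [0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0]
x = bicgstab(A, b, lambda y: [yi / 4.0 for yi in y], [0.0, 0.0, 0.0])
```

The two tests on rho and omega correspond to the breakdown checks mentioned in the text; a real code would restart or switch methods there instead of raising an error.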


4 PRECONDITIONING 
4.1 Introduction 


As we have seen in our discussions on the various Krylov 
subspace methods, they are not robust in the sense that 
they can be guaranteed to lead to acceptable approxi- 
mate solutions within modest computing time and storage 
(modest with respect to alternative solution methods). For 
some methods (for instance, full GMRES), it is obvious
that they lead, in exact arithmetic, to the exact solution
in at most n iterations, but that may not be very practical.
Other methods are restricted to specific classes of
problems (CG, MINRES) or occasionally suffer from such
nasty side-effects as stagnation or breakdown (Bi-CG,
Bi-CGSTAB). Such poor convergence depends in a very 
complicated way on spectral properties (eigenvalue distri- 
bution, field of values, condition of the eigensystem, etc.) 
and this information is not available in practical situa- 
tions. 

The trick is then to try to find some nearby operator
K such that K^{-1}A has better (but still unknown) spectral
properties. This is based on the observation that for K = A,
we would have the ideal system K^{-1}Ax = Ix = K^{-1}b and
all subspace methods would deliver the true solution in
one single step. The hope is that for K in some sense
close to A, a properly selected Krylov method applied
to, for instance, K^{-1}Ax = K^{-1}b, would need only a few
iterations to yield a good enough approximation for the
solution of the given system Ax = b. An operator that is
used for this purpose is called a preconditioner for the
matrix A.

The general problem of finding an efficient precondi- 
tioner is to identify a linear operator K (the preconditioner) 
with the properties that [2] 


(1) K is a good approximation to A in some sense, 

(2) the cost of the construction of K is not prohibitive, 

(3) the system Ky =z is much easier to solve than the 
original system. 


There is great freedom in the definition and construction
of preconditioners for Krylov subspace methods. Note
that in all the Krylov methods, one never needs to know 
individual elements of A, and one never has to modify parts 
of the given matrix. It is always sufficient to have a rule 
(subroutine) that generates, for given input vector y, the 
output vector z that can mathematically be described as 
z = Ay. This also holds for the nearby operator: it does not 
have to be an explicitly given matrix. However, one should 
realize that the operator (or subroutine) that generates the 
approximation for A can be mathematically represented as 
a matrix. It is then important to verify that application of 
the operator (or subroutine, or possibly even a complete 
code) on different inputs leads to outputs that have the 
same mathematical relation, with the same (possibly explic- 
itly unknown) matrix K. For some methods, in particular 
Flexible GMRES and GMRESR, it is permitted that the 
operator K is (slightly) different for different input vectors 
(variable preconditioning). This plays an important role in 
the solution of nonlinear systems, if the Jacobian of the 
system is approximated by a Fréchet derivative, and it is
also attractive in some domain decomposition approaches 
(in particular, if the solution per domain itself is obtained 
by some iterative method again). 

The following aspect is also important. One never (except
for some trivial situations) forms the matrix K^{-1}A explicitly.
In many cases that would lead to a dense matrix and
that would destroy all efficiency that could be obtained for
the often sparse A. Even for a dense matrix A, it might be
too expensive to form the preconditioned matrix explicitly.
Instead, for each required application of K^{-1}A to some
vector y, we first compute the result w of the operator A
applied to y and then we determine the result z of the operator
K^{-1} applied to w. This is often done by solving z from
Kz = w, but there are also approaches by which approximations
M for A^{-1} are constructed (e.g. sparse approximate
inverses) and then one applies, of course, the operator M
to w in order to obtain z. Only very special and simple to
invert preconditioners like diagonal matrices can be applied
explicitly to A. This can be done before and in addition to
the construction of another preconditioning.
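
The two-step application described above (first w = Ay, then z from Kz = w) can be sketched as follows. The choice of K as the lower triangular part of A and the helper names lower_solve and preconditioned_operator are assumptions made only for this example.

```python
def matvec(A, y):
    return [sum(a * yj for a, yj in zip(row, y)) for row in A]

def lower_solve(K, w):
    """Solve K z = w for a lower triangular K by forward substitution."""
    z = []
    for i, row in enumerate(K):
        z.append((w[i] - sum(row[j] * z[j] for j in range(i))) / row[i])
    return z

def preconditioned_operator(A, solve_K):
    """Return the map y -> K^{-1} A y; the matrix K^{-1} A is never formed."""
    def apply(y):
        w = matvec(A, y)       # step 1: w = A y
        return solve_K(w)      # step 2: z solves K z = w
    return apply

A = [[4.0, 1.0, 0.0], [-1.0, 4.0, 1.0], [0.0, -1.0, 4.0]]
# Hypothetical preconditioner: the lower triangular part of A.
K = [[A[i][j] if j <= i else 0.0 for j in range(3)] for i in range(3)]
op = preconditioned_operator(A, lambda w: lower_solve(K, w))
y = [1.0, 2.0, 3.0]
z = op(y)
```

A Krylov solver only ever calls op, so the same solver code works whether K is given as a matrix, as a pair of triangular factors, or as a subroutine.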

Remember always that whatever preconditioner we con- 
struct, the goal is to reduce CPU time (or memory storage) 
for the computation of the desired approximated solution. 

There are different ways of implementing precondition- 
ing; for the same preconditioner, these different implemen- 
tations lead to the same eigenvalues for the preconditioned 
matrices. However, the convergence behavior is also depen- 
dent on the eigenvectors or, more specifically, on the 
components of the starting residual in eigenvector direc- 
tions. Since the different implementations can have quite 
different eigenvectors, we may thus expect that their con- 
vergence behavior can be quite different. Three different 
implementations are as follows: 


1. Left-preconditioning: Apply the iterative method to
K^{-1}Ax = K^{-1}b. We note that symmetry of A and K
does not imply symmetry of K^{-1}A. However, if K
is symmetric positive-definite, then [x, y] = (x, Ky)
defines a proper inner product. It is easy to verify that
K^{-1}A is symmetric with respect to the new inner product
[ , ], so that we can use methods like MINRES,
SYMMLQ, and CG (when A is positive-definite as
well) in this case. Popular formulations of preconditioned
CG are based on this observation.

If we are using a minimal norm residual method
(GMRES or MINRES), we should note that with left-preconditioning,
we are minimizing the preconditioned
residual K^{-1}(b - Ax_k), which may be quite different
from the residual b - Ax_k. This could have consequences
for stopping criteria that are based on the norm
of the residual.

2. Right-preconditioning: Apply the iterative method to
AK^{-1}y = b, with x = K^{-1}y. This form of preconditioning
also does not lead to a symmetric product when
A and K are symmetric. With right-preconditioning, we
have to be careful with stopping criteria that are based
upon the error: ||y - y_k||_2 may be much smaller than
the error norm ||x - x_k||_2 (equal to ||K^{-1}(y - y_k)||_2)
that we are interested in. Right-preconditioning has the
advantage that it only affects the operator and not the
right-hand side. This may be an attractive property in
the design of software for specific applications.

3. Two-sided preconditioning: For a preconditioner K
with K = K_1 K_2, the iterative method can be applied
to K_1^{-1} A K_2^{-1} z = K_1^{-1} b, with x = K_2^{-1} z. This form of
preconditioning may be used for preconditioners that
come in factored form. This can be seen as a compromise
between left- and right-preconditioning. This
form may be useful for obtaining a (near) symmetric
operator for situations where K cannot be used for
the definition of an inner product (as described under
left-preconditioning).
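
Two of the statements in this list are easy to check numerically. The sketch below uses a small symmetric A and a diagonal positive-definite K (both chosen arbitrarily for illustration) to show that K^{-1}A is self-adjoint in the inner product [x, y] = (x, Ky) but not in the Euclidean one, and that the preconditioned residual norm can differ markedly from the true residual norm.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

A = [[4.0, -1.0, 0.0], [-1.0, 3.0, -1.0], [0.0, -1.0, 2.0]]   # symmetric
k = [4.0, 1.0, 0.25]            # K = diag(k), symmetric positive-definite

def K_apply(v):
    return [ki * vi for ki, vi in zip(k, v)]

def K_solve(v):
    return [vi / ki for ki, vi in zip(k, v)]

def inner_K(x, y):
    return dot(x, K_apply(y))   # [x, y] = (x, K y)

def left_op(v):
    return K_solve(matvec(A, v))   # K^{-1} A v

x, y = [1.0, -2.0, 0.5], [0.3, 1.0, 2.0]
gap_K = abs(inner_K(left_op(x), y) - inner_K(x, left_op(y)))   # new inner product
gap_2 = abs(dot(left_op(x), y) - dot(x, left_op(y)))           # Euclidean

# Left-preconditioning monitors K^{-1}(b - A x_k) instead of b - A x_k.
b, xk = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
r = [bi - ai for bi, ai in zip(b, matvec(A, xk))]
norm_true = math.sqrt(dot(r, r))
norm_prec = math.sqrt(dot(K_solve(r), K_solve(r)))
```

Here gap_K vanishes up to rounding while gap_2 does not, and norm_prec is several times larger than norm_true, which is why a stopping test on the preconditioned residual has to be interpreted with care.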


Note that with all these forms of preconditioning, either 
explicit or implicit (through a redefinition of the inner 
product), we are generating a Krylov subspace for the pre- 
conditioned operator. This implies that the reduced matrix 
H_{i,i} (cf. (5)) gives information about the preconditioned
matrix; in particular, the Ritz values approximate eigen- 
values of the preconditioned matrix. The generated Krylov 
subspace cannot be used in order to obtain information as 
well for the unpreconditioned matrix. 

The choice of K varies from purely ‘black box’ algebraic 
techniques, which can be applied to general matrices, to 
‘problem-dependent’ preconditioners that exploit special 
features of a particular problem class. Examples of the 
last class are discretized PDEs, where the preconditioner 
is constructed as the discretization of a nearby (easier to 
solve) PDE. Although problem-dependent preconditioners 
can be very powerful, there is still a practical need for 
efficient preconditioning techniques for large classes of 
problems. 

There are only very few specialized cases where it is 
known a priori how to construct a good preconditioner and 
there are few proofs of convergence, except in very ideal- 
ized cases. For a general system, however, the following 
approach may help to build up one’s insight into what is 
happening. For a representative linear system, one starts 
with unpreconditioned GMRES(m), with m as high as possible.
In one cycle of GMRES(m), the method explicitly
constructs an upper Hessenberg matrix of order m, denoted
by H_{m,m}. This matrix is reduced to upper triangular form,
but before this takes place, one should compute the eigenvalues
of H_{m,m}, called the Ritz values. These Ritz values
usually give a fairly good impression of the most relevant 
parts of the spectrum of A. Then one does the same with the 
preconditioned system and inspects the effect on the spec- 
trum. If there is no specific trend of improvement in the 
behavior of the Ritz values [3], when we try to improve 
the preconditioner, then obviously we have to look for 
another class of preconditioner. If there is a positive effect 
on the Ritz values, then this may give us some insight 
into how much more the preconditioner has to be improved 
in order to be effective. At all times, we have to keep in 
mind the rough analysis that we made in this chapter and 
check whether the construction of the preconditioner and 
its costs per iteration are still inexpensive enough to be 
amortized by an appropriate reduction in the number of 
iterations. 
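
The inspection of Ritz values suggested above can be sketched as follows. This is a minimal illustration under several assumptions: the Hessenberg matrix is built by a plain Arnoldi process (the same process that underlies GMRES(m)), the operators are symmetric so that the Ritz values are real, and the small eigenvalue problem is solved by an unshifted QR iteration, which is adequate only for such tiny, well-separated cases. The test matrix tridiag(-1, d_i, -1) and the symmetric diagonal scaling (two-sided preconditioning with K_1 = K_2 = D^{1/2}) are hypothetical choices.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def arnoldi(apply_op, v0, m):
    """m Arnoldi steps (modified Gram-Schmidt); returns the m x m matrix H_{m,m}."""
    nrm = math.sqrt(dot(v0, v0))
    V = [[vi / nrm for vi in v0]]
    H = [[0.0] * m for _ in range(m)]
    for j in range(m):
        w = apply_op(V[j])
        for i in range(j + 1):
            H[i][j] = dot(V[i], w)
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        if j + 1 < m:
            H[j + 1][j] = math.sqrt(dot(w, w))
            V.append([wk / H[j + 1][j] for wk in w])
    return H

def ritz_values(H, sweeps=500):
    """Eigenvalues of a small H by unshifted QR iteration (assumes real eigenvalues)."""
    m = len(H)
    T = [row[:] for row in H]
    for _ in range(sweeps):
        Q = [[0.0] * m for _ in range(m)]
        R = [[0.0] * m for _ in range(m)]
        for j in range(m):                      # Gram-Schmidt QR of T
            col = [T[i][j] for i in range(m)]
            for k in range(j):
                qk = [Q[i][k] for i in range(m)]
                R[k][j] = dot(qk, col)
                col = [c - R[k][j] * q for c, q in zip(col, qk)]
            R[j][j] = math.sqrt(dot(col, col))
            for i in range(m):
                Q[i][j] = col[i] / R[j][j]
        T = [[sum(R[i][k] * Q[k][j] for k in range(m)) for j in range(m)]
             for i in range(m)]                 # T <- R Q
    return sorted(T[i][i] for i in range(m))

n, m = 6, 3
d = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]              # diagonal of the test matrix A

def apply_A(v):
    """A = tridiag(-1, d_i, -1), applied without storing A."""
    return [d[i] * v[i] - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0) for i in range(n)]

def apply_scaled(v):
    """D^{-1/2} A D^{-1/2}, i.e. symmetric diagonal scaling of A."""
    w = apply_A([vi / math.sqrt(di) for vi, di in zip(v, d)])
    return [wi / math.sqrt(di) for wi, di in zip(w, d)]

v0 = [1.0] * n
ritz_plain = ritz_values(arnoldi(apply_A, v0, m))
ritz_prec = ritz_values(arnoldi(apply_scaled, v0, m))
```

Comparing ritz_plain with ritz_prec shows how the scaling moves the Ritz values toward a cluster around 1; for a candidate preconditioner one would look for the same kind of trend.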

In this section, some of the more popular preconditioning
techniques are described and references and pointers
for other techniques are given. The reader is referred to
Axelsson (1994), Chan and van der Vorst (1997), Saad 
(1996), Meurant (1999), and van der Vorst (2003) for more 
complete overviews of (classes of) preconditioners. See 
Benzi (2002) for a very readable introduction to various 
concepts of preconditioning and for many references to 
specialized literature. 


5 INCOMPLETE LU FACTORIZATIONS 


Originally, preconditioners were based on direct solution 
methods in which part of the computation is skipped. This 
leads to the notion of Incomplete LU (or ILU) factorization 
(Meijerink and van der Vorst, 1977). We will now discuss 
these incomplete factorizations in more detail. 

Standard Gaussian elimination is equivalent to factoring 
the matrix A as A = LU, where L is lower triangular and U 
is upper triangular. In actual computations, these factors are 
explicitly constructed. The main problem in sparse matrix 
computations is that the factors of A are often a good deal 
less sparse than A, which makes solving expensive. The 
basic idea in the point ILU preconditioner is to modify 
Gaussian elimination to allow fill-ins at only a restricted 
set of positions in the LU factors. 

The following theorem collects results for the situation in 
which we determine a priori the positions of the elements 
that we wish to ignore during the Gaussian elimination 
process. Note that this is not a serious restriction because 
we may also neglect elements during the process according 
to certain criteria and this defines the positions implicitly. 
The indices of the elements to be ignored are collected in 
a set S: 


S ⊂ S_n ≡ {(i, j) | i ≠ j, 1 ≤ i ≤ n, 1 ≤ j ≤ n}     (6)


We can now formulate the theorem that guarantees the 
existence of incomplete decompositions for the M-matrix 
A (cf. Meijerink and van der Vorst, 1977, Th. 2.3). 


Theorem 2. Let A = (a_{i,j}) be an n × n M-matrix [4],
then there exists for every S ⊂ S_n a lower triangular matrix
L = (ℓ_{i,j}), with ℓ_{i,i} = 1, an upper triangular matrix U =
(u_{i,j}), and a matrix N = (n_{i,j}) with


ℓ_{i,j} = 0, u_{i,j} = 0, if (i, j) ∈ S

n_{i,j} = 0 if (i, j) ∉ S,

such that the splitting A = LU - N leads to a convergent
iteration (1).

The factors L and U are uniquely defined by S.


One can make, of course, variations on these incomplete
splittings, for instance, by isolating the diagonal of U as a
separate factor. When A is symmetric and positive-definite,
then it is natural to select S so that it defines a symmetric
sparsity pattern and then one can rewrite the factorization
so that the diagonals of L and U are equal. This is known
as an incomplete Cholesky decomposition.

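A commonly used strategy is to define S by


S = {(i, j) | a_{i,j} ≠ 0}     (7)


That is, the only nonzeros allowed in the LU factors are
those for which the corresponding entries in A are nonzero.
(From here on, S thus denotes the positions that are kept,
whereas the set S in Theorem 2 collects the positions that
are ignored.) It is easy to show that the elements k_{i,j} of
K match those of A on the set S:


k_{i,j} = a_{i,j}   if (i, j) ∈ S     (8)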


Even though the condition (8) is sufficient (for certain 
classes of matrices) to determine the nonzero entries of L 
and U directly, it is more natural and simpler to compute 
these entries on the basis of a simple modification of the 
Gaussian elimination algorithm (see Figure 6). The main 
difference from the usual Gaussian elimination algorithm is 
in the innermost j-loop, where an update to a_{i,j} is computed
only if it is allowed by the constraint set S. 

After the completion of the algorithm, the incomplete 
LU factors are stored in the corresponding lower and upper 
triangular parts of the array A. It can be shown that the 
computed LU factors satisfy (8). 

The incomplete factors L and U define the preconditioner
K = LU, which enters the iteration through
K^{-1} = (LU)^{-1}. In the context of an iterative solver,
this means that we have to evaluate expressions like z =
(LU)^{-1}y for any given vector y. This is done in two
steps: first obtain w from the solution of Lw = y and then
compute z from Uz = w. Straightforward implementation
of these processes leads to recursions, for which vector
and parallel computers are not ideally suited. This sort of
observation has led to reformulations of the preconditioner,
for example, with reordering techniques or with blocking


ILU for an n × n matrix A (cf. Axelsson, 1994):
for k = 1, 2, ..., n-1
    d = 1/a_{k,k}
    for i = k+1, k+2, ..., n
        if (i, k) ∈ S
            e = d a_{i,k};  a_{i,k} = e
            for j = k+1, ..., n
                if (i, j) ∈ S and (k, j) ∈ S
                    a_{i,j} = a_{i,j} - e a_{k,j}
                end if
            end j
        end if
    end i
end k


Figure 6. ILU for a general matrix A. 


techniques. It has also led to different types of precondition- 
ers, including diagonal scaling, polynomial preconditioning, 
and truncated Neumann series. These approaches may be 
useful in certain circumstances, but they tend to increase 
the computational complexity because they often lead to 
more iteration steps. 
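
The algorithm of Figure 6 and the two triangular solves for z = (LU)^{-1}y can be sketched in plain Python as follows. This is an illustrative dense-storage version only (a practical code would use sparse storage); the helper names laplacian_2d, ilu, apply_inverse, and lu_product are hypothetical, indices are 0-based, and S is taken as the nonzero pattern of A as in (7).

```python
def laplacian_2d(nx, ny):
    """Dense 5-point Laplacian on an nx-by-ny grid, lexicographical ordering."""
    n = nx * ny
    A = [[0.0] * n for _ in range(n)]
    for iy in range(ny):
        for ix in range(nx):
            p = iy * nx + ix
            A[p][p] = 4.0
            if ix > 0:
                A[p][p - 1] = -1.0
            if ix < nx - 1:
                A[p][p + 1] = -1.0
            if iy > 0:
                A[p][p - nx] = -1.0
            if iy < ny - 1:
                A[p][p + nx] = -1.0
    return A

def ilu(A, S):
    """Figure 6: returns F holding L (strict lower part, unit diagonal) and U."""
    n = len(A)
    F = [row[:] for row in A]
    for k in range(n - 1):
        d = 1.0 / F[k][k]
        for i in range(k + 1, n):
            if (i, k) in S:
                e = d * F[i][k]
                F[i][k] = e
                for j in range(k + 1, n):
                    if (i, j) in S and (k, j) in S:
                        F[i][j] -= e * F[k][j]
    return F

def apply_inverse(F, y):
    """z = (LU)^{-1} y: forward solve L w = y, then back solve U z = w."""
    n = len(F)
    z = list(y)
    for i in range(n):
        z[i] -= sum(F[i][j] * z[j] for j in range(i))
    for i in reversed(range(n)):
        z[i] = (z[i] - sum(F[i][j] * z[j] for j in range(i + 1, n))) / F[i][i]
    return z

def lu_product(F):
    """Dense product K = L U, used here only to inspect the preconditioner."""
    n = len(F)
    return [[sum((1.0 if k == i else F[i][k]) * F[k][j]
                 for k in range(min(i, j) + 1)) for j in range(n)]
            for i in range(n)]

A = laplacian_2d(3, 3)
S = {(i, j) for i in range(9) for j in range(9) if A[i][j] != 0.0}
F = ilu(A, S)
z = apply_inverse(F, [1.0] * 9)
```

Forming lu_product(F) and comparing it with A shows property (8): the product agrees with A on S and differs from it only in positions outside S. Both loops in apply_inverse are the recursions referred to in the text.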

A well-known variant on ILU is the so-called Modified 
ILU (MILU) factorization (Dupont et al., 1968; Gustafsson, 
1978). For this variant, the condition (8) is replaced by 


Σ_{j=1}^{n} k_{i,j} = Σ_{j=1}^{n} a_{i,j} + c h^2   for i = 1, 2, ..., n     (9)


The term c h^2 is for grid-oriented problems with mesh-size
h. Although in many applications this term is skipped (that 
is, one often takes c = 0), this may lead to ineffective pre- 
conditioning (van der Vorst, 1989) or even breakdown of 
the preconditioner (see Eijkhout, 1992). In our context, the 
row sum requirement in (9) amounts to an additional cor- 
rection to the diagonal entries compared to those computed 
in Figure 6. The correction leads to the observation that 
Kz ≈ Az for almost constant z (in fact, this was the motivation
for the construction of these preconditioners). This
results in very fast convergence for problems in which the 
solution is locally smooth. However, quite the opposite may 
be observed for problems in which the solution is far from 
smooth. For such problems, MILU may lead to much slower 
convergence than ILU. 
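
The row sum condition (9) can be obtained from the ILU loop by adding every discarded update to the diagonal of the same row instead of dropping it. The sketch below shows this under the same assumptions as a dense ILU sketch (0-based indices, S the nonzero pattern of A); the names milu, shift (standing for the term c h^2), and row_sums_LU are hypothetical.

```python
def laplacian_2d(nx, ny):
    """Dense 5-point Laplacian on an nx-by-ny grid, lexicographical ordering."""
    n = nx * ny
    A = [[0.0] * n for _ in range(n)]
    for iy in range(ny):
        for ix in range(nx):
            p = iy * nx + ix
            A[p][p] = 4.0
            if ix > 0:
                A[p][p - 1] = -1.0
            if ix < nx - 1:
                A[p][p + 1] = -1.0
            if iy > 0:
                A[p][p - nx] = -1.0
            if iy < ny - 1:
                A[p][p + nx] = -1.0
    return A

def milu(A, S, shift=0.0):
    """Modified ILU: discarded fill is lumped onto the diagonal of its row."""
    n = len(A)
    F = [row[:] for row in A]
    for i in range(n):
        F[i][i] += shift                          # the term c h^2 in (9)
    for k in range(n - 1):
        d = 1.0 / F[k][k]
        for i in range(k + 1, n):
            if (i, k) in S:
                e = d * F[i][k]
                F[i][k] = e
                for j in range(k + 1, n):
                    if (k, j) in S:
                        if (i, j) in S:
                            F[i][j] -= e * F[k][j]
                        else:
                            F[i][i] -= e * F[k][j]   # keep the row sum intact
    return F

def row_sums_LU(F):
    """Row sums of K = L U, computed as L (U e) with e the vector of ones."""
    n = len(F)
    u = [sum(F[i][j] for j in range(i, n)) for i in range(n)]
    return [u[i] + sum(F[i][k] * u[k] for k in range(i)) for i in range(n)]

A = laplacian_2d(3, 3)
S = {(i, j) for i in range(9) for j in range(9) if A[i][j] != 0.0}
sums_plain = row_sums_LU(milu(A, S))
sums_shift = row_sums_LU(milu(A, S, shift=0.1))
```

Since the row sums of K and A agree, Kz = Az holds exactly for constant z when shift is zero, which is the motivation mentioned above.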

The incomplete factorizations have been generalized 
with blocks of A instead of single elements. The inverses 
of diagonal blocks in these incomplete block factoriza- 
tions are themselves again approximated, for instance, 
by their diagonal only or by the tridiagonal part (for 
details on this, see Axelsson, 1994; Meurant, 1999). In 
the author’s experience, block incomplete decompositions 
can be quite effective for linear systems associated with 
two-dimensional PDEs, discretized over rectangular grids. 
However, for three-dimensional problems, they appeared to 
be less effective. 


5.1 Reordering the unknowns 


A standard trick for exploiting parallelism is to select all 
unknowns that have no direct relationship with each other 
and to number them first. For the 5-point finite differ- 
ence discretization over rectangular grids, this approach 
is known as a red-black ordering. For elliptic PDEs, this
leads to parallel preconditioners. The performance of the
preconditioning step is as high as the performance of the
matrix-vector product. However, changing the order of the
unknowns leads, in general, to a different preconditioner.
Duff and Meurant (1989) report on experiments that show
that most reordering schemes (for example, the red-black
ordering) lead to a considerable increase in iteration steps
(and hence in computing time) compared to the standard
lexicographical ordering. For the red-black ordering, associated
with the discretized Poisson equation, it can be
shown that the condition number of the preconditioned system
is only about one-quarter that of the unpreconditioned
system for ILU, MILU, and SSOR, with no asymptotic
improvement as the grid size h tends to zero (Kuo and Chan,
1990).
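
The red-black idea can be made concrete with a small sketch: for the 5-point stencil, gridpoints with even and odd index sum are not coupled among themselves, so numbering all points of one color first gives two diagonal blocks on the main diagonal. The helper names below are hypothetical, and the matrix is stored densely only to keep the example short.

```python
def laplacian_2d(nx, ny):
    """Dense 5-point Laplacian on an nx-by-ny grid, lexicographical ordering."""
    n = nx * ny
    A = [[0.0] * n for _ in range(n)]
    for iy in range(ny):
        for ix in range(nx):
            p = iy * nx + ix
            A[p][p] = 4.0
            if ix > 0:
                A[p][p - 1] = -1.0
            if ix < nx - 1:
                A[p][p + 1] = -1.0
            if iy > 0:
                A[p][p - nx] = -1.0
            if iy < ny - 1:
                A[p][p + nx] = -1.0
    return A

def red_black_permutation(nx, ny):
    """Red points (ix + iy even) are numbered first, black points afterwards."""
    red = [iy * nx + ix for iy in range(ny) for ix in range(nx) if (ix + iy) % 2 == 0]
    black = [iy * nx + ix for iy in range(ny) for ix in range(nx) if (ix + iy) % 2 == 1]
    return red + black, len(red)

def permute(A, perm):
    """Symmetric permutation of rows and columns."""
    return [[A[p][q] for q in perm] for p in perm]

nx, ny = 4, 3
A = laplacian_2d(nx, ny)
perm, n_red = red_black_permutation(nx, ny)
B = permute(A, perm)
```

In B the red-red and black-black blocks are diagonal, so all unknowns of one color can be eliminated or updated independently of each other.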

One way to obtain a better balance between parallelism 
and fast convergence is to use more colors (Doi, 1991). In 
principle, since there is not necessarily any independence 
between different colors, using more colors decreases the 
parallelism but increases the global dependence and hence 
the convergence. In Doi and Hoshi (1992), up to 75 colors
are used for a 76^2 grid on the NEC SX-3/14, resulting
in a 2 Gflop/s performance, which is much better than for
the wavefront ordering. With this large number of colors, 
the speed of convergence for the preconditioned process is 
virtually the same as with a lexicographical ordering (Doi, 
1991). 

The concept of multicoloring has been generalized to 
unstructured problems by Jones and Plassmann (1994). 
They propose effective heuristics for the identification of 
large independent subblocks of a given matrix. For prob- 
lems large enough to get sufficient parallelism in these 
subblocks, their approach leads to impressive speedups 
compared to the natural ordering on a single proces- 
sor. 

Another approach, suggested by Meurant (1984), exploits 
the idea of the two-sided (or twisted) Gaussian elimination 
procedure for tridiagonal matrices. This is generalized for 
the incomplete factorization. Van der Vorst (1987) shows
how this procedure can be done in a nested way.
For 3D finite difference problems, twisting can be used for
each dimension, which gives an increase in parallelism by a
factor of 2 per dimension. This leads, without further
computational overhead, to incomplete decompositions, as well
as triangular solves, that can be done in eight parallel parts 
(two in each dimension). For a discussion of these tech- 
niques, see Dongarra et al. (1998). This parallel ordering 
technique is sometimes referred to as ‘vdv’ ordering (Duff 
and Meurant, 1989) or ‘Van der Vorst’ ordering (see, for 
example, Benzi, 2002). 

A more sophisticated approach that combines ideas from
twisting, domain decomposition with overlap, and reordering
was proposed by Magolu monga Made and van der Vorst
(2001a,b, 2002). We will explain this idea for the special
situation of a discretized second-order elliptic PDE over a
rectangular domain. The discretization has been carried out
with the standard 5-point central difference stencil, which,
over a rectangular grid with lexicographical ordering, leads
to the familiar block matrix with 5 nonzero diagonals.

The first step is to split the domain in blocks, as in 
domain decomposition methods, and to order the unknowns 
lexicographically per block. This has been indicated, for 
the case of 8 horizontal blocks, in Figure 7. Per block, 
we start counting from one side (‘the bottom layer’); the 
points on the last line (‘the top layer’) are ordered after all 
subdomains, as is indicated in Figure 8. For instance, the 
lines 1, 2, 3, and 26 all belong to the block stored with
processor P_0, but in the matrix interpretation, the first 3
lines are ordered first and line 26 appears in the matrix 
only after all other ‘interior’ lines. This means that the 
matrix has the following nonzero structure (we give only 
a relevant part of the matrix). Note that we have already 
introduced another element in our ordering, namely, the 
idea of twisting: the lines of the subdomains are ordered 
from bottom to top and from top to bottom in Figure 7. 

Now imagine what happens if we carry out an incomplete 
LU factorization with zero fill. This would create level-1 
fill in the error matrix. Note that, in particular, we would 
introduce fill in the subblock of the matrix that connects 
line 26 with line 5, and note also that we would not have
seen this level-1 fill if we had ordered all points
lexicographically.


[Figure 7 graphic: the grid is cut into horizontal stripes, one
per processor (P_0, ...); the legend distinguishes bottom-layer
and top-layer gridpoints; the axes are x and y.]

Figure 7. Decomposition of the grid into stripes, and assignment
of subdomains to processors for p = 8. Arrows indicate the
progressing direction of the line numbering per subdomain. Numbers
along the y-axis give an example of global (line) ordering, which
satisfies all the required conditions. Within each horizontal line,
gridpoints are ordered lexicographically.


Figure 8. The structure of the reordered matrix. 


This means that if we want the block ordering to be 
at least as effective as the standard ordering, we have 
to remove this additional fill. This can be interpreted as 
permitting level-1 fill in a small overlap, and this is the 
reason for the name pseudo-overlap for this way of ordering.
How to generalize this idea for more arbitrary matrices
is obvious: one compares the new ordering with the
standard given one and one includes the possibly additional
level-1 fill in the preconditioner. The idea can also
be easily applied to preconditioners with a higher-level
fill.

In Magolu monga Made and van der Vorst (2001a,b), it is 
suggested that the pseudo-overlap be increased and higher 
levels of fill that are introduced by the new block-wise 
ordering also be included. For high dimensional problems 
and relatively low numbers of processors, this leads to 
an almost negligible overhead. It is shown by analysis in 
Magolu monga Made and van der Vorst (2002) and by 
experiments mentioned in Magolu monga Made and van 
der Vorst (2001a,b) that the block ordering with pseudo-overlap
may lead to parallelizable incomplete decompositions
that are almost perfectly scalable if the number
of processors p is less than √n, where n denotes the
order of the given linear system (the reported experiments
include experiments for 16 processors, for n ≈
260 000).


5.2 Variants of ILU preconditioners 


Many variants on the idea of incomplete or modified incom- 
plete decomposition have been proposed in the literature. 
These variants are designed to reduce the total compu- 
tational work, to improve the performance on vector or 
parallel computers, or to handle special problems. One 
could, for instance, think of incomplete variants of the 
various LU-decomposition algorithms discussed in Golub
and Van Loan (1996, Chapter 4.4). 


We will describe some of the more popular variants and 
give references to where more details can be found for other 
variants. 

A natural approach is to allow more fill-in in the LU
factor (that is, a larger set S). Several possibilities have been
proposed. The most obvious variant is to allow more fill-ins
in specific locations in the LU factors, for example, allowing
more nonzero bands in the L and U matrices (that is, larger
stencils) (see Axelsson and Barker, 1984; Gustafsson, 1978;
Meijerink and van der Vorst, 1981). The most common
location-based criterion is to allow a set number of levels
of fill-in, where original entries have level zero, original
zeros have level ∞, and a fill-in in position (i, j) has level
determined by


Level_{ij} = min_{1 ≤ k ≤ min(i, j)} {Level_{ik} + Level_{kj} + 1}

In the case of simple discretizations of partial differential 
equations, this gives a simple pattern for incomplete factor- 
izations with different levels of fill-in. For example, if the 
matrix is from a 5-point discretization of the Laplacian in 
two dimensions, level 1 fill-in will give the original pattern 
plus a diagonal inside the outermost band (for instance, see 
Meijerink and van der Vorst, 1981). 
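
The level rule can be evaluated symbolically, without any arithmetic on the matrix entries. The sketch below does this for a small 5-point Laplacian; it is an illustration only, with hypothetical helper names, 0-based indices, and dense storage of the levels.

```python
INF = float("inf")

def laplacian_2d(nx, ny):
    """Dense 5-point Laplacian on an nx-by-ny grid, lexicographical ordering."""
    n = nx * ny
    A = [[0.0] * n for _ in range(n)]
    for iy in range(ny):
        for ix in range(nx):
            p = iy * nx + ix
            A[p][p] = 4.0
            if ix > 0:
                A[p][p - 1] = -1.0
            if ix < nx - 1:
                A[p][p + 1] = -1.0
            if iy > 0:
                A[p][p - nx] = -1.0
            if iy < ny - 1:
                A[p][p + nx] = -1.0
    return A

def fill_levels(A):
    """Level of every position: 0 for original entries, INF if never filled."""
    n = len(A)
    lev = [[0 if A[i][j] != 0.0 else INF for j in range(n)] for i in range(n)]
    for k in range(n):                     # symbolic elimination of unknown k
        for i in range(k + 1, n):
            if lev[i][k] == INF:
                continue
            for j in range(k + 1, n):
                lev[i][j] = min(lev[i][j], lev[i][k] + lev[k][j] + 1)
    return lev

def pattern(lev, p):
    """Positions kept by an incomplete factorization with p levels of fill."""
    n = len(lev)
    return {(i, j) for i in range(n) for j in range(n) if lev[i][j] <= p}

nx = 4
lev = fill_levels(laplacian_2d(nx, nx))
level_one = {(i, j) for i in range(nx * nx) for j in range(nx * nx) if lev[i][j] == 1}
```

For this grid every position in level_one lies at distance nx - 1 from the main diagonal, that is, on the diagonal just inside the outermost band, in agreement with the description above.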

The other main criterion for deciding which entries 
to omit is to replace the drop-by-position strategy by a 
drop-by-size one; that is, a fill-in entry is discarded if its 
absolute value is below a certain threshold value. This 
drop-tolerance strategy was first proposed by Munksgaard 
(1980). For the regular problems just mentioned, it is inter- 
esting that the level fill-in and drop strategies give a some- 
what similar incomplete factorization because the numer- 
ical value of successive fill-in levels decreases markedly, 
reflecting the characteristic decay in the entries in the fac- 
tors of the LU decomposition of A. For general problems, 
however, the two strategies can be significantly different. 
Since it is usually not known a priori how many entries 
will be above a selected threshold, the dropping strategy is 
normally combined with restricting the number of fill-ins 
allowed in each column (Saad, 1994). When using a thresh- 
old criterion, it is possible to change it dynamically during 
the factorization to attempt to achieve a target density of 
the factors (Axelsson and Munksgaard, 1983; Munksgaard,
1980). Saad (1996) gives a very good overview of these
techniques.
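
A drop-by-size strategy can be sketched as a row-wise elimination in which small multipliers and small entries of each finished row are discarded. This is a simplified illustration, not the algorithm of any of the cited papers: the name ilu_threshold is hypothetical, the threshold is taken relative to the 2-norm of the original row, and the limit on the number of fill-ins per row or column is left out.

```python
import math

def laplacian_2d(nx, ny):
    """Dense 5-point Laplacian on an nx-by-ny grid, lexicographical ordering."""
    n = nx * ny
    A = [[0.0] * n for _ in range(n)]
    for iy in range(ny):
        for ix in range(nx):
            p = iy * nx + ix
            A[p][p] = 4.0
            if ix > 0:
                A[p][p - 1] = -1.0
            if ix < nx - 1:
                A[p][p + 1] = -1.0
            if iy > 0:
                A[p][p - nx] = -1.0
            if iy < ny - 1:
                A[p][p + nx] = -1.0
    return A

def ilu_threshold(A, tol):
    """Row-wise incomplete LU; entries below tol * ||a_i||_2 are dropped."""
    n = len(A)
    F = [row[:] for row in A]
    for i in range(n):
        limit = tol * math.sqrt(sum(a * a for a in A[i]))
        for k in range(i):
            if F[i][k] == 0.0:
                continue
            e = F[i][k] / F[k][k]
            if abs(e) < limit:              # drop a small multiplier
                F[i][k] = 0.0
                continue
            F[i][k] = e
            for j in range(k + 1, n):
                F[i][j] -= e * F[k][j]
        for j in range(n):                  # drop small entries, keep the diagonal
            if j != i and abs(F[i][j]) < limit:
                F[i][j] = 0.0
    return F

def lu_product(F):
    """Dense product L U, with L unit lower and U upper triangular, read from F."""
    n = len(F)
    return [[sum((1.0 if k == i else F[i][k]) * F[k][j]
                 for k in range(min(i, j) + 1)) for j in range(n)]
            for i in range(n)]

def count_nonzeros(F):
    return sum(1 for row in F for a in row if a != 0.0)

A = laplacian_2d(3, 3)
F_exact = ilu_threshold(A, 0.0)     # nothing is dropped: complete LU factors
F_drop = ilu_threshold(A, 0.02)     # sparser factors
```

With tol = 0 the routine returns the complete factors, so lu_product(F_exact) reproduces A; a positive tolerance gives sparser factors, and in practice one would tune it (possibly dynamically) to reach a target density.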

Although the notation is not yet fully standardized, the 
nomenclature commonly adopted for incomplete factorizations
is ILU(k), when k levels of fill-in are allowed, and
ILUT(α, f) for the threshold criterion when entries of modulus
less than α are dropped and the maximum number
of fill-ins allowed in any column is f. There are many
variations on these strategies and the criteria are sometimes
combined. In some cases, constraining the row sums of the
incomplete factorization to match those of the matrix, as in 
MILU, can help (Gustafsson, 1978), but as we noted ear- 
lier, successful application of this technique is restricted to 
cases in which the solution of the (preconditioned) system 
is rather smooth. 

Shifts can be introduced to prevent breakdown of the 
incomplete factorization process. As we have seen, incom- 
plete decompositions exist for general M-matrices. It is 
well known that they may not exist if the matrix is 
positive-definite, but does not have the M-matrix property. 
Manteuffel (1980) considered Incomplete Cholesky factorizations
of diagonally shifted matrices. He proved that if A
is symmetric positive-definite, then there exists a constant
α > 0, such that the Incomplete Cholesky factorization of
A + αI exists. Since we make an incomplete factorization
for A + αI, instead of A, it is not necessarily the case that
this factorization is also efficient as a preconditioner; the
only purpose of the shift is to avoid breakdown of the
decomposition process. Whether there exist suitable values
for α such that the preconditioner exists and is efficient
is a matter of trial and error.

Another point of concern is that for non-M-matrices the 
incomplete factors of A may be very ill conditioned. For 
instance, it has been demonstrated in van der Vorst (1981) 
that if A comes from a 5-point finite difference discretization
of Δu + β(u_x + u_y) = f, then for sufficiently large β,
the incomplete LU factors may be very ill conditioned even
though A has a very modest condition number. Remedies 
for reducing the condition numbers of L and U have been 
discussed in Elman (1989) and van der Vorst (1981). 


5.3 Hybrid techniques 


In the classical incomplete decompositions, one ignores 
fill-in right from the start of the decomposition process. 
However, it might be a good idea to delay this until 
the matrix becomes too dense. This leads to a hybrid 
combination of direct and iterative techniques. One such
approach has been described in Bomhof and van der Vorst
(2000); we will describe it here in some detail. 

We first permute the given matrix of the linear system 
Ax = b to a doubly bordered block diagonal form: 


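$$
\tilde{A} = P^T A P =
\begin{pmatrix}
A_{00} & 0 & \cdots & 0 & A_{0m} \\
0 & A_{11} & \ddots & \vdots & A_{1m} \\
\vdots & \ddots & \ddots & 0 & \vdots \\
0 & \cdots & 0 & A_{m-1,m-1} & A_{m-1,m} \\
A_{m0} & A_{m1} & \cdots & A_{m,m-1} & A_{mm}
\end{pmatrix}
\tag{10}
$$

Of course, the parallelism in the eventual method depends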
on the value of m, and some problems lend themselves 
more to this than others. Many circuit-simulation problems 
can be rewritten in an effective way, as a circuit is often 
composed of components that are only locally coupled to 
others. 

We permute the right-hand side b as well to $\tilde{b} = P^T b$,
which leads to the system

$$\tilde{A}\tilde{x} = \tilde{b} \tag{11}$$

with $x = P\tilde{x}$.

The parts of $\tilde{b}$ and $\tilde{x}$ that correspond to the block
ordering will be denoted by $\tilde{b}_i$ and $\tilde{x}_i$. The first step in the
(parallelizable) algorithm will be to eliminate the unknown
parts $\tilde{x}_0, \ldots, \tilde{x}_{m-1}$, which is done by the algorithm in
Figure 9.

Note that S in Figure 9 denotes the Schur complement
after the elimination of the blocks 0, 1, ..., m - 1. In
many relevant situations, direct solution of the reduced
system $S\tilde{x}_m = y_m$ accounts for the dominant part of the total
computational costs, and this is where we bring in the
iterative component of the algorithm.

The next step is to construct a preconditioner for the 
reduced system. This is based on discarding small elements 
in S. The elements larger than some threshold value define 
the preconditioner C: 

$$
c_{ij} =
\begin{cases}
s_{ij} & \text{if } |s_{ij}| > t\,|s_{ii}| \text{ or } |s_{ij}| > t\,|s_{jj}|, \\
0 & \text{elsewhere,}
\end{cases}
\tag{12}
$$

with a parameter $0 < t < 1$. In the experiments reported
in Bomhof and van der Vorst (2000), the value t = 0.02
turned out to be satisfactory, but this may need some
experimentation for specific problems.
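
The dropping rule (12) can be sketched in a few lines. This is only an illustration on a dense array; the function name `threshold_preconditioner` is hypothetical, and a practical code would work on sparse storage.

```python
import numpy as np

def threshold_preconditioner(S, t=0.02):
    """Keep s_ij when |s_ij| > t*|s_ii| or |s_ij| > t*|s_jj|; drop the rest."""
    d = np.abs(np.diag(S))
    abs_S = np.abs(S)
    keep = (abs_S > t * d[:, None]) | (abs_S > t * d[None, :])
    return np.where(keep, S, 0.0)

S = np.array([[4.0, 0.05, 1.0],
              [0.01, 2.0, 0.03],
              [1.0, 0.5, 10.0]])
C = threshold_preconditioner(S, t=0.02)
```

In this small example the entries 0.01 and 0.03 in the second row are dropped, while 0.05 survives because it exceeds t times the diagonal entry of its column.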

Parallel_for i = 0, 1, ..., m-1
  Decompose $A_{ii}$: $A_{ii} = L_{ii} U_{ii}$
  $L_{mi} = A_{mi} U_{ii}^{-1}$
  $U_{im} = L_{ii}^{-1} A_{im}$
  $y_i = L_{ii}^{-1} \tilde{b}_i$
  $S_i = L_{mi} U_{im}$
  $z_i = L_{mi} y_i$
end
$S = A_{mm} - \sum_{i=0}^{m-1} S_i$
$y_m = \tilde{b}_m - \sum_{i=0}^{m-1} z_i$
Solve $S \tilde{x}_m = y_m$
Parallel_for i = 0, 1, ..., m-1
  $\tilde{x}_i = U_{ii}^{-1} (y_i - U_{im} \tilde{x}_m)$
end


Figure 9. Parallel elimination.


When we take C as the preconditioner, then we have to
solve systems like Cv = w, and this requires decomposition
of C. In order to prevent too much fill-in, reordering of C
with a minimum degree ordering is suggested. The system
$S\tilde{x}_m = y_m$ is then solved with, for instance, GMRES with
preconditioner C. For the examples described in Bomhof
and van der Vorst (2000), it turns out that the convergence
of GMRES was not very sensitive to the choice of t. The
preconditioned iterative solution approach for the reduced
system also offers opportunities for parallelism, although
in Bomhof and van der Vorst (2000) it is shown that even
in serial mode, the iterative solution (to sufficiently high
precision) is often more efficient than direct solution of the
reduced system.
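
The elimination of Figure 9 can be sketched serially as follows. This is an illustration under simplifying assumptions: the blocks are dense arrays, the LU factors are not formed explicitly (each use of $L_{ii}$ and $U_{ii}$ is replaced by a call to `np.linalg.solve` with $A_{ii}$), and the reduced system is solved directly rather than with preconditioned GMRES. The function and argument names are hypothetical.

```python
import numpy as np

def bordered_solve(diag_blocks, border_cols, border_rows, A_mm, b_parts, b_m):
    """Solve a doubly bordered block diagonal system by block elimination.

    diag_blocks[i] = A_ii, border_cols[i] = A_im, border_rows[i] = A_mi.
    """
    S = A_mm.astype(float).copy()
    y_m = b_m.astype(float).copy()
    saved = []
    # This loop is independent per block (the Parallel_for of Figure 9).
    for A_ii, A_im, A_mi, b_i in zip(diag_blocks, border_cols, border_rows, b_parts):
        W = np.linalg.solve(A_ii, A_im)   # A_ii^{-1} A_im = U_ii^{-1} U_im
        g = np.linalg.solve(A_ii, b_i)    # A_ii^{-1} b_i  = U_ii^{-1} y_i
        S -= A_mi @ W                     # subtract S_i = L_mi U_im
        y_m -= A_mi @ g                   # subtract z_i = L_mi y_i
        saved.append((W, g))
    x_m = np.linalg.solve(S, y_m)         # reduced (Schur complement) system
    # Back substitution, again independent per block.
    x_parts = [g - W @ x_m for W, g in saved]
    return x_parts, x_m
```

In the hybrid method, the line that solves with S is the place where the iterative solver and the preconditioner C would take over.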

In Bomhof and van der Vorst (2000), heuristics are 
described for the decision on when the switch from direct 
to iterative should take place. These heuristics are based on 
mild assumptions on the speed of convergence of GMRES. 
The paper also reports on a number of experiments for 
linear systems, not only from circuit simulation but also 
for some matrix problems taken from Matrix Market [5]. 
These experiments indicate that attractive savings in com- 
putational costs can be achieved, even in serial computation 
mode. 


5.4 Element-by-element preconditioners 


In finite element problems, it is not always possible or 
sensible to assemble the entire matrix, and it is as easy 
to form products of the matrix with vectors as when it is 
held in assembled form. Furthermore, it is easy to distribute 
such matrix multiplications to exploit parallelism. Hence, 
preconditioners are required that can be constructed at the 
element level. Hughes et al. (1983) were the first to propose
such element-by-element preconditioners.

A parallel variant is suggested in Gustafsson and
Lindskog (1986). For symmetric positive-definite A, they
decompose each element matrix $A_e$ as $A_e = L_e L_e^T$, and
construct the preconditioner as $K = LL^T$, with

$$L = \sum_{e=1}^{n_e} L_e$$

In this approach, nonadjacent elements can be treated in 
parallel. An overview and discussion of parallel element- 
by-element preconditioners is given in van Gijzen (1994). 
To our knowledge, the effectiveness of element-by-element
preconditioners is limited, in the sense that they seldom
give a substantial reduction in CPU time.


5.5 Preconditioning by blocks or domains 


Other preconditioners that use direct methods are those
where the direct method, or an incomplete version of it,
is used to solve a subproblem of the original problem. This
can be done in domain decomposition, where problems 
on subdomains can be solved by a direct method, but 
the interaction between the subproblems is handled by an 
iterative technique. 

Domain decomposition methods were motivated by par- 
allel computing, but it now appears that the approach can 
also be used with success for the construction of global 
preconditioners. This is usually done for linear systems 
that arise from the discretization of a PDE. The idea is 
to split the given domain into subdomains, and to compute 
an approximation for the solution on each subdomain. If 
all connections between subdomains are ignored, this then 
leads to a Block Jacobi preconditioner. Chan and Goovaerts 
(1990) showed that the domain decomposition approach can 
actually lead to improved convergence rates, at least when 
the number of subdomains is not too large. This is because 
of the well-known divide-and-conquer effect when applied 
to methods with superlinear complexity such as ILU: it is 
more efficient to apply such methods to smaller problems 
and piece the global solution together. 

In order to make the preconditioner more successful, 
one has to couple the domains, that is, one has to find 
proper boundary conditions along the interior boundaries of 
the subdomains. From a linear algebra point of view, this 
amounts to adapting the diagonal blocks in order to com- 
pensate for the neglected off-diagonal blocks. This is only 
successful if the matrix comes from a PDE problem and if 
certain smoothness conditions on the solution are assumed. 
If, for instance, the solution were constant, then one could 
remove the off-diagonal block entries, adding them to the 
diagonal block entries without changing the solution. Like- 
wise, if the solution is assumed to be fairly smooth along 
a domain interface, one might expect this technique of 
diagonal block correction to be effective. Domain decompo- 
sition is used in an iterative fashion and usually the interior 
boundary conditions (in matrix language: the corrections 
to diagonal blocks) are based upon information from the 
approximate solutions on the neighboring subdomains that 
are available from a previous iteration step. 
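
The diagonal block correction for a constant solution can be illustrated with a small sketch. It builds a Block Jacobi matrix from a dense array and lumps each row's neglected off-diagonal-block entries onto the diagonal, so that the preconditioner and the matrix act identically on a constant vector. The function name and the uniform block size are assumptions made for the example.

```python
import numpy as np

def block_jacobi_with_correction(A, block_size):
    """Block diagonal part of A, with the dropped entries of every row
    added to that row's diagonal entry (row sums are preserved)."""
    n = A.shape[0]
    K = np.zeros_like(A, dtype=float)
    for s in range(0, n, block_size):
        e = min(s + block_size, n)
        K[s:e, s:e] = A[s:e, s:e]
        # sum of the entries of each row that fall outside the diagonal block
        dropped = A[s:e, :].sum(axis=1) - A[s:e, s:e].sum(axis=1)
        idx = np.arange(s, e)
        K[idx, idx] += dropped
    return K
```

For a vector of ones, `K @ ones` equals `A @ ones`, which is the matrix counterpart of the statement that the correction does not change a constant solution.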


6 METHODS FOR THE COMPLETE 
EIGENPROBLEM 


6.1 Introduction 


Unlike the situation for the solution of linear systems, there are no
truly direct methods for the solution of the eigenproblem, in
the sense that, in general, one cannot compute the eigenvalues
(or eigenvectors) exactly in a finite number of floating
point operations. The iterative methods come in two different
classes, one in which the matrix is driven to diagonal
form, or Schur form, by rotations, and one in which this is
accomplished by detecting invariant subspaces. This second
class of methods is explicitly based on the Power
Method. The QR method is the most prominent member
of this class; it converges so fast that the complete
eigensystem of a matrix can be computed in a modest (but matrix
dependent) factor times $n^3$ floating point operations. This is
somewhat similar to the situation for the direct methods for
solving dense linear systems. The QR method is, for this
reason, often referred to as a direct method, in order to
distinguish it from the essentially iterative subspace projection
techniques. These iterative subspace projection techniques
attempt to detect partial eigeninformation in much less
than $O(n^3)$ work. They will be described in forthcoming
sections.

Because the Power Method is important as a driving 
mechanism, although in hidden form, in the fast ‘direct’ 
methods as well as in the subspace projection methods, 
we will discuss it in some more detail. The reader should
keep in mind, however, that the Power Method is seldom
competitive as a stand-alone method; we need it here for
a better understanding of the superior techniques to
come.

The Power Method is based on the observation that 
if we multiply a given vector v by the matrix A, then 
each eigenvector component in v is multiplied by the 
corresponding eigenvalue of A. 

Assume that A is real symmetric; then it has real eigenvalues
and a complete set of orthonormal eigenvectors

$$A x_k = \lambda_k x_k, \qquad \|x_k\|_2 = 1 \qquad (k = 1, 2, \ldots, n)$$

We further assume that the largest eigenvalue in modulus
is simple and that

$$|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$$

Now, suppose we are given a vector $v_1$, which can be
expressed in terms of the eigenvectors as $v_1 = \sum_i \gamma_i x_i$, and
we assume that $\gamma_1 \ne 0$ (that means that $v_1$ has a nonzero
component in the direction of the eigenvector corresponding
to $\lambda_1$).

Given this $v_1$, we compute $Av_1, A(Av_1), \ldots$, and it
follows for the Rayleigh quotient of these vectors that

$$\lim_{j \to \infty} \frac{w_j^T A w_j}{w_j^T w_j} = \lambda_1$$

where $w_j = A^{j-1} v_1$.

The Power Method can be represented by the template
given in Figure 10. The sequence $\theta$ converges, under the
above assumptions, to the dominant (in absolute value)
eigenvalue $\lambda_1$. The scaling of the iteration vectors is
necessary in order to prevent overflow or underflow. This scaling
can be done a little more cheaply by taking the maximum
element of the vector instead of the norm. In that case,
the maximum element of $v_j$ converges to the largest (in
absolute value) eigenvalue of A.


$v = v_1/\|v_1\|_2$
for j = 1, 2, ..., until convergence
  $v_{j+1} = Av$
  $\theta = v^T v_{j+1}$
  $v = v_{j+1}/\|v_{j+1}\|_2$
end


Figure 10. The Power Method for symmetric A.
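
A direct transcription of the Power Method template into Python might look as follows. The stopping test on the change in $\theta$, the tolerance, and the function name are assumptions added for the sketch; the template itself only says "until convergence".

```python
import numpy as np

def power_method(A, v1, tol=1e-12, max_iter=10000):
    """Power Method for symmetric A with 2-norm scaling.
    Returns the Rayleigh quotient theta and the scaled iteration vector."""
    v = v1 / np.linalg.norm(v1)
    theta = 0.0
    for _ in range(max_iter):
        w = A @ v                      # next (unscaled) power vector
        theta_new = v @ w              # Rayleigh quotient v^T A v
        v = w / np.linalg.norm(w)      # scale to avoid overflow or underflow
        if abs(theta_new - theta) <= tol * abs(theta_new):
            return theta_new, v
        theta = theta_new
    return theta, v

A = np.diag([5.0, 2.0, -1.0])
theta, v = power_method(A, np.ones(3))
```

For this diagonal example the dominant eigenvalue is 5, and the error in $\theta$ decreases by roughly a factor $(2/5)^2$ per step.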

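It is not hard to see why the Rayleigh quotients converge
to the dominant eigenvalue. We first write $w_j$ in terms of
the eigenvectors of A:

$$w_j = A^{j-1} v_1 = \sum_{i \ge 1} \gamma_i \lambda_i^{j-1} x_i
= \lambda_1^{j-1} \left[ \gamma_1 x_1 + \sum_{i \ge 2} \gamma_i \left( \frac{\lambda_i}{\lambda_1} \right)^{j-1} x_i \right]$$

Hence,

$$\frac{w_j^T A w_j}{w_j^T w_j}
= \lambda_1 \frac{\gamma_1^2 + \sum_{i \ge 2} \gamma_i^2 \left( \lambda_i/\lambda_1 \right)^{2j-1}}{\gamma_1^2 + \sum_{i \ge 2} \gamma_i^2 \left( \lambda_i/\lambda_1 \right)^{2j-2}}
= \lambda_1 \frac{\sum_{i \ge 1} \gamma_i^2 P_{2j-1}(\lambda_i)}{\sum_{i \ge 1} \gamma_i^2 P_{2j-2}(\lambda_i)}$$

with $P_i(t) = (t/\lambda_1)^i$.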

For unsymmetric matrices, the situation is slightly more
complicated, since A does not necessarily have an orthonormal
set of eigenvectors. Let $A = XJX^{-1}$ denote the reduction
to Jordan form; then $A^j v = XJ^j X^{-1} v$. If the largest
eigenvalue, in modulus, is real and simple, then we see that
this value will dominate, and by similar arguments as above,
we see convergence in the direction of the corresponding
column of X. If the largest eigenvalue is complex, and if A
is real, then there is a conjugate eigenpair. If there is only
one eigenpair of the same maximal modulus, then the vector
$A^j v$ will ultimately have an oscillatory behavior and it
can be shown (see Wilkinson, 1965, p. 579) that a combination
of two successive vectors in the sequence $A^j v$ will
converge to a subspace spanned by the two conjugate
eigenvectors. The two eigenvectors can then be recovered by a
least-squares solution approach. For matrices with elementary
divisors, one can also show that the Power Method will
converge, although very slowly, to the eigenvector associated
with the eigenvalue of the divisor.

The Power Method, as a ‘stand-alone method,’ received
quite some attention until the early 1970s as the only means
to extract eigenvalue information from large matrices. Early
attempts to exploit the information hidden in a successive
number of iterands, as, for instance, by Aitken acceleration,
were only partially successful. In particular, the method of
Lanczos (1950), which exploited all iterands of the Power
Method, was suspect for a long time, mainly because of
its not-well-understood behavior in nonexact arithmetic. We
will come back to this later.

If systems with the shifted matrix, like

$$(A - \sigma I)x = b$$

can be solved efficiently, then it is very profitable to use
the Power Method with shifts. If one has a good guess $\sigma$
for the eigenvalue that one is interested in, then one can
apply the Power Method with $(A - \sigma I)^{-1}$. Note that it is
not necessary to compute this inverted matrix explicitly. In
the computation of the next vector $v_{j+1} = (A - \sigma I)^{-1} v_j$,
the vector $v_{j+1}$ can be solved from

$$(A - \sigma I)v_{j+1} = v_j$$

Assume that the largest eigenvalue (in absolute value) is
$\lambda_1$, and the second largest is $\lambda_2$, and that $\sigma$ is close to $\lambda_1$.
The speed of convergence now depends on the ratio

$$\frac{|\lambda_1 - \sigma|}{|\lambda_2 - \sigma|}$$

and this ratio may be a good deal smaller than $|\lambda_2|/|\lambda_1|$.
Even when the solution of a shifted linear system is significantly
more expensive than a matrix-vector multiplication
with A, the much faster convergence may easily pay for
these additional costs.

We may update the shift as better approximations become
available during the iteration process; that is, when we
apply the algorithm in Figure 10 with $(A - \sigma I)^{-1}$ at step
i, then $\theta$ is an approximation for $1/(\lambda_1 - \sigma)$. This means
that the approximation for $\lambda_1$ becomes $\sigma + 1/\theta$ and
we can use this value as the shift for iteration i + 1.
This technique is known as Rayleigh Quotient iteration.
Its convergence is ultimately cubic for symmetric matrices
and quadratic for unsymmetric matrices.
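
The shift update described above can be sketched as follows, assuming a small dense symmetric matrix so that each shifted system can be solved with `np.linalg.solve`. The function name, the stopping rule, and the handling of an exactly singular shifted matrix are choices made for this illustration.

```python
import numpy as np

def rayleigh_quotient_iteration(A, v1, sigma, tol=1e-12, max_iter=50):
    """Shift-and-invert Power Method in which the shift is replaced by
    sigma + 1/theta after every step."""
    n = A.shape[0]
    v = v1 / np.linalg.norm(v1)
    for _ in range(max_iter):
        try:
            w = np.linalg.solve(A - sigma * np.eye(n), v)
        except np.linalg.LinAlgError:
            break                       # sigma is an eigenvalue to working precision
        theta = v @ w                   # approximates 1/(lambda - sigma)
        v = w / np.linalg.norm(w)
        sigma_new = sigma + 1.0 / theta
        converged = abs(sigma_new - sigma) <= tol * abs(sigma_new)
        sigma = sigma_new
        if converged:
            break
    return sigma, v

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
sigma, v = rayleigh_quotient_iteration(A, np.ones(3), sigma=4.5)
```

The inverse of the shifted matrix is never formed; only one linear solve per step is needed, as noted in the text.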


6.2 The QR method 


We have already seen that for complex conjugate pairs, it
is necessary to work with three successive vectors in the
Power iteration. This suggests that it may be a good idea
to start with a block of vectors right from the start. So, let
us assume that we start with a set of independent vectors
$U_k^{(0)} = [u_1, u_2, \ldots, u_k]$ and that we carry out the Power
Method with $U_k^{(0)}$, which leads to the computation of

$$U_k^{(i)} = A U_k^{(i-1)}$$

per iteration. If we do this in a straightforward manner,
then this will lead to unsatisfactory results because each of
the columns of $U_k^{(0)}$ is effectively used as a starting vector
for a single-vector Power Method, and all these single-vector
processes will tend to converge toward the dominant
eigenvector(s). This will make the columns of $U_k^{(i)}$ highly
dependent in the course of the iteration. It is therefore a
better idea to try to maintain better numerical independence
between these columns and the most common technique for
this is to make them orthonormal after each multiplication
with A. This leads to the algorithm in Figure 11.

The columns of $U_k^{(i)}$ converge to a basis of an invariant
subspace of dimension k under the assumption that the
largest k (in absolute value) eigenvalues (counted according
to multiplicity) are separated from the remainder of the
spectrum. This can easily be seen from the same arguments
as for the Power Method. If the eigenvalues are real, and
the matrix is real, then the eigenvalues appear along the
diagonal of $R_k$. For complex eigenvalues, the situation is
slightly different; we will come back to this later.

In order to simplify the derivation of the QR method, we 
will first assume that A has real eigenvalues, which helps 
in avoiding complex arithmetic in the computation of the 
Schur forms. 

We can apply the Orthogonal Iteration Method with a
full set of starting vectors, say $U^{(0)} = I$, in order to try
to reduce A to some convenient form. The matrix $U^{(i)}$ in
the orthogonal subspace iteration converges to the matrix
of Schur vectors, and the matrix $U^{(i)T} A U^{(i)}$ converges to
Schur form. After one step of the algorithm, $AU^{(0)} = A =
U^{(1)} R^{(1)}$, we can compute $A_1 = U^{(1)T} A U^{(1)}$, which is a
similarity transform of the matrix A, and, hopefully, already
a little bit more in the direction of Schur form than A itself.
Therefore, it might be a better idea to continue the algorithm
with $A_1$.


start with orthonormal $U_k^{(0)}$
for i = 1, ..., until convergence
  $V_k = A U_k^{(i-1)}$
  orthonormalize the columns of $V_k$:
    $V_k = Q_k R_k$
  $U_k^{(i)} = Q_k$
end


Figure 11. The Orthogonal Iteration Method.
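
The Orthogonal Iteration Method can be sketched with NumPy's QR factorization. The random orthonormal start, the fixed iteration count, and the function name are assumptions made for the illustration; note also that `np.linalg.qr` does not normalize the signs of the diagonal of R, so eigenvalues may appear there with either sign.

```python
import numpy as np

def orthogonal_iteration(A, k, iters=200, seed=0):
    """Orthogonal (subspace) iteration with k vectors.
    Returns the orthonormal basis U and the last triangular factor R."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    R = np.zeros((k, k))
    for _ in range(iters):
        V = A @ U                 # one Power step for all columns at once
        U, R = np.linalg.qr(V)    # re-orthonormalize the columns
    return U, R
```

For a symmetric matrix with eigenvalues 10, 5, 1 and 0.5 and k = 2, the columns of U span the invariant subspace for 10 and 5, and the moduli of the diagonal entries of R approach 10 and 5.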

Let us define $Q_1 = U^{(1)}$. The matrix $A_1$ can be computed
as

$$A_1 = Q_1^T A Q_1 = R^{(1)} Q_1$$

which simply reverses the factors of $A_0 = A$.

The next step in the Orthogonal Iteration Method is
to compute $AQ_1$ and factor this as $U^{(2)} R^{(2)}$, but since
$A = Q_1 R^{(1)}$, we see that

$$AQ_1 = Q_1 R^{(1)} Q_1$$

If we now factor $R^{(1)} Q_1$ as

$$R^{(1)} Q_1 = Q_2 R^{(2)} \tag{13}$$

then we see that

$$AQ_1 = U^{(2)} R^{(2)} \tag{14}$$

with $U^{(2)} = Q_1 Q_2$. Obviously, $Q_2$ gives the orthogonal
transformation that corrects $Q_1$ to $U^{(2)}$. From equation (14),
we see that

$$R^{(1)} Q_1 = A_1 = Q_1^T A Q_1 = Q_2 R^{(2)}$$

and hence the correction factor $Q_2$ would have also been
obtained if we had continued the iteration with $A_1$. This
is a very nice observation because this correction factor
drives the matrix $A_1$ again further to Schur form, and we
can repeat this in an iterative manner.

The nice consequence of working with the transformed
matrices $A_i$ is that we can compute the correction matrix
$Q_{i+1}$ from the matrix product of the reversed factors $Q_i$
and $R^{(i)}$ from the previous iteration, as we see from (13).
This leads to the famous QR iteration.

The matrix $A_i$ converges to Schur form, and the product
$Q_1 Q_2 \cdots Q_i$ of the correction matrices converges to the
matrix of Schur vectors corresponding to A (remember
that we are still assuming that all eigenvalues are real, in
order to avoid complex arithmetic. We could have dropped
this assumption, but then we should have used complex
arithmetic; this can be avoided, however).
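
The factor-and-reverse step can be sketched as a plain (unshifted) QR iteration. The fixed iteration count and the function name are assumptions for this illustration, and the example below uses a symmetric matrix with eigenvalues of distinct modulus so that the iterates converge to diagonal form.

```python
import numpy as np

def qr_iteration(A, iters=500):
    """Unshifted QR iteration: factor A_{i-1} = Q_i R_i, then form A_i = R_i Q_i.
    Returns the final A_i and the accumulated product Q_1 Q_2 ... Q_i."""
    Ai = np.array(A, dtype=float)
    U = np.eye(Ai.shape[0])
    for _ in range(iters):
        Q, R = np.linalg.qr(Ai)
        Ai = R @ Q                # reversed factors: a similarity transform
        U = U @ Q                 # accumulate the correction matrices
    return Ai, U

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
Ai, U = qr_iteration(A)
```

At the end, `U.T @ A @ U` reproduces `Ai` up to rounding, and the diagonal of `Ai` holds the eigenvalue approximations.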

We can also apply the QR iteration with shifts, and this
has very important consequences. Suppose we apply in
one particular step of the algorithm a shift $\sigma$ close to an
eigenvalue $\lambda_j$; that is, we do a QR factorization for
$A_{i-1} - \sigma I$. If $\sigma$ is close enough to an eigenvalue $\lambda_j$, then this
implies that the matrix $A_{i-1} - \sigma I$, which is similar to
$A - \sigma I$, is almost singular. This means that the matrix $R^{(i)}$ must
be close to singular (since $Q_i$ is orthogonal). If we do the
QR factorization with pivoting, in order to get the smallest
diagonal element at the bottom position, then this means
that one of the columns of $Q_i$ must be close to the singular
vector of $A_{i-1} - \sigma I$ that belongs to its smallest singular
value, or, better, that that column
is almost an eigenvector of $A_{i-1}$, and consequently, $\sigma$ is
almost an eigenvalue, and that is what we are looking for.

Pivoting is usually not necessary. If a subdiagonal entry
of $A_i$ is very small, then the eigenvalue computation can
proceed for the two separate subblocks of $A_i$. If it is not
small, then apparently the first n - 1 columns of $A_i$ are
independent (recall that $A_i$ is of upper Hessenberg form)
so that (near) dependence must show up through a small
diagonal entry at the bottom position in $R^{(i)}$.

The speed of convergence with which the smallest diagonal
entry of $R^{(i)}$ goes to zero, for increasing i, is proportional
to $|\lambda_j - \sigma|/|\lambda_{j+1} - \sigma|$, if $\lambda_j$ is the eigenvalue closest
to the shift and $\lambda_{j+1}$ denotes the eigenvalue that is second
closest. This is precisely the speed of convergence for the
shift-and-invert Power Method and the surprising result is
that we get the speed of shift and invert without inverting
anything. A real shift can be easily incorporated in the QR
algorithm. If we apply a shift in the ith iteration, then we
have to QR-factor $A_{i-1} - \sigma I$ as

$$A_{i-1} - \sigma I = Q_i R^{(i)}$$

and, because of the orthogonality of $Q_i$, it follows that

$$Q_i^T (A_{i-1} - \sigma I) Q_i = R^{(i)} Q_i$$

However, in the next step, we want to continue with $A_i$
(and possibly apply another shift), but that matrix is easily
computed, since

$$A_i = Q_i^T A_{i-1} Q_i = R^{(i)} Q_i + \sigma I$$

This leads to the QR iteration with shifts, as given in
Figure 12.
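
A shifted QR iteration with deflation can be sketched as follows. The sketch assumes a real symmetric matrix, uses the last diagonal entry as the shift in every step (one possible shift choice among several), and deflates when the last row is negligible outside the diagonal; the function name and tolerance are hypothetical.

```python
import numpy as np

def shifted_qr_symmetric(A, tol=1e-12, max_iter=1000):
    """Eigenvalues of a real symmetric matrix by QR iteration with the
    last diagonal entry as shift, deflating one eigenvalue at a time."""
    Ai = np.array(A, dtype=float)
    eigenvalues = []
    steps = 0
    while Ai.shape[0] > 1 and steps < max_iter:
        n = Ai.shape[0]
        sigma = Ai[-1, -1]
        Q, R = np.linalg.qr(Ai - sigma * np.eye(n))   # factor the shifted matrix
        Ai = R @ Q + sigma * np.eye(n)                # reverse and undo the shift
        steps += 1
        if np.linalg.norm(Ai[-1, :-1]) <= tol * np.linalg.norm(Ai):
            eigenvalues.append(Ai[-1, -1])            # converged eigenvalue
            Ai = Ai[:-1, :-1]                         # continue with the leading block
    if Ai.shape[0] == 1:
        eigenvalues.append(Ai[0, 0])
    return np.array(eigenvalues), steps

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
eigs, steps = shifted_qr_symmetric(A)
```

Compared with the unshifted iteration, far fewer factorizations are needed for the same accuracy, in line with the shift-and-invert convergence ratio discussed above.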

We have ignored some important aspects. One of these
is that we have assumed that the real matrix, to which
we applied the QR method, had real eigenvalues. This
was necessary for the convergence of the matrix $A_i$
(which is similar to A) to Schur form. For more general
situations, that is, when the real matrix has conjugate pairs
of eigenvalues (and eigenvectors), it can be shown that
in real arithmetic the matrix converges to generalized real
Schur form.

Recall that a matrix is in generalized real Schur form if
it is block upper triangular and if the diagonal blocks are
either 2 x 2 or 1 x 1. The 2 x 2 blocks along
the diagonal of the generalized Schur form represent the
complex conjugate eigenpairs of A (and the corresponding
eigenvectors can be computed by combining the corresponding
columns of the accumulated transformation matrices
$Q_i$).


start with $A_0 = A$
for i = 1, ..., until convergence
  factor $A_{i-1} - \sigma_i I = Q_i R^{(i)}$
  compute $A_i = R^{(i)} Q_i + \sigma_i I$
end


Figure 12. The QR method with shifts $\sigma_i$.

This leaves the question how to apply complex shifts, 
since that would lead to complex arithmetic. It can be 
shown that by combining two successive steps in the QR 
algorithm, a pair of complex conjugate shifts can be used, 
without explicit complex arithmetic. A pair of such shifts is 
meaningful because complex eigenvalues of a real matrix 
always appear in conjugate pairs. If we want to accelerate 
convergence for one complex eigenpair, then clearly we 
also have to treat the conjugate member of this pair. 

An important aspect of the QR algorithm is the compu- 
tational complexity. If we apply it in the explained form, 
then it is quite expensive because we have to work with 
dense matrices in each iteration. Also, we have, for instance, 
to compute dense QR factorizations. These costs can be 
significantly reduced by first bringing the given matrix A,
by orthogonal transformations, to upper Hessenberg form:
$H = Q^T AQ$. Note that H is symmetric and tridiagonal
if A is symmetric. The transformation to
these forms can be done by Householder reflections or by
Givens rotations (the Householder reflections are generally
preferred since they are cheaper).
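
The reduction to upper Hessenberg form by Householder reflections can be sketched as follows. This is a plain dense illustration that forms Q explicitly; library routines organize the same computation in blocked form. The function name is hypothetical.

```python
import numpy as np

def hessenberg_reduce(A):
    """Return H and orthogonal Q with H = Q^T A Q upper Hessenberg."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = H[k + 1:, k]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                      # column is already reduced
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])  # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # Apply the reflection P = I - 2 v v^T from the left and from the right.
        H[k + 1:, :] -= 2.0 * np.outer(v, v @ H[k + 1:, :])
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
        Q[:, k + 1:] -= 2.0 * np.outer(Q[:, k + 1:] @ v, v)
    return H, Q
```

Applied to a symmetric matrix, the same routine returns a matrix that is tridiagonal up to rounding errors.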

Another important observation is that the matrix $Q_i$ can
be easily computed for the shifted matrix, which leads to
an implicit shift strategy.

For a detailed coverage of the QR method, see Golub 
and Van Loan (1996), Demmel (1997), and Watkins (1991, 
1993, 2000). 

The costs for the QR method, for computing all eigenvalues
and eigenvectors of a real unsymmetric matrix, are
of the order of $n^3$ arithmetic operations. A crude and
conservative upper estimate is $25n^3$ flops, including the initial
reduction of A to upper Hessenberg form. This reduces to
about $10n^3$ if only the eigenvalues are desired. In that case
it is not necessary to accumulate the $Q_i$'s. In the symmetric
case, the costs are reduced by roughly a factor of 2. These
costs make the method feasible only for matrices of
order a few thousand at most. For larger matrices, the
method is used in combination with a technique that first
reduces the large matrices to much smaller matrices (while
attempting to preserve the main characteristics of the large
matrix, in particular, the wanted eigenpairs).


In summary: the QR method for eigenvalues and eigenvectors
of a dense matrix is so effective because of the
following ingredients:


1. Orthogonalization of the iterated columns in the basic
orthogonal simultaneous iteration.

2. Accumulation of the orthogonal transformations, and
their use as similarity transformations (this leads to the
QR-RQ step).

3. The effect of shifts and their efficient implementation.

4. The initial reduction of A to upper Hessenberg
(tridiagonal) form.

The QR method has been implemented in the major sub- 
routine libraries, in particular, in LAPACK, ScaLAPACK, 
IMSL, NAG, and MATLAB. 


6.3 Generalized problems: the QZ method


The generalized eigenproblem

$$Ax = \lambda Bx \tag{15}$$

with $A, B \in \mathbb{C}^{n \times n}$, can be reduced to several canonical
forms. Let us suppose that B is nonsingular; then we can
define y = Bx, so that the eigenproblem (15) reduces to
the standard eigenproblem

$$AB^{-1} y = \lambda y \tag{16}$$

However, working with $AB^{-1}$ may be very unattractive,
in particular when B is near singular. Also, the matrix
$AB^{-1}$ may be very nonnormal or (close to) defective.
This makes the resulting eigenproblem highly sensitive to
perturbations. Finally, with this approach, we cannot solve
important classes of problems for which B is singular.

Moler and Stewart (1973) proved the interesting result
that both A and B can be reduced to Schur form, albeit
with two different unitary matrices Q and Z:

$$AZ = QR^A, \qquad BZ = QR^B$$

with $R^A$ and $R^B$ upper triangular matrices. This leads to
the so-called QZ method.

For a detailed description of the implementation of the
QZ algorithm, see Moler and Stewart (1973) or Golub and
Van Loan (1996, Chapter 7.7).

The total costs of the QZ algorithm, for the computation
of the eigenvalues only, are about $30n^3$ flops. If one wants
the eigenvectors as well, then the final Q and Z need to
be evaluated (for the eigenvalues only, the products of all
intermediate orthogonal transformations are not explicitly
necessary), which takes another $16n^3$ for Q and about $20n^3$
for Z (Golub and Van Loan, 1996, Chapter 7.7.7).

The (generalized) Schur form $R^A - \lambda R^B$ reveals the
eigenstructure of the original eigenproblem $A - \lambda B$. The
pairs of diagonal elements of $R^A$ and $R^B$ define the
eigenvalues $\lambda$. Here, we have to be careful; it is tempting to
take the ratios of these diagonal elements, and, in view of
the foregoing presentation, this is certainly correct. However,
it sometimes hides sensitivity issues. Stewart (1978)
has pointed out the asymmetry in the usual presentation of
the generalized eigenproblem with respect to the role of
A and B. Instead of the form $Ax - \lambda Bx = 0$, we might
also have considered $\mu Ax - Bx = 0$, which is equivalent
for $\mu = \lambda^{-1}$. If, for instance, B is singular, then the second
form leads to the conclusion that there is an eigenvalue
$\mu = 0$, which is equivalent to concluding that the first form
has an eigenvalue $\lambda = \infty$. The important observation is
that, in this case, in rounding arithmetic, there will be a
tiny diagonal element $r^B_{ii}$ in $R^B$, so that the corresponding
ratio $r^A_{ii}/r^B_{ii}$ leads to a very inaccurate eigenvalue
approximation for $\lambda$, whereas the inverse leads to an accurate
approximation for an eigenvalue $\mu$.
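
A small worked example may make the sensitivity issue concrete. The 2 x 2 triangular pair below is constructed purely for illustration; $\varepsilon$ stands for a tiny computed diagonal entry and $\delta$ for a rounding perturbation of comparable size.

```latex
R^A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}, \qquad
R^B = \begin{pmatrix} 1 & 1 \\ 0 & \varepsilon \end{pmatrix}, \qquad
0 < \varepsilon \ll 1 .

% Diagonal pairs (r^A_{ii}, r^B_{ii}): (1, 1) and (2, \varepsilon).
\lambda_1 = \frac{1}{1} = 1, \qquad
\lambda_2 = \frac{2}{\varepsilon}, \qquad
\mu_2 = \lambda_2^{-1} = \frac{\varepsilon}{2} .

% Perturb r^B_{22} from \varepsilon to \varepsilon + \delta with |\delta| \approx \varepsilon:
\frac{2}{\varepsilon + \delta} - \frac{2}{\varepsilon}
  = -\frac{2\delta}{\varepsilon(\varepsilon + \delta)}
  \quad \text{(comparable in size to } \lambda_2 \text{ itself)}, \qquad
\frac{\varepsilon + \delta}{2} - \frac{\varepsilon}{2} = \frac{\delta}{2}
  \quad \text{(tiny in absolute terms)} .
```

The ratio $\lambda_2$ therefore carries essentially no reliable digits, while $\mu_2$ is determined to within a tiny absolute error close to zero, which is why it is often safer to report the pair $(r^A_{ii}, r^B_{ii})$ than the quotient.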

The standard presentation for the generalized eigenprob- 
lem is quite appropriate for the type of problems that arise, 
for instance, in mechanical engineering, where A represents 
the stiffness matrix, with information of the differential 
operator, and B represents the mass matrix, which is largely 
determined by the chosen basis functions. For a well-chosen 
set of basis functions, the mass matrix is usually well- 
conditioned and then it is quite natural to consider the 
eigenvalue problem from the point of view of A. 


7 ITERATIVE METHODS FOR 
THE EIGENPROBLEM 


7.1 The Arnoldi method 


With the Power method (see Section 6.1), we have generated the
spanning vectors of a Krylov subspace


$$K^m(A; v_1) = \mathrm{span}\{v_1, Av_1, \ldots, A^{m-1} v_1\}$$


However, note that the Power method exploits only the 
two most recently computed vectors for the computation of 
an approximating eigenvalue. The methods of Lanczos and 
Arnoldi exploit the whole Krylov subspace, and this leads 
not only to better approximations for the largest eigenvalue 
but also for many other eigenvalues. 

From the definition of a Krylov subspace, it is clear that

$$K^m(\alpha A + \beta I; v_1) = K^m(A; v_1) \quad \text{for } \alpha \ne 0 \tag{17}$$

This says that the Krylov subspace is spanned by the same
basis if A is scaled and/or shifted. The implication is that
Krylov subspace methods for the spectrum of the matrix A
are invariant under translations for A.

Krylov subspaces play a central role in iterative methods 
for eigenvalue computations. In order to identify better 
solutions in the Krylov subspace, we need a suitable basis 
for this subspace, one that can be extended inductively for 
subspaces of increasing dimension. The obvious basis $v_1$,
$Av_1, \ldots, A^{m-1} v_1$ for $K^m(A; v_1)$ is not very attractive from
a numerical point of view, since the vectors $A^j v_1$ point
more and more in the direction of the dominant eigenvector
for increasing j (the power method!), and hence the basis
vectors become dependent in finite precision arithmetic.
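
The loss of independence in the power basis can be observed numerically. The sketch below computes the 2-norm condition number of the matrix whose columns are the scaled vectors $v_1, Av_1, \ldots, A^{m-1}v_1$ for growing m; the function name and the diagonal test matrix are assumptions chosen for the illustration.

```python
import numpy as np

def krylov_condition_numbers(A, v1, m_max):
    """Condition numbers of the scaled power basis for m = 1, ..., m_max."""
    cols = [v1 / np.linalg.norm(v1)]
    conds = []
    for _ in range(m_max):
        conds.append(np.linalg.cond(np.column_stack(cols)))
        w = A @ cols[-1]                       # next power vector
        cols.append(w / np.linalg.norm(w))     # scaled, but not orthogonalized
    return conds

A = np.diag(np.arange(1.0, 21.0))
conds = krylov_condition_numbers(A, np.ones(20), 10)
```

The condition number grows rapidly with m, which is the practical reason for replacing this basis by an orthonormal one.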

Lanczos (1950) proposes to generate an orthogonal basis
$v_1, v_2, \ldots, v_m$ for the Krylov subspace of dimension m,
and shows that this could be done in a very economic way
for symmetric A. Arnoldi (1951) describes a procedure for 
the computation of an orthonormal basis for unsymmetric 
matrices, and we have seen this procedure in Figure 1. The 
Lanczos algorithm follows as a special case. 

The orthogonalization in Figure 1 leads to relations
between the $v_j$ that can be formulated in a compact algebraic
form. Let $V_j$ denote the matrix with columns $v_1$ up
to $v_j$, $V_j = [v_1 | v_2 | \cdots | v_j]$; then it follows that

$$AV_{m-1} = V_m H_{m,m-1} \tag{18}$$

The m x (m - 1) matrix $H_{m,m-1}$ is upper Hessenberg, and
its elements $h_{i,j}$ are defined by the Arnoldi orthogonalization
algorithm. We will refer to this matrix $H_{m,m-1}$ as the
reduced matrix A, or, more precisely, the matrix A reduced
to the current Krylov subspace. From a computational point
of view, this construction is composed of three basic
elements: a matrix-vector product with A, inner products,
and updates. We see that this orthogonalization becomes
increasingly expensive for increasing dimension of the
subspace, since the computation of each $h_{i,j}$ requires an inner
product and a vector update.
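
Relation (18) can be checked with a short Arnoldi sketch. The version below uses modified Gram-Schmidt orthogonalization, which is an assumption on our part about the variant shown in Figure 1; the function name and the breakdown handling are likewise illustrative.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Build V (n x m, orthonormal columns) and H (m x (m-1), upper
    Hessenberg) such that A @ V[:, :m-1] = V @ H."""
    n = A.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m - 1))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m - 1):
        w = A @ V[:, j]                  # matrix-vector product
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w        # inner product
            w = w - H[i, j] * V[:, i]    # vector update
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:
            raise RuntimeError("breakdown: invariant subspace found")
        V[:, j + 1] = w / H[j + 1, j]
    return V, H
```

The inner loop over i shows why the cost per step grows with the dimension of the subspace: step j needs j + 1 inner products and vector updates.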


7.2 The Ritz values and Ritz vectors


The eigenvalues of the leading m x m part $H_{m,m}$ of the
matrix $H_{m+1,m}$ are called the Ritz values of A with respect
to the Krylov subspace of dimension m. We will see later,
in our discussions on the Lanczos and Arnoldi methods,
how they can be related to eigenvalues of A. If s is an
eigenvector of $H_{m,m}$, then $y = V_m s$ is called a Ritz vector
of A. Also, these Ritz vectors can be related to eigenvectors
of A.

All this is not so surprising because of the relation with
the Krylov subspace and the vectors generated in the Power
Method. It seems obvious that we can base our expectations
of the convergence of the Ritz values on our knowledge
of convergence for the Power Method, for instance, that
the largest Ritz value will converge faster to the largest
eigenvalue of A than the Rayleigh quotient in the Power
Method. But this kind of attitude is not very helpful for
deeper insight. There is a very essential difference between
Krylov subspace methods and the Power method and that
is that the Krylov subspace is invariant for scaling of A
and for shifting of A, that is, A and $\alpha A + \beta I$ generate
exactly the same Krylov subspace. The reduced matrix
$H_{m,m}$ is simply scaled and shifted to the reduced matrix
$\tilde{H}_{m,m} = \alpha H_{m,m} + \beta I_m$ associated with $\tilde{A} = \alpha A + \beta I$. With
$I_m$, we denote the m x m identity matrix.
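
The scaling and shifting of the reduced matrix can be verified numerically. The sketch below restricts itself to a symmetric matrix, so that the Ritz values are real and can be sorted, and to a positive scaling factor, so that the ordering is preserved; the function name and the symmetrization of the reduced matrix before calling `eigvalsh` are choices made for the illustration.

```python
import numpy as np

def ritz_values(A, v1, m):
    """Ritz values of symmetric A with respect to the Krylov subspace of
    dimension m generated by v1 (Arnoldi with modified Gram-Schmidt)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    Hm = H[:m, :m]                               # leading m x m part
    return np.linalg.eigvalsh((Hm + Hm.T) / 2)   # symmetric up to rounding

rng = np.random.default_rng(6)
B = rng.standard_normal((10, 10))
A = B + B.T
v1 = rng.standard_normal(10)
alpha, beta = 2.5, -3.0
theta = ritz_values(A, v1, 5)
theta_shifted = ritz_values(alpha * A + beta * np.eye(10), v1, 5)
```

Up to rounding errors, `theta_shifted` equals `alpha * theta + beta`, which is the statement about the scaled and shifted reduced matrix in terms of its eigenvalues.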

This implies that for the Krylov subspace method, the 
notion of largest (in absolute value) eigenvalue loses its 
special meaning as opposed to the Power Method. Since the 
Krylov methods are shift invariant, the position of the origin 
in the spectrum is not relevant, and we rather make the 
distinction between exterior and interior eigenvalues. The 
Krylov methods have no special preference for eigenvalues 
that are at about the same distance from the center of the 
spectrum, provided that these eigenvalues are about equally 
well separated from the others. In particular, when the
real spectrum of a symmetric real matrix is symmetrically
distributed with respect to $(\lambda_1 + \lambda_n)/2$, $\lambda_1$ being the largest
eigenvalue of $A$ and $\lambda_n$ the smallest one, then, for a starting
vector that also has a symmetric weight distribution with
respect to the corresponding eigenvectors, the convergence
of the smallest Ritz value toward $\lambda_n$ will be equally fast (or
slow) as the convergence of the largest Ritz value to $\lambda_1$.

For complex spectra, one has to consider the smallest cir- 
cle that encloses all eigenvalues. With proper assumptions 
about the starting vector, one may expect that the eigenval- 
ues close to this circle will be approximated fastest and that 
the more interior eigenvalues will be approximated later in 
the Krylov process (that is, for larger values of m). For
more general starting vectors, this describes more or less 
the generic case, but with special starting vectors, one can 
force convergence toward favored parts of the spectrum. 

The Arnoldi method forms the basis for the ARPACK 
software, described in Lehoucq et al. (1998). For more 
information on the method and for references, we refer to 
Bai et al. (2000). 


7.3 The Lanczos method 


Note that if $A$ is symmetric, then so is

$$H_{m-1,m-1} = V_{m-1}^T A V_{m-1}$$

so that in this situation $H_{m-1,m-1}$ is tridiagonal:

$$A V_{m-1} = V_m T_{m,m-1} \qquad (19)$$

The matrix $T_{m,m-1}$ is an $m \times (m-1)$ tridiagonal matrix;
its leading $(m-1) \times (m-1)$ part is symmetric.

This means that in the orthogonalization process, each 
new vector has to be orthogonalized with respect to the 
previous two vectors only, since all other inner products 
vanish. The resulting three-term recurrence relation for the 
basis vectors of $K^m(A; v_1)$ is the kernel of the Lanczos
method and some very elegant methods are derived from 
it. A template for the Lanczos method is given in the 
Algorithm shown in Figure 13. 

In the symmetric case, the orthogonalization process 
involves constant arithmetical costs per iteration step: one 
matrix-vector product, two inner products, and two vector 
updates. In exact arithmetic, this process must terminate 
after at most n steps, since then the n orthonormal vectors 
for the Krylov subspace span the whole space. In fact, 
the process terminates after k steps if the starting vector 
has components only in the directions of eigenvectors 
corresponding to k different eigenvalues. In the case of finite
termination, we have reduced the matrix A to tridiagonal
form with respect to an invariant subspace, and, in fact,
Lanczos’ algorithm was initially viewed as a finite reduction
algorithm for A.
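The three-term recurrence and the finite termination property can be illustrated with a short sketch. This is an assumed minimal implementation without reorthogonalization, not a transcription of Figure 13; the function name `lanczos` and the breakdown tolerance `tol` are hypothetical choices.

```python
import numpy as np

def lanczos(A, v1, m, tol=1e-12):
    """Symmetric Lanczos: returns V, the diagonal alpha and the
    off-diagonal beta of the tridiagonal matrix T = V^T A V."""
    n = A.shape[0]
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    v_prev = np.zeros(n)
    v = v1 / np.linalg.norm(v1)
    b_prev = 0.0
    for j in range(m):
        V[:, j] = v
        w = A @ v - b_prev * v_prev      # matrix-vector product and first update
        alpha[j] = v @ w                 # first inner product
        w = w - alpha[j] * v             # second update
        b = np.linalg.norm(w)            # second inner product
        beta[j] = b
        if b < tol:                      # invariant subspace found: terminate
            return V[:, :j + 1], alpha[:j + 1], beta[:j]
        v_prev, v, b_prev = v, w / b, b
    return V, alpha, beta[:m - 1]

def tridiag(alpha, beta):
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

# Generic symmetric matrix: a few steps give an orthonormal basis and T = V^T A V.
rng = np.random.default_rng(1)
B = rng.standard_normal((100, 100))
A = (B + B.T) / 2
V, a, b = lanczos(A, rng.standard_normal(100), 10)
T = tridiag(a, b)

# Starting vector with components along only k = 3 distinct eigenvalues:
# the process terminates after 3 steps.
A_d = np.diag([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
V_d, a_d, b_d = lanczos(A_d, np.ones(6), 6)
T_d = tridiag(a_d, b_d)
```

In the second example the Ritz values (the eigenvalues of `T_d`) coincide with the three distinct eigenvalues of `A_d`. For long runs in finite precision, the orthogonality of `V` degrades, which this sketch does not address.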

For more information on the Lanczos method, we refer 
to Bai et al. (2000), which also gives pointers to software. 
The implementation of the Lanczos method is not trivial
because of the effects of finite precision arithmetic, which may
lead to multiple copies of detected eigenpairs.


7.4 The two-sided Lanczos method: Bi-Lanczos 


If the matrix $A$ is Hermitian, then we can, at modest computational
costs per iteration step, generate an orthonormal
basis for the Krylov subspace $K^m(A; v_1)$. It suffices to put
each newly generated vector $A v_j$ orthogonal to the two previous
basis vectors $v_j$ and $v_{j-1}$ only. This leads to a reduced
matrix that is symmetric tridiagonal. A theoretical result by
Faber and Manteuffel (1984) shows that this is not possible
for general unsymmetric matrices.

Figure 13. The Lanczos algorithm.

There exists a generalization of the Lanczos method for 
unsymmetric matrices. This method is known as the two- 
sided Lanczos method or the Bi-Lanczos method. There 
are several ways to derive this method; we follow the one 
in Wilkinson (1965, Chapter 6.36). 

For ease of notation, we will restrict ourselves to the real
case. For an unsymmetric matrix $A \in \mathbb{R}^{n \times n}$, we will try
to obtain a suitable nonorthogonal basis with a three-term
recurrence, by requiring that this basis is orthogonal with
respect to some other basis.

We start from two Arnoldi-like recursions for the basis
vectors of two Krylov subspaces, one with $A$: $K^m(A; v_1)$
and the other with $A^T$: $K^m(A^T; w_1)$. This leads to

$$h_{j+1,j}\, v_{j+1} = A v_j - \sum_{i=1}^{j} h_{i,j}\, v_i$$

$$g_{j+1,j}\, w_{j+1} = A^T w_j - \sum_{i=1}^{j} g_{i,j}\, w_i$$

and we require that $v_{j+1}$ is orthogonal to all previous $w_i$
and that $w_{j+1}$ is orthogonal to all previous $v_i$. Clearly,
this defines, apart from the constants $h_{j+1,j}$ and $g_{j+1,j}$, the
vectors $v_{j+1}$ and $w_{j+1}$, once the previous vectors are given.
Then we have that

$$A V_j = V_{j+1} H_{j+1,j} \quad \text{and} \quad A^T W_j = W_{j+1} G_{j+1,j}$$

Since each new $v_{j+1}$ is only orthogonal with respect to the
$w_i$, for $i \le j$, and likewise for $w_{j+1}$ with respect to the $v_i$,
it follows that

$$W_j^T V_j = L_{j,j} \quad \text{and} \quad V_j^T W_j = K_{j,j}$$

where $L_{j,j}$ and $K_{j,j}$ are lower triangular. Clearly,

$$K_{j,j}^T = L_{j,j}$$

so that both matrices are diagonal. Let us denote this matrix
as $D_{j,j}$. Then, we have that

$$W_j^T A V_j = D_{j,j} H_{j,j}$$

and also

$$V_j^T A^T W_j = D_{j,j} G_{j,j}$$

This shows that $H_{j,j}$ and $G_{j,j}$ must be tridiagonal. The sets
$\{v_i\}$ and $\{w_i\}$ are called biorthogonal.

Select a normalized pair v_1, w_1 (for instance, w_1 = v_1)
    such that w_1^T v_1 = δ_1 ≠ 0
β_0 = 0, w_0 = v_0 = 0
for j = 1, 2, ....
    p = A v_j - β_{j-1} v_{j-1}
    α_j = w_j^T p / δ_j
    p = p - α_j v_j
    γ_{j+1} = ||p||_2
    v_{j+1} = p / γ_{j+1}
    w_{j+1} = (A^T w_j - β_{j-1} w_{j-1} - α_j w_j) / γ_{j+1}
    δ_{j+1} = w_{j+1}^T v_{j+1}
    β_j = γ_{j+1} δ_{j+1} / δ_j
end;


Figure 14. The two-sided Lanczos algorithm.


In Figure 14, we show schematically an algorithm for the 
two-sided Lanczos method, suitable for execution with an 
unsymmetric matrix A. 

In this algorithm, the vectors $v_i$ are normalized: $\|v_i\|_2 = 1$,
which automatically defines the scaling factors for the $w_i$.

We have tacitly assumed that the inner products $w_i^T v_i \neq 0$,
since the value 0 would lead to an unsolicited breakdown
of the method if $v_i$ and $w_i$ are not equal to zero. If $v_i$ is
equal to zero, then we have an invariant subspace, and, in
that case, the eigenvalues of $T_{j,j}$ are eigenvalues of $A$.
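A two-sided Lanczos process of this kind can be sketched as follows. This is an illustrative implementation under stated assumptions, not a tested production code: the function name `bi_lanczos`, the tolerance `tol`, and the test matrix are hypothetical, and no look-ahead strategy is included to recover from (near) breakdown.

```python
import numpy as np

def bi_lanczos(A, v1, w1, m, tol=1e-12):
    """Two-sided Lanczos with normalized v_j. Returns V, W and the
    coefficients alpha, beta, gamma, delta of the recurrences."""
    n = A.shape[0]
    V = np.zeros((n, m))
    W = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m)
    gamma, delta = np.zeros(m), np.zeros(m)
    v = v1 / np.linalg.norm(v1)
    w = w1.copy()
    d = w @ v                               # delta_1, assumed nonzero
    v_prev, w_prev, b_prev = np.zeros(n), np.zeros(n), 0.0
    for j in range(m):
        if abs(d) < tol:
            raise RuntimeError("breakdown: w_j^T v_j is (nearly) zero")
        V[:, j], W[:, j], delta[j] = v, w, d
        p = A @ v - b_prev * v_prev
        alpha[j] = (w @ p) / d
        p = p - alpha[j] * v
        g = np.linalg.norm(p)
        if g < tol:
            raise RuntimeError("invariant subspace reached")
        q = A.T @ w - b_prev * w_prev - alpha[j] * w
        v_next, w_next = p / g, q / g       # same scaling factor for both
        d_next = w_next @ v_next
        gamma[j], beta[j] = g, g * d_next / d
        v_prev, w_prev, b_prev = v, w, beta[j]
        v, w, d = v_next, w_next, d_next
    return V, W, alpha, beta, gamma, delta

rng = np.random.default_rng(2)
n, m = 40, 6
A = np.diag(np.arange(1.0, n + 1)) + 0.1 * rng.standard_normal((n, n))  # unsymmetric
v1 = rng.standard_normal(n)
v1 /= np.linalg.norm(v1)
V, W, al, be, ga, de = bi_lanczos(A, v1, v1, m)

# Tridiagonal matrix: alpha on the diagonal, gamma below it, beta above it.
T = np.diag(al) + np.diag(ga[:m - 1], -1) + np.diag(be[:m - 1], 1)
D = np.diag(de)
```

For this example, `W.T @ V` is diagonal (biorthogonality) and `W.T @ A @ V` equals `D @ T` up to rounding errors.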

For more information on this method and for pointers to 
software, we refer to Bai et al. (2000). 


7.5 The Jacobi—Davidson method 


The Jacobi-Davidson method is based on the projection of
the given eigenproblem $Ax = \lambda x$ on a subspace that is in
general not a Krylov subspace. The idea to work with non-Krylov
subspaces was promoted in Davidson (1975). This
leads to an approximate eigenpair, just as in the Arnoldi
method. The main difference is the way in which the
subspace is expanded. This is done by following an idea
that was originally proposed by Jacobi (1846). Suppose that
we have a normalized eigenvector approximation $u_j$ with
corresponding eigenvalue approximation $\theta_j$; then the idea
is to try to compute the correction $t$, so that

$$A(u_j + t) = \lambda (u_j + t)$$

for $t \perp u_j$ and $\lambda$ close to $\theta_j$. It can be shown that $t$ satisfies
the so-called correction equation

$$(I - u_j u_j^*)(A - \theta_j I)(I - u_j u_j^*)\, t = -(A u_j - \theta_j u_j)$$

This leads to the so-called Jacobi-Davidson method, proposed
by Sleijpen and van der Vorst (1996).


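Start with t = v_0, starting guess
for m = 1, ...
    for i = 1, ..., m-1
        t = t - (v_i^* t) v_i
    v_m = t / ||t||_2, v_m^A = A v_m
    for i = 1, ..., m-1
        M_{i,m} = v_i^* v_m^A
        M_{m,i} = v_m^* v_i^A
    M_{m,m} = v_m^* v_m^A
    Compute the largest eigenpair M s = θ s
        of the m by m matrix M, (||s||_2 = 1)
    u = V s with V = [v_1, ..., v_m]
    u^A = V^A s with V^A = [v_1^A, ..., v_m^A]
    r = u^A - θ u
    if (||r||_2 ≤ ε), λ̃ = θ, x̃ = u, then STOP.
    Solve (approximately) a t ⊥ u from
        (I - u u^*)(A - θ I)(I - u u^*) t = -r


Figure 15. Jacobi-Davidson algorithm for $\lambda_{\max}(A)$.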


The basic form of this algorithm is given in Figure 15. In
each iteration of this algorithm, an approximated eigenpair
$(\theta, u)$ for the eigenpair of the matrix $A$, corresponding to
the largest eigenvalue (in absolute value) of $A$, is computed.
The iteration process is terminated as soon as the norm of
the residual $Au - \theta u$ is below a given threshold $\epsilon$.

To apply this algorithm, we need to specify a starting vector
$v_0$ and a tolerance $\epsilon$. On completion, an approximation
for the largest eigenvalue (in absolute value) $\tilde{\lambda} = \lambda_{\max}(A)$
and its corresponding eigenvector $\tilde{x} = x_{\max}$ is delivered.
The computed eigenpair $(\tilde{\lambda}, \tilde{x})$ satisfies $\|A\tilde{x} - \tilde{\lambda}\tilde{x}\| \le \epsilon$.
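The iteration can be sketched for a small dense symmetric matrix. The code below is an illustration under several assumptions that are not part of the chapter: $A$ is real symmetric, the algebraically largest Ritz value is targeted, and the correction equation is solved by a dense least-squares solve (`numpy.linalg.lstsq`) on the projected operator instead of a preconditioned iterative solver. The function name `jacobi_davidson_max` is hypothetical.

```python
import numpy as np

def jacobi_davidson_max(A, v0, tol=1e-8):
    """Jacobi-Davidson sketch for the largest eigenvalue of a symmetric A.
    Returns (theta, u, subspace dimension)."""
    n = A.shape[0]
    I = np.eye(n)
    V = np.zeros((n, 0))
    t = np.asarray(v0, dtype=float)
    theta, u = None, None
    for m in range(n):
        for _ in range(2):                   # orthogonalize t against V (twice)
            t = t - V @ (V.T @ t)
        nt = np.linalg.norm(t)
        if nt < 1e-14:                       # no new direction available
            break
        V = np.column_stack([V, t / nt])
        M = V.T @ A @ V                      # projected matrix
        vals, S = np.linalg.eigh(M)
        theta, s = vals[-1], S[:, -1]        # largest Ritz pair
        u = V @ s
        r = A @ u - theta * u
        if np.linalg.norm(r) <= tol:
            return theta, u, m + 1
        P = I - np.outer(u, u)               # projector onto the complement of u
        # Minimum-norm least-squares solution of the correction equation;
        # the final projection enforces t orthogonal to u.
        t = np.linalg.lstsq(P @ (A - theta * I) @ P, -r, rcond=None)[0]
        t = P @ t
    return theta, u, V.shape[1]

rng = np.random.default_rng(3)
B = rng.standard_normal((30, 30))
A = (B + B.T) / 2
theta, u, dim = jacobi_davidson_max(A, rng.standard_normal(30))
residual = np.linalg.norm(A @ u - theta * u)
```

The dense solve is only reasonable for small examples; in practice the correction equation is solved approximately, as discussed in the text.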

The method is particularly attractive, that is, fast converging,
if one is able to solve the correction equation to
some approximation at relatively low cost. For exact solution,
one is rewarded with cubic convergence, but that may
come at the price of an expensive solution of the correction
equation. In practical applications, it is not necessary to
solve this equation exactly. Effective solution requires the
availability of a good preconditioner for the matrix $A - \theta_j I$.
For more details on this, and for the efficient inclusion of
the preconditioner, we refer to Bai et al. (2000).


NOTES 


[1] The A-norm is defined by $\|y\|_A^2 = (y, y)_A = (y, Ay)$.

[2] The presentation in this chapter has partial overlap
with Dongarra et al. (1998).

[3] Ideally, the eigenvalues of $K^{-1}A$ should cluster
around 1.

[4] The nonsingular matrix A is called an M-matrix if
all its off-diagonal elements are nonpositive and if all
elements of $A^{-1}$ are nonnegative.


[5] Collection of test matrices available at
ftp://ftp.cise.ufl.edu/cis/tech-reports/ 
tr98/tr98-016.ps 


REFERENCES 


Anderson E, Bai Z, Bischof C, Demmel J, Dongarra J, DuCroz J, 
Greenbaum A, Hammarling S, McKenney A, Ostrouchov S 
and Sorensen D. LAPACK User’s Guide. SIAM: Philadelphia, 
1992. 


Arnoldi WE. The principle of minimized iteration in the solution
of the matrix eigenproblem. Q. Appl. Math. 1951; 9:17-29. 


Axelsson O. Iterative Solution Methods. Cambridge University 
Press: Cambridge, 1994. 


Axelsson O and Barker VA. Finite Element Solution of Boundary 
Value Problems. Theory and Computation. Academic Press: 
New York, 1984. 


Axelsson O and Munksgaard N. Analysis of incomplete fac- 
torizations with fixed storage allocation. In Preconditioning 
Methods: Theory and Applications, Evans D (ed.). Gordon &
Breach: New York, 1983; 265-293. 


Bai Z, Demmel J, Dongarra J, Ruhe A and van der Vorst H. 
Templates for the Solution of Algebraic Eigenvalue Problems: 
A Practical Guide. SIAM: Philadelphia, 2000.


Barrett R, Berry M, Chan T, Demmel J, Donato J, Dongarra J, 
Eijkhout V, Pozo R, Romine C and van der Vorst H. Templates 
for the Solution of Linear Systems: Building Blocks for Iterative 
Methods. SIAM: Philadelphia, 1994. 


Benzi M. Preconditioning techniques for large linear systems: a 
survey. J. Comput. Phys. 2002; 182:418-477. 


Bomhof CW and van der Vorst HA. A parallel linear system 
solver for circuit-simulation problems. Numer. Lin. Alg. Appl. 
2000; 7:649-665. 


Brezinski C. Projection Methods for Systems of Equations. North 
Holland: Amsterdam, 1997. 


Bruaset AM. A Survey of Preconditioned Iterative Methods.
Longman Scientific & Technical: Harlow, 1995.


Chan TF and Goovaerts D. A note on the efficiency of domain 
decomposed incomplete factorizations. SIAM J. Sci. Statist. 
Comput. 1990; 11:794-803.


Chan TF and van der Vorst HA. Approximate and incomplete 
factorizations. In Parallel Numerical Algorithms, ICASE/LaRC 
Interdisciplinary Series in Science and Engineering, Keyes DE, 
Sameh A, Venkatakrishnan V (eds). Kluwer: Dordrecht, 1997; 
167-202. 


Davidson ER. The iterative calculation of a few of the lowest
eigenvalues and corresponding eigenvectors of large real sym- 
metric matrices. J. Comput. Phys. 1975; 17:87-94. 


Demmel JW. Applied Numerical Linear Algebra. SIAM:
Philadelphia, 1997.


Doi S. On parallelism and convergence of incomplete LU factor- 
izations. Appl. Numer. Math. 1991; 7:417-436. 


Doi S and Hoshi A. Large numbered multicolor MILU precondi- 
tioning on SX-3/14. Int. J. Comput. Math. 1992; 44:143-152. 




Dongarra JJ, Duff IS, Sorensen DC and van der Vorst HA. 
Numerical Linear Algebra for High-Performance Computers. 
SIAM: Philadelphia, 1998. 


Duff IS and Meurant GA. The effect of ordering on precondi- 
tioned conjugate gradient. BIT 1989; 29:635-657. 


Duff IS and van der Vorst HA. Developments and trends in the 
parallel solution of linear systems. Parallel Comput. 1999; 
25:1931-1970. 


Dupont T, Kendall RP and Rachford HH Jr. An approximate fac- 
torization procedure for solving self-adjoint elliptic difference 
equations. SIAM J. Numer. Anal. 1968; 5(3):559-573. 


Eijkhout V. Beware of unperturbed modified incomplete point 
factorizations. In Iterative Methods in Linear Algebra, 
Beauwens R, de Groen P (eds). North Holland: Amsterdam, 
1992; 583-591; IMACS Int. Symp., Brussels, 2—4 April 1991. 


Elman HC. Relaxed and stabilized incomplete factorizations for 
non-self-adjoint linear systems. BIT 1989; 29:890-915.


Faber V and Manteuffel TA. Necessary and sufficient conditions 
for the existence of a conjugate gradient method. SIAM J. 
Numer. Anal. 1984; 21(2):352-362.


Fischer B. Polynomial based iteration methods for symmetric lin- 
ear systems. Advances in Numerical Mathematics. Wiley and 
Teubner: Chichester, Stuttgart, 1996. 


Fletcher R. Conjugate Gradient Methods for Indefinite Systems, 
Volume 506 of Lecture Notes Math. Springer-Verlag: Berlin- 
Heidelberg-New York, 1976, 73-89. 


Golub GH and Van Loan CF. Matrix Computations. The Johns 
Hopkins University Press: Baltimore, 1996. 


Golub GH and van der Vorst HA. Numerical progress in eigen- 
value computation in the 20th century. J. Comput. Appl. Math. 
2000; 123(1-2):35-65. 


Greenbaum A. Iterative Methods for Solving Linear Systems. 
SIAM: Philadelphia, 1997. 


Gustafsson I. A class of first order factorization methods. BIT 
1978; 18:142-156. 


Gustafsson I and Lindskog G. A preconditioning technique based 
on element matrix factorizations. J. Comput. Methods Appl.
Mech. Eng. 1986; 55:201-220. 


Hackbusch W. Iterative Solution of Large Sparse Systems of Equa- 
tions. Springer-Verlag: Berlin, 1994. 


Hughes TJR, Levit I and Winget J. An element-by-element solu- 
tion algorithm for problems of structural and solid mechanics. 
J. Comput. Methods Appl. Mech. Eng. 1983; 36:241-254.


Jacobi CGJ. Ueber ein leichtes Verfahren, die in der Theorie 
der Säcularstörungen vorkommenden Gleichungen numerisch
aufzulösen. J. Reine Angew. Math. 1846; 30:51-94. 


Jones MT and Plassmann PE. The efficient parallel iterative solu- 
tion of large sparse linear systems. In Graph Theory and Sparse 
Matrix Computations, IMA Vol. 56, George A, Gilbert JR, 
Liu JWH (eds). Springer-Verlag: Berlin, 1994; 229-245. 

Kuo JCC and Chan TF. Two-color Fourier analysis of iterative 
algorithms for elliptic problems with red/black ordering. SIAM 
J. Sci. Statist. Comput. 1990; 11:767-793. 


Lanczos C. An iteration method for the solution of the eigenvalue 
problem of linear differential and integral operators. J. Res. 
Natl. Bur. Stand. 1950; 45:225-280.




Lanczos C. Solution of systems of linear equations by minimized 
iterations. J. Res. Natl. Bur. Stand. 1952; 49:33-53. 


Lehoucq RB, Sorensen DC and Yang C. ARPACK User’s Guide. 
SIAM: Philadelphia, 1998. 


Magolu monga Made M and van der Vorst HA. A general- 
ized domain decomposition paradigm for parallel incomplete 
LU factorization preconditionings. Future Gen. Comput. Syst.
2001a; 17:925-932. 


Magolu monga Made M and van der Vorst HA. Parallel incom- 
plete factorizations with pseudo-overlapped subdomains. Par- 
allel Comput. 2001b; 27:989-1008.


Magolu monga Made M and van der Vorst HA. Spectral analy- 
sis of parallel incomplete factorizations with implicit pseudo- 
overlap. Numer. Lin. Alg. Appl. 2002; 9:45-64. 


Manteuffel TA. An incomplete factorization technique for positive 
definite linear systems. Math. Comp. 1980; 31:473-497.


Meijerink JA and van der Vorst HA. An iterative solution method 
for linear systems of which the coefficient matrix is a symmetric 
M-matrix. Math. Comp. 1977; 31:148-162. 


Meijerink JA and van der Vorst HA. Guidelines for the usage of 
incomplete decompositions in solving sets of linear equations 
as they occur in practical problems. J. Comput. Phys. 1981; 
44:134-155. 


Meurant G. Numerical Experiments for the Preconditioned Conju- 
gate Gradient Method on the CRAY X-MP/2. Technical Report 
LBL-18023, University of California: Berkeley, 1984. 


Meurant G. Computer Solution of Large Linear Systems. North 
Holland: Amsterdam, 1999.


Moler CB and Stewart GW. An algorithm for generalized matrix 
eigenvalue problems. SIAM J. Numer. Anal. 1973; 10:241-256.


Munksgaard N. Solving sparse symmetric sets of linear equations 
by preconditioned conjugate gradient method. ACM Trans. 
Math. Softw. 1980; 6:206-219.


Paige CC and Saunders MA. Solution of sparse indefinite systems 
of linear equations. SIAM J. Numer. Anal. 1975; 12:617-629.


Parlett BN. The Symmetric Eigenvalue Problem. Prentice Hall: 
Englewood Cliffs, 1980. 


Saad Y. A flexible inner-outer preconditioned GMRES algorithm. 
SIAM J. Sci. Comput. 1993; 14:461-469.


Saad Y. ILUT: A dual threshold incomplete LU factorization. 
Numer. Lin. Alg. Appl. 1994; 1:387-402.


Saad Y. Iterative Methods for Sparse Linear Systems. PWS
Publishing Company: Boston, 1996.


Saad Y and Schultz MH. GMRES: a generalized minimal residual 
algorithm for solving nonsymmetric linear systems. SIAM J. Sci. 
Statist. Comput. 1986; 7:856-869.


Saad Y and van der Vorst HA. Iterative solution of linear sys- 
tems in the 20th century. J. Comput. Appl. Math. 2000; 
123(1-2):1-33. 

Sleijpen GLG and van der Vorst HA. A Jacobi-Davidson iteration 
method for linear eigenvalue problems. SIAM J. Matrix Anal. 
Appl. 1996; 17:401-425.


Sonneveld P. CGS: a fast Lanczos-type solver for nonsymmetric 
linear systems. SIAM J. Sci. Statist. Comput. 1989; 10:36-52.


Stewart GW. Perturbation theory for the definite generalized
eigenvalue problem. In Recent Advances in Numerical Analysis,
de Boor C, Golub GH (eds). Academic Press: New York,
1978; 193-206.


Stewart GW. Matrix Algorithms, Vol. I: Basic Decompositions. 
SIAM: Philadelphia, 1998. 


Stewart GW. Matrix Algorithms, Vol. II: Eigensystems. SIAM: 
Philadelphia, 2001. 


van der Vorst HA. Iterative solution methods for certain sparse 
linear systems with a non-symmetric matrix arising from PDE- 
problems. J. Comput. Phys. 1981; 44:1-19. 


van der Vorst HA. Large tridiagonal and block tridiagonal linear 
systems on vector and parallel computers. Parallel Comput. 
1987; 5:45-54. 


van der Vorst HA. High performance preconditioning. SIAM J. 
Sci. Statist. Comput. 1989; 10:1174-1185. 


van der Vorst HA. Bi-CGSTAB: A fast and smoothly converging 
variant of Bi-CG for the solution of non-symmetric linear 
systems. SIAM J. Sci. Statist. Comput. 1992; 13:631-644. 


van der Vorst HA. Computational methods for large eigenvalue
problems. In Handbook of Numerical Analysis, vol. VIII,
Ciarlet PG, Lions JL (eds). North Holland: Amsterdam, 2002;
3-179.


van der Vorst HA. Iterative Krylov Methods for Large Linear 
Systems. Cambridge University Press: Cambridge, 2003. 


van der Vorst HA and Vuik C. GMRESR: A family of nested 
GMRES methods. Numer. Lin. Alg. Appl. 1994; 1:369-386. 


van Gijzen MB. Iterative Solution Methods for Linear Equations
in Finite Element Computations. PhD thesis, Delft University 
of Technology, Delft, 1994. 


Watkins DS. Fundamentals of Matrix Computations. John Wiley 
& Sons: New York, 1991. 


Watkins DS. Some perspectives on the eigenvalue problem. SIAM 
Rev. 1993; 35:430-471. 


Watkins DS. QR-like algorithms for eigenvalue problems. J. Com- 
put. Appl. Math. 2000; 123:67-83. 


Wilkinson JH. The Algebraic Eigenvalue Problem. Clarendon 
Press: Oxford, 1965. 


Chapter 20 


Multigrid Methods for FEM and BEM Applications 


Wolfgang Hackbusch 


Max-Planck-Institut für Mathematik in den Naturwissenschaften, Inselstr., Leipzig, Germany


1 General Remarks on Multigrid Methods
2 Two-Grid Iteration
3 Multigrid Method
4 Application to Finite Element Equations
5 Additive Variant
6 Nested Iteration
7 Nonlinear Equations
8 Eigenvalue Problems
9 Applications to the Boundary Element Method (BEM)
References
Further Reading


1 GENERAL REMARKS ON MULTIGRID 
METHODS 


1.1 Introduction 


The solution of large systems of linear or even nonlin- 
ear equations is a basic problem when partial differential 
equations are discretized. Examples are the finite element 
equations in continuum or fluid dynamic problems. Since 
the dimension of these systems is often only limited by 
the available computer storage, the size of the arising sys- 
tems is increasing because of the advances in computer 
technology. At the moment, systems with several millions 
of equations are of interest. The solution of such systems
requires numerical methods that have a runtime proportional
to the dimension of the system (so-called ‘linear
complexity’ or ‘optimal complexity’). As soon as computers
became available, researchers tried to find more
efficient solvers. Multigrid methods turned out to be the first
ones to reach linear complexity for a rather large
class of problems.

As the name indicates, multigrid methods involve several
‘grids’. The nature of this grid hierarchy is explained below
in a general setting. A simple one-dimensional example can
be found in Section 1.2.5.


1.1.1 The standard problem structure 


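The situation we consider in the following is illustrated in
diagram (1):

$$\begin{array}{c} P = P_{\text{continuous}} \\ \downarrow \ \text{discretization process} \\ P_{\text{discrete}} = P_\ell \leftrightarrow P_{\ell-1} \leftrightarrow \cdots \leftrightarrow P_0 \end{array} \qquad (1)$$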


$P_{\text{discrete}}$ is a given (discrete, i.e. finite-dimensional) algebraic
problem. The most prominent example of such a
problem is a system of linear equations. Therefore, the
discussion of the corresponding linear multigrid methods
will fill the major part of this contribution. However, from
the practical point of view, nonlinear systems may be a
more important example of $P_{\text{discrete}}$. Other examples are
eigenvalue problems $P_{\text{discrete}}$. It is essential for the multigrid
approach to embed the discrete problem $P_{\text{discrete}}$ into a
hierarchy of discrete problems $P_\ell, P_{\ell-1}, \ldots, P_0$, where the
problem size (dimension) of $P_k$ is increasing with increasing
level number $k$. The largest dimension corresponds
to $P_{\text{discrete}} = P_\ell$, while the lower dimensional problems
$P_{\ell-1}, \ldots, P_0$ are used as auxiliary problems. The role of
the continuous problem $P = P_{\text{continuous}}$ will be discussed
below.


1.1.2 What are multigrid methods? 


The multigrid method can be considered as a concept for 
the construction of fast iterative methods based on a hierarchy
$P_\ell, P_{\ell-1}, \ldots, P_0$ for solving an algebraic problem
$P_{\text{discrete}} = P_\ell$. To have an analogue, we may look at the
finite element method (FEM). The FEM is not a special dis- 
cretization of a particular problem but offers a whole class 
of discretizations (elements of different shape and order) 
to various problems (differential as well as integral equa- 
tions). Similarly, the multigrid technique is applicable to 
large classes of (discrete algebraic) problems and describes 
how to construct algorithms for solving these problems. 
Instead of ‘multigrid method’ other names are also in use, 
for example, ‘multiple grid method’. The name states that 
several ‘grids’ are involved, which belong to different lev- 
els of a hierarchy, as indicated in (1). This also explains the 
alternative namings ‘multilevel method’ or ‘multiple level 
method’. The word ‘grid’ in ‘multigrid method’ originates 
from finite difference approximations by means of a regular 
grid. However, this does not mean that the multigrid method 
is restricted to such discretizations. In fact, the major part of 
this contribution will refer to the finite element discretiza- 
tion. Finite differences, finite elements or finite volumes are 
examples for the discretization process in (1). 


1.1.3 Which problems can be solved? 


The continuous problem $P = P_{\text{continuous}}$ can be connected
with a partial differential or integral equation of elliptic
type.

Usually, the problems $P_{\text{discrete}}$ to be solved by multigrid
are discrete analogues of a continuous problem
$P = P_{\text{continuous}}$ derived by some discretization indicated by
$\downarrow$ in (1).


A given discretization process offers an easy possibility
to obtain not only one particular discrete problem $P_\ell$ but
a whole hierarchy of many discrete analogues $P_k$
($k = \ell, \ell-1, \ldots, 0$) of the continuous problem corresponding
to different dimensions. This hierarchy is needed in the
multigrid iteration.

The essence of the multigrid method is the so-called
‘coarse grid correction’. This requires the existence of a
lower dimensional problem $P_{k-1}$, which approximates the
given problem $P_k$ in a reasonable way. If the discretization
process is not given in advance, it must be created as a part
of the multigrid process.

On the other hand, it is often very hard to apply multigrid
when the desired solution cannot be represented by a lower
dimensional approximation. Examples are boundary value
problems with a geometrically complicated boundary (see
end of Section 1.1.6).


1.1.4 Why is multigrid optimal? 


If a multigrid method is successful, the iterative process has 
the following characteristics: 


(i) The convergence speed is uniform with respect to 
the problem size. Therefore, differently from simple 
iterative methods, which become slower with increas- 
ing dimension, these algorithms can be used for large 
scale problems. The desired accuracy can be obtained 
by a fixed number of multigrid steps. The resulting 
method is of linear complexity. 

(ii) Owing to the hopefully good approximation of problem
$P_k$ by $P_{k-1}$, the convergence rate is expected to
be much better (i.e. smaller) than 1. This implies that
only very few iteration steps are necessary to obtain
the desired accuracy.


The characteristics given above describe the aim we want 
to obtain with a good multigrid implementation. In fact, 
optimal convergence can be guaranteed in rather general 
cases. Here, ‘general’ means that no special algebraic properties
of the matrix are required. In particular, the matrix
may be neither symmetric nor positive definite. 

Nevertheless, there is no easy way to get such results in
every case. For singularly perturbed problems (e.g. problems
with high Reynolds numbers), the convergence speed
$\rho = \rho(\delta)$ may depend on a parameter $\delta$. If $\rho(\delta)$ approaches
1 in the limit $\delta \to 0$ (or $\delta \to \infty$), the fast convergence
promised in (ii) is lost. These difficulties give rise to the
investigation of the so-called robust multigrid methods,
which by definition lead to uniform convergence rates $\rho(\delta)$.


1.1.5 Is multigrid easy to implement? 


As seen above, multigrid methods require an environment
of a hierarchy consisting of the problems $P_k$
($k = \ell, \ell-1, \ldots, 0$) together with interacting mappings between
neighboring levels $k$, $k-1$ (these interactions are denoted
by $\leftrightarrow$ in (1)). If this environment is present in the implementation
anyway (e.g. as a part of the adaptive refinement
process), the multigrid method is rather easy to implement.
If, however, only the problem description of $P_{\text{discrete}}$ is
given, the auxiliary problems $P_k$ for $k < \ell$ together with
the interactions $\leftrightarrow$ must be installed as a part of the
multigrid method, which of course makes the multigrid
implementation much more involved.


1.1.6 The hierarchy of problems 


There are different possibilities to produce a hierarchy 
of discretizations. For discretization methods based on 
regular grids, the hierarchy of grids induces the necessary
hierarchy of discrete problems $P_k$. In the case of
FEMs, the underlying triangulation replaces the grid. A 
hierarchy of nested triangulations yields a perfect prob- 
lem hierarchy. Such a hierarchy may be the side-product 
of adaptive mesh refinement. In this case, one proceeds 
from the coarsest to the finest level. The problem $P_0$
corresponding to the coarsest mesh size should have a 
dimension small enough to be solved by standard methods. 
This can cause severe difficulties for complicated bound- 
ary value problems that need a high number of degrees 
of freedom (see e.g. Hackbusch and Sauter, 1997 and 
Section 3.4). 


1.1.7 Notations 


The norm $\|\cdot\|$ is the Euclidean norm when applied to
vectors and the spectral norm for matrices (i.e.
$\|A\| := \max\{\|Ax\| / \|x\| : x \neq 0\}$). $\langle\cdot,\cdot\rangle_k$ denotes the Euclidean
scalar product of the vector space $U_k$ (see below). The scalar
product in $L^2(\Omega)$ is denoted by $(f, g)_{L^2(\Omega)} = \int_\Omega f g \, dx$.




The Landau symbol $O(f(x))$ means that the quantity is
bounded by $C \cdot f(x)$ for some constant $C$, when $x$ tends
to its characteristic limit (e.g. $x = h \to 0$ for the step size
$h$ or $x = n \to \infty$ for the dimension $n$). Further notations
are explained below.


1.1.8 Literature 


The first two-grid method is described by Brakhage (1960) 
for the solution of an integral equation (see Section 9.1). 
The first two-grid method for the Poisson equation is 
introduced by Fedorenko (1961), while Fedorenko (1964) 
contains the first multi-grid method. The first more general 
convergence analysis is given by Bakhvalov (1966). 

Since there is a vast literature about multigrid methods, 
we do not try to give a selection of papers. Instead we refer 
to the monographs Hackbusch (1985), Wesseling (1991) 
(in particular devoted to problems of fluid dynamics), 
Bramble and Zhang (1993), Trottenberg, Oosterlee and 
Schüller (2001) (and the literature cited therein), and to 
the proceedings of the European Multigrid Conferences 
edited by Hackbusch and Trottenberg (1982) (containing 
an introduction to multigrid), Hackbusch and Trottenberg 
(1986), Hackbusch and Trottenberg (1991), Hemker and 
Wesseling (1994), and Hackbusch and Wittum (1998). 


Item                       Explanation                                                  Reference

$d_k$                      defect $L_k u_k - f_k$                                       (15)
$f_k$                      vector of right-hand side (at level $k$)                     (3)
$h_k$                      grid size, mesh size                                         Section 1.2.1
$H_k$, $H_\ell \subset H$  finite element space $\subset$ energy space                  Section 4, (33)
$k$, $\ell$                level index                                                  Section 1.2.1
$L_k$                      stiffness matrix of level $k$                                (3), Section 4.1, Section 4.2
$M_\ell$, $M_\ell^{TGM}$   iteration matrix                                             Section 2.3
$p$                        prolongation                                                 (5)
$P$, $P_k$                 bijection onto FE space (of level $k$)                       Section 4.1, Section 4.2
$r$                        restriction                                                  (6)
$S_k$                      iteration matrix corresponding to $\mathcal{S}_k$            Section 2.1
$\mathcal{S}_k$, $\mathcal{S}_k^\nu$   smoothing iteration, $\nu$-fold application of $\mathcal{S}_k$   Section 1.2.2, Section 2.2
$\mathcal{T}_k$            triangulation (at level $k$)                                 Section 4.1, Section 4.2
$u$, $u_k$                 solution vector (at level $k$)                               (3), Section 4.2
$u_k$                      finite element function at level $k$ from $H_k$              Section 4.3
$U$, $U_k$                 linear space of vectors $u$, $u_k$                           (5), Section 4.1
$\gamma$                   $\gamma = 1$: V-cycle, $\gamma = 2$: W-cycle                 (27)
$\delta_{\alpha\beta}$     $\delta_{\alpha\beta} = 1$ for $\alpha = \beta$, $\delta_{\alpha\beta} = 0$ otherwise   Kronecker symbol
$\zeta$                    contraction number                                           Section 2.3, (50)
$\rho(\cdot)$              spectral radius of a matrix                                  Section 2.3
$\Omega$                   domain of boundary value problem                             Section 4
$\Omega_k$                 grid of level $k$, set of nodal values                       Section 1.2.5


1.2 Ingredients of multigrid iterations


1.2.1 Hierarchy of linear systems 


We consider the solution of a linear system

$$L_h u_h = f_h \qquad (2)$$

where $h$ refers to the smallest mesh size $h = h_\ell$ of a
hierarchy of mesh sizes $h_0 > h_1 > \cdots > h_{\ell-1} > h_\ell$. We
write the linear system corresponding to the level $k$ (mesh
size $h_k$) as

$$L_k u_k = f_k \quad \text{for } k = \ell, \ell-1, \ldots, 0 \qquad (3)$$

In particular, $L_\ell u_\ell = f_\ell$ is just another way of writing (2).


1.2.2 Smoothing iteration 


Classical iterative methods like the Jacobi or the
Gauss-Seidel iteration are needed for all levels $k > 0$. Their
purpose is not to reduce the iteration error but to produce
smoother errors. This is the reason for the name ‘smoothing
iteration’ or ‘smoother’. The smoother is applied to the old
iterate $u_k^{\text{old}}$ and the right-hand side $f_k$ and produces the new
iterate $u_k^{\text{new}} = \mathcal{S}_k(u_k^{\text{old}}, f_k)$.

To give an example, we mention the simplest smoother,
the Richardson iteration, which may be chosen in several
cases,

$$\mathcal{S}_k : (u_k, f_k) \mapsto u_k - \frac{1}{\|L_k\|}\,(L_k u_k - f_k) \qquad (4)$$


1.2.3 Prolongations 


We denote the space of vectors $u_k$ and $f_k$ by $U_k$. The
prolongation $p_{k,k-1}$ is a linear transfer from $U_{k-1}$ to $U_k$,

$$p_{k,k-1}: U_{k-1} \to U_k \quad \text{linear} \qquad (5)$$

In the following, we omit the indices $k, k-1$ and write

$$p: U_{k-1} \to U_k$$


1.2.4 Restrictions 


The restriction $r_{k-1,k}$ is a linear transfer in the opposite
direction,

$$r_{k-1,k}: U_k \to U_{k-1} \quad \text{linear} \qquad (6)$$

Again, we abbreviate $r_{k-1,k}$ by $r$.


1.2.5 A one-dimensional example 


Consider the 1D Dirichlet boundary value problem

$$-u''(x) = f(x) \ \text{in } \Omega = (0,1), \qquad u(x) = 0 \ \text{at } x \in \Gamma = \{0, 1\} \qquad (7)$$

Given a mesh size $h = 1/(n+1)$ with $n + 1$ being a power of
2, we define the grid $\Omega_h$ consisting of the points $x_\nu = \nu h$
($\nu = 1, \ldots, n$). The hierarchy of step sizes is

$$h_0 > h_1 > h_2 > \cdots > h_\ell = h \quad \text{with } h_k = 2^{-1-k} \qquad (8)$$

Note that $h_0 = 1/2$ is the largest possible step size. The
number of grid points in $\Omega_k = \{x_\nu = \nu h_k : 1 \le \nu \le n_k\}$ is
$n_k = 2^{1+k} - 1$ (see Figure 1).

The standard difference method yields the equations
$h^{-2}[-u_{\nu-1} + 2u_\nu - u_{\nu+1}] = f(\nu h)$ for $\nu = 1, \ldots, n$
with boundary values $u_0 = u_{n+1} = 0$. The matrices $L_k$
from (3) are

$$L_k = h_k^{-2}\,\mathrm{tridiag}\{-1, 2, -1\} \quad (k = 0, \ldots, \ell) \qquad (9)$$

of size $n_k \times n_k$, while the vectors are $u_k = (u_\nu)_{\nu=1}^{n_k}$ and
$f_k = (f(\nu h_k))_{\nu=1}^{n_k}$.

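The prolongation $p = p_{k,k-1}$ from (5) can be realized by
piecewise linear interpolation:

$$(p\,v_{k-1})(x) = \begin{cases} v_{k-1}(x) & \text{if } x \in \Omega_{k-1} \subset \Omega_k \\ \tfrac12\,[v_{k-1}(x + h_k) + v_{k-1}(x - h_k)] & \text{if } x \in \Omega_k \setminus \Omega_{k-1} \end{cases} \qquad (10)$$

(see Figure 2), where $v_{k-1}$ takes the boundary values
$v_{k-1}(0) = v_{k-1}(1) = 0$. $p$ is described by the left rectangular
matrix of size $n_k \times n_{k-1}$ in

$$p = \frac12 \begin{bmatrix} 1 & & \\ 2 & & \\ 1 & 1 & \\ & 2 & \\ & 1 & \ddots \\ & & \ddots \\ & & 1 \\ & & 2 \\ & & 1 \end{bmatrix}, \qquad r = \frac14 \begin{bmatrix} 1 & 2 & 1 & & & & \\ & & 1 & 2 & 1 & & \\ & & & & \ddots & & \\ & & & & 1 & 2 & 1 \end{bmatrix} \qquad (11)$$

The restriction $r = r_{k-1,k}$ from (6) is defined by the
following weighted mean value

$$(r\,v_k)(x) = \tfrac14\,[v_k(x - h_k) + 2v_k(x) + v_k(x + h_k)] \quad \text{for } x \in \Omega_{k-1} \qquad (12)$$

(see Figure 2). The corresponding rectangular matrix of size
$n_{k-1} \times n_k$ is the second in (11).

Figure 1. Grid hierarchy.

Figure 2. Prolongation and restriction in the 1D case.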

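A suitable smoother is Jacobi's iteration damped by the
factor 1/2: $u_k^{i+1} = \mathcal{S}_k(u_k^i, f_k)$ with

$$\mathcal{S}_k(u_k, f_k) = u_k - \tfrac12 D_k^{-1}(L_k u_k - f_k) \qquad (13)$$

where $D_k$ is the diagonal of $L_k$, that is, $D_k = 2h_k^{-2} I$ (cf.
(9)). Note that $\mathcal{S}_k$ from (13) is almost identical to (4), since
$\tfrac12 D_k^{-1} = \tfrac14 h_k^2 I$ and $\|L_k\| \approx 4h_k^{-2}$.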
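
The ingredients of the 1D example can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: vectors are lists of interior grid values, and the function names `apply_L`, `prolong`, `restrict`, and `damped_jacobi` are hypothetical names chosen here, not part of the original text.

```python
def apply_L(u, h):
    """Apply L_k = h^-2 * tridiag{-1, 2, -1} from (9) to the vector u."""
    n = len(u)
    result = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0        # boundary value u_0 = 0
        right = u[i + 1] if i < n - 1 else 0.0   # boundary value u_{n+1} = 0
        result.append((2.0 * u[i] - left - right) / h ** 2)
    return result


def prolong(v):
    """Piecewise linear interpolation (10): m coarse values -> 2m + 1 fine values."""
    m = len(v)
    padded = [0.0] + list(v) + [0.0]             # zero Dirichlet boundary values
    fine = []
    for j in range(m + 1):
        fine.append(0.5 * (padded[j] + padded[j + 1]))  # fine point between coarse points
        if j < m:
            fine.append(v[j])                           # point shared with the coarse grid
    return fine


def restrict(d):
    """Weighted mean (12): 2m + 1 fine values -> m coarse values."""
    m = (len(d) - 1) // 2
    return [0.25 * (d[2 * j] + 2.0 * d[2 * j + 1] + d[2 * j + 2]) for j in range(m)]


def damped_jacobi(u, f, h):
    """One step of (13): u - (1/2) D^-1 (L u - f), where D = 2 h^-2 I."""
    Lu = apply_L(u, h)
    return [ui - 0.25 * h ** 2 * (lui - fi) for ui, lui, fi in zip(u, Lu, f)]
```

With these conventions, `restrict` is one half of the transpose of `prolong`, which matches the two matrices in (11).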


2 TWO-GRID ITERATION 


The two-grid iteration is a precursor of the multigrid
iteration. It is not of practical interest in itself, but it is an
important building block that involves only the fine grid and one
coarse grid.

In the following, we explain the so-called smoothing 
effect of certain classical iterations. This gives rise to the 
smoothing iteration mentioned in Section 1.2.2. 


2.1 Smoothing effect 


Usually, the purpose of an iteration $u_\ell^0 \mapsto u_\ell^1 \mapsto \cdots \mapsto u_\ell^\nu$
is the fast convergence to the true solution, that is, that
the error $u_\ell^\nu - u_\ell$ decreases quickly. The purpose of a
smoothing iteration is different. The error $u_\ell^\nu - u_\ell$ may stay
as large as the starting error $u_\ell^0 - u_\ell$, but it must become
smoother (note that it is the error $u_\ell^\nu - u_\ell$ that must become
smooth; this does not concern the smoothness of the solution $u_\ell$).
The details are exemplified for the 1D example from
Section 1.2.5.

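We consider the damped Jacobi iteration (13) since its
analysis is the most transparent one. Since $D_\ell^{-1} = \tfrac12 h_\ell^2 I$,
the damped Jacobi iteration equals

$$u_\ell^{i+1} = u_\ell^i - \omega h_\ell^2 (L_\ell u_\ell^i - f_\ell) = S_\ell u_\ell^i + \omega h_\ell^2 f_\ell \quad \text{with } \omega = \tfrac14 \qquad (14)$$

where $S_\ell = I - \omega h_\ell^2 L_\ell$ is called the iteration matrix. Note
that the eigenvectors of $L_\ell$ are

$$e_\ell^\mu = \sqrt{2h_\ell}\,\big(\sin(\mu \pi \nu h_\ell)\big)_{\nu=1}^{n_\ell} \qquad (1 \le \mu \le n_\ell = h_\ell^{-1} - 1)$$

($\mu$: frequency). The corresponding eigenvalues of $L_\ell$ are
$\lambda_\ell^\mu = 4h_\ell^{-2} \sin^2(\mu\pi h_\ell/2)$, that is, $L_\ell e_\ell^\mu = \lambda_\ell^\mu e_\ell^\mu$. Since
$S_\ell = I - \omega h_\ell^2 L_\ell$ is the iteration matrix of the damped
iteration (14), it has the same eigenvectors $e_\ell^\mu$ as $L_\ell$ and
(for $\omega = 1/4$) the eigenvalues

$$\lambda_\mu = 1 - \sin^2\!\left(\frac{\mu\pi h_\ell}{2}\right) = \cos^2\!\left(\frac{\mu\pi h_\ell}{2}\right) \quad \text{for } \mu = 1, \ldots, n_\ell,$$

shown in Figure 3. The choice $\omega = 1/2$ yields the standard
Jacobi iteration (dashed line in Figure 3).

The rate of convergence of the damped Jacobi iteration is
$\lambda_1 = \cos^2(\pi h_\ell/2) = 1 - \tfrac14 \pi^2 h_\ell^2 + O(h_\ell^4)$, proving the
very slow convergence of the Jacobi iteration.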

Even though iteration (14) converges very slowly, Figure 3
shows that components $e_\ell^\mu$ with frequency $\mu \ge 1/(2h_\ell)$
are reduced at least by a factor 1/2 per iteration. This means
that the convergence rate of the damped Jacobi iteration
restricted to the subspace $\mathrm{span}\{e_\ell^\mu : 1/2 \le \mu h_\ell < 1\}$
of the high frequencies is 1/2. The iteration is
rapidly convergent with respect to the high frequencies.
The slow convergence is caused by the lower frequencies
only. The construction of the multigrid iteration is based
on this observation.

Figure 3. Eigenvalues of the iteration matrix as function of
the frequency $\mu h_\ell$.

The initial error $u_\ell^0 - u_\ell$ can be represented by the
linear combination $\sum_\mu \alpha_\mu e_\ell^\mu$. After $\nu$ steps of the damped
Jacobi iteration, the error equals $u_\ell^\nu - u_\ell = \sum_\mu \beta_\mu e_\ell^\mu$ with
$\beta_\mu = \alpha_\mu \lambda_\mu^\nu$. The preceding consideration shows $\beta_\mu \approx \alpha_\mu$
for low frequencies but $|\beta_\mu| \ll |\alpha_\mu|$ for high frequencies.
This fact can be expressed by saying that the error $u_\ell^\nu - u_\ell$
is smoother than $u_\ell^0 - u_\ell$. The first three graphs of
Figure 4 illustrate the increasing smoothness of $u_\ell^\nu - u_\ell$
($\nu = 0, 1, 2$). This is why we say that the iteration (14)
serves as a smoothing iteration.
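
The smoothing effect can be checked numerically from the eigenvalue formula alone. The sketch below assumes the eigenvalues $1 - 4\omega \sin^2(\mu\pi h/2)$ of $I - \omega h^2 L$ for the 1D model problem; the mesh size $h = 1/32$ and the names `jacobi_eigenvalue`, `smoothing_factor`, and `undamped_high` are illustrative choices, not taken from the text.

```python
import math


def jacobi_eigenvalue(mu, h, omega=0.25):
    """Eigenvalue of S = I - omega * h^2 * L belonging to the frequency mu."""
    return 1.0 - 4.0 * omega * math.sin(mu * math.pi * h / 2.0) ** 2


h = 1.0 / 32
n = round(1.0 / h) - 1                      # number of interior grid points
frequencies = range(1, n + 1)

eigs = [jacobi_eigenvalue(mu, h) for mu in frequencies]

# Overall convergence rate: dominated by the lowest frequency mu = 1.
slowest = max(abs(lam) for lam in eigs)

# Rate restricted to the high frequencies 1/2 <= mu * h < 1.
smoothing_factor = max(abs(lam) for mu, lam in zip(frequencies, eigs) if mu * h >= 0.5)

# Standard (undamped) Jacobi, omega = 1/2: high frequencies are hardly reduced.
undamped_high = max(abs(jacobi_eigenvalue(mu, h, omega=0.5))
                    for mu in frequencies if mu * h >= 0.5)

print(slowest, smoothing_factor, undamped_high)
```

The damped iteration reduces every high-frequency component by at least one half, while the overall rate stays close to 1; the undamped Jacobi iteration leaves the highest frequencies almost untouched, which is why the damping is needed for smoothing.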


2.2 Structure of two-grid iterations 


The foregoing subsection showed that an appropriate smoo- 
thing iteration is quite an efficient method for reducing 
the high-frequency components. Convergence is only lack- 
ing with respect to low frequencies (smooth components). 
Therefore, one should combine this iteration with a sec- 
ond one having complementary properties. In particular, the 
second iteration should reduce the smooth error part very 
well. Such a complementary iteration can be constructed
by means of the coarse grid with step size $h_{\ell-1} = 2h_\ell$ (see
Figure 1).

Let $u_\ell^{old}$ be some given approximation to $u_\ell = L_\ell^{-1} f_\ell$,
which serves as starting value. A few steps of the smoothing
iteration $\mathcal{S}_\ell$ (cf. (13)) will result in an intermediate value
$\bar u_\ell$. From the previous subsection, we know that the error
$v_\ell := \bar u_\ell - u_\ell$ is smooth (more precisely, smoother than
$u_\ell^{old} - u_\ell$). $v_\ell$ can also be regarded as the exact correction
since $u_\ell = \bar u_\ell - v_\ell$ represents the exact solution. Inserting
$\bar u_\ell$ into the equation $L_\ell u_\ell - f_\ell = 0$, we obtain the defect

$$d_\ell = L_\ell \bar u_\ell - f_\ell \qquad (15)$$

of $\bar u_\ell$, which vanishes if and only if $\bar u_\ell$ is the exact solution
$u_\ell$. Because of $L_\ell v_\ell = L_\ell \bar u_\ell - L_\ell u_\ell = L_\ell \bar u_\ell - f_\ell = d_\ell$,
the exact correction $v_\ell$ is the solution of

$$L_\ell v_\ell = d_\ell \qquad (16)$$

Figure 4. Smoothing effect of the damped Jacobi iteration.
$u_\ell$: exact discrete solution; $u_\ell^0$, $u_\ell^1$, $u_\ell^2$: smoothing iterates; $u_\ell^3$:
result after coarse grid correction.


Equation (16) is of the same form as the original equation
$L_\ell u_\ell = f_\ell$. Solving $L_\ell v_\ell = d_\ell$ exactly is as difficult
as solving $L_\ell u_\ell = f_\ell$. Nonetheless, $v_\ell$ can be approximated
better than $u_\ell$, since $v_\ell$ is smooth and smooth
functions can be represented well by means of coarser
grids.

To approximate the problem $L_\ell v_\ell = d_\ell$ by a coarse grid
equation

$$L_{\ell-1} v_{\ell-1} = d_{\ell-1} \qquad (17)$$

we have to choose $d_{\ell-1}$ reasonably. Note that the matrix $L_k$
is already defined by (9) for all levels $k \ge 0$, especially for
$k = \ell - 1$. The right-hand side $d_{\ell-1}$ should depend linearly
on the original defect $d_\ell$. This is where the restriction $r$
from (6) is needed,

$$d_{\ell-1} = r\,d_\ell \qquad (18)$$

Having defined $d_{\ell-1}$, we obtain $v_{\ell-1} = L_{\ell-1}^{-1} d_{\ell-1}$ as the
exact solution of (17). We expect $v_{\ell-1}$ to be an approximation
of the exact correction $v_\ell$. However, $v_{\ell-1}$ is only
defined on the coarse grid $\Omega_{\ell-1}$. We have to interpolate this
coarse grid function by

$$\tilde v_\ell = p\,v_{\ell-1} \qquad (19)$$

where the prolongation $p$ is announced in (5) (see Figure 2).

Since $u_\ell = \bar u_\ell - v_\ell$ is the exact solution and $\tilde v_\ell = p\,v_{\ell-1}$
is supposed to approximate $v_\ell$, one tries to improve the
value $\bar u_\ell$ by

$$u_\ell^{new} = \bar u_\ell - \tilde v_\ell \qquad (20)$$

The step from $\bar u_\ell$ to $u_\ell^{new}$ by (15)-(20) is called the coarse
grid correction. Combining the separate parts (15)-(20),
we obtain the compact formula

$$\bar u_\ell \mapsto \bar u_\ell - p\,L_{\ell-1}^{-1}\,r\,(L_\ell \bar u_\ell - f_\ell) \qquad (21)$$

for the coarse grid correction.

Figure 4 shows the errors $u_\ell^i - u_\ell$ after $i = 0, 1, 2$ damped
Jacobi iterations. The coarse grid correction applied
to $\bar u_\ell = u_\ell^2$ yields $u_\ell^3$. The graph of the error $u_\ell^3 - u_\ell$
in Figure 4 demonstrates the success of the coarse grid correction.
Although the coarse grid correction seems to
be efficient, it cannot be used as an iteration by itself
because it does not converge. It is the combination of
smoothing iteration and coarse grid correction that is
rapidly convergent, whereas both components by themselves
converge slowly or not at all. The combination is
called the two-grid iteration since two levels $\ell$ and $\ell - 1$
are involved.

Two-grid iteration for solving $L_\ell u_\ell = f_\ell$                                   (22)

    Start: $u_\ell^j$ given iterate
    Smoothing step:
        $\bar u_\ell := \mathcal{S}_\ell^\nu(u_\ell^j, f_\ell)$          ($\nu$ smoothing steps)            (a)
    Coarse grid correction:
        $d_\ell := L_\ell \bar u_\ell - f_\ell$                (calculation of the defect)     (b1)
        $d_{\ell-1} := r\,d_\ell$                     (restriction of the defect)     (b2)
        $v_{\ell-1} := L_{\ell-1}^{-1} d_{\ell-1}$           (solution of coarse grid eq.)   (b3)
        $u_\ell^{j+1} := \bar u_\ell - p\,v_{\ell-1}$          (correction of $\bar u_\ell$)             (b4)

We summarize the two-grid iteration in (22). In Step (a),
we use the notation $\mathcal{S}_\ell^\nu(u_\ell^j, f_\ell)$ for the $\nu$-fold application of
the smoothing iteration $\mathcal{S}_\ell(u_\ell^j, f_\ell)$.

The number $\nu$ of smoothing iterations can be chosen
independently of the grid size $h_\ell$. Its influence on convergence
will be described in Remark 1.
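
A minimal runnable sketch of the two-grid iteration (22) for the 1D model problem of Section 1.2.5 is given here. It assumes the damped Jacobi smoother (13), the transfers (10) and (12), and an exact coarse grid solve by the Thomas algorithm for tridiagonal systems; the helper names (`apply_L`, `solve_L`, `prolong`, `restrict`, `two_grid`) are hypothetical and chosen only for this illustration.

```python
def apply_L(u, h):
    """Apply L = h^-2 * tridiag{-1, 2, -1} with zero boundary values."""
    n = len(u)
    return [(2.0 * u[i] - (u[i - 1] if i > 0 else 0.0)
             - (u[i + 1] if i < n - 1 else 0.0)) / h ** 2 for i in range(n)]


def solve_L(d, h):
    """Solve L v = d exactly (Thomas algorithm for the tridiagonal matrix L)."""
    n = len(d)
    diag, off = 2.0 / h ** 2, -1.0 / h ** 2
    c, g = [0.0] * n, [0.0] * n
    c[0], g[0] = off / diag, d[0] / diag
    for i in range(1, n):                       # forward elimination
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        g[i] = (d[i] - off * g[i - 1]) / denom
    v = [0.0] * n
    v[-1] = g[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        v[i] = g[i] - c[i] * v[i + 1]
    return v


def prolong(v):
    """Piecewise linear interpolation (10)."""
    padded = [0.0] + list(v) + [0.0]
    fine = []
    for j in range(len(v) + 1):
        fine.append(0.5 * (padded[j] + padded[j + 1]))
        if j < len(v):
            fine.append(v[j])
    return fine


def restrict(d):
    """Weighted mean value (12)."""
    return [0.25 * (d[2 * j] + 2.0 * d[2 * j + 1] + d[2 * j + 2])
            for j in range((len(d) - 1) // 2)]


def two_grid(u, f, h, nu=2):
    """One step of (22): nu damped Jacobi steps, then coarse grid correction."""
    for _ in range(nu):                                        # step (a)
        Lu = apply_L(u, h)
        u = [ui - 0.25 * h ** 2 * (a - b) for ui, a, b in zip(u, Lu, f)]
    defect = [a - b for a, b in zip(apply_L(u, h), f)]         # step (b1)
    coarse_defect = restrict(defect)                           # step (b2)
    coarse_corr = solve_L(coarse_defect, 2.0 * h)              # step (b3)
    return [ui - pv for ui, pv in zip(u, prolong(coarse_corr))]  # step (b4)
```

For $f = 1$ the discrete solution is $x(1-x)/2$ at the grid points, so the error of successive iterates can be monitored directly; its Euclidean norm shrinks by a mesh-independent factor per step.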

Below, the two-grid iteration (22) is written in a quasi-ALGOL
style.

    function TGM$^{(\nu)}$(k, u, f);
    begin if k = 0 then TGM := $L_0^{-1}$ * f else
      begin u := $\mathcal{S}_k^\nu$(u, f);                        (a)
            d := r * ($L_k$ * u - f);                  (b1,2)
            v := $L_{k-1}^{-1}$ * d;                       (b3)      (22')
            TGM := u - p * v                         (b4)
      end end;

The function TGM performs one iteration step at level $k$
(first parameter). The third parameter $f$ is the right-hand
side $f_k$ of the equation to be solved. The input value of
the second parameter $u$ is the given $j$th iterate $u_k^j$ that is
mapped into the output value TGM $= u_k^{j+1}$. The second
line 'if k = 0 then ...' is added to have a well-defined
algorithm for all levels $k \ge 0$. Note that TGM = TGM$^{(\nu)}$
depends on the choice of $\nu$.

Iteration (22) can be regarded as the prototype of a two-grid
method. However, there are many variants. Instead
of applying first smoothing and thereafter the coarse grid
correction, we can interchange these parts. More generally,
$\nu_1$ smoothing iterations can be performed before and $\nu_2$
iterations after the coarse grid correction. The resulting
algorithm is given in (23).

    function TGM$^{(\nu_1,\nu_2)}$(k, u, f);
    begin if k = 0 then TGM := $L_0^{-1}$ * f else
      begin u := $\mathcal{S}_k^{\nu_1}$(u, f);                               (pre-smoothing)           (a)
            u := u - p * $L_{k-1}^{-1}$ * r * ($L_k$ * u - f);        (coarse grid correction)  (b)      (23)
            TGM := $\mathcal{S}_k^{\nu_2}$(u, f)                              (postsmoothing)           (c)
      end end;


2.3 Two-grid convergence in the 1D case


In the case of the one-dimensional model problem from
Section 1.2.5, a discrete Fourier analysis can be applied.
The explicit calculation can be found in Hackbusch (1985)
and in Section 10.3 of Hackbusch (1994). Here we give only
the results.

The iteration matrix $M_\ell = M_\ell^{TGM}(\nu)$ of the two-grid
method (22) is defined by $u_\ell^{j+1} = M_\ell u_\ell^j + N_\ell f_\ell$ and equals

$$M_\ell^{TGM}(\nu) = (I - p\,L_{\ell-1}^{-1}\,r\,L_\ell)\,S_\ell^\nu$$

where $S_\ell = I - \tfrac14 h_\ell^2 L_\ell$ is the iteration matrix of the
smoothing procedure from Section 2.1. The parameter $\nu$
of $M_\ell^{TGM}(\nu)$ shows the explicit dependence on the number
of smoothing steps.

In the following, we discuss the spectral radius and the
spectral norm of $M_\ell = M_\ell^{TGM}(\nu)$.

The spectral radius $\rho(M_\ell)$ is defined as the maximal
value $|\lambda|$ for all eigenvalues $\lambda$ of $M_\ell$. A linear iteration
converges (i.e. $u_\ell^j \to u_\ell$ for all starting values $u_\ell^0$) if and
only if $\rho(M_\ell) < 1$. The value of $\rho(M_\ell)$ describes the
asymptotic rate of convergence.

If we are interested in the uniform rate, that is,
$\|u_\ell^{j+1} - u_\ell\|_2 \le \zeta\,\|u_\ell^j - u_\ell\|_2$, the best contraction number $\zeta$ is given
by the spectral norm $\zeta = \|M_\ell\|_2$. While $\rho(M_\ell)$ is the appropriate
description for a larger number of iteration steps, the
contraction number $\zeta = \|M_\ell\|_2$ is needed if we perform only
very few steps of an iteration.

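Theorem 1. Let the two-grid iteration of level $\ell$ be defined
by (22) with $\nu \ge 1$. Then the spectral radius of the iteration
matrix $M_\ell = M_\ell^{TGM}(\nu)$ is bounded (uniformly for all $\ell \ge 0$) by

$$\rho(M_\ell^{TGM}(\nu)) \le \rho_\nu := \max\{\xi(1-\xi)^\nu + (1-\xi)\xi^\nu : 0 \le \xi \le \tfrac12\} < 1 \qquad (24)$$

hence, convergence follows. The spectral norm is bounded
(uniformly in $\ell$) by

$$\|M_\ell^{TGM}(\nu)\|_2 \le \zeta_\nu := \max\Big\{\sqrt{2\,[\xi^2(1-\xi)^{2\nu} + (1-\xi)^2\xi^{2\nu}]} : 0 \le \xi \le \tfrac12\Big\} < 1 \qquad (25)$$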


Special values and the asymptotic behavior are given in the
following table:

    $\nu$             $\rho_\nu$                        $\zeta_\nu$

    1               1/2                             1/2
    2               1/4                             1/4
    3               1/8                             0.150
    4               0.0832                          0.1159
    5               0.0671                          0.0947
    10              0.0350                          0.0496
    $\nu \to \infty$    $\approx c_1/\nu$ with $c_1 = 1/e$    $\approx c_2/\nu$ with $c_2 = \sqrt{2}/e$


The bounds $\rho_\nu$ and $\zeta_\nu$ are uniform with respect to $\ell$;
hence, uniform with respect to the grid size $h_\ell$. The spectral
radii of classical iterative methods depend on $h_\ell$ and
tend to 1 for $h_\ell \to 0$. Examples are Jacobi's iteration
and the Gauss-Seidel iteration with $\rho(M_\ell) = 1 - O(h_\ell^2)$ or
successive overrelaxation (SOR) with $\rho(M_\ell) = 1 - O(h_\ell)$
at best.

In sharp contrast to this behavior, the two-grid iteration as
well as the multigrid iteration defined below have spectral
radii and contraction numbers uniformly bounded by some
number smaller than 1. As a consequence, an accuracy $\varepsilon$
can be obtained by $j = O(\log 1/\varepsilon)$ iterations, where $j$ is
independent of $h_\ell$.


Remark 1. (a) The rate $\rho_\nu$ and the contraction number $\zeta_\nu$
improve with increasing $\nu$. However, they do not decrease
exponentially ($\approx C\eta^\nu$) but like $C/\nu$. This behavior can be
shown to hold for much more general boundary value
problems. For less regular problems, $C/\nu$ has to be changed
to $C/\nu^\alpha$, $0 < \alpha < 1$.

(b) A consequence of $\zeta_\nu \approx C/\nu$ is the recommendation
not to choose $\nu$ too large. Doubling of $\nu$ decreases $\zeta_\nu$
to $\zeta_{2\nu} \approx \zeta_\nu/2$. But also the computational work is almost
doubled (at least in the multigrid version). Thus, the choice
of $\nu$ is better than $2\nu$ if $\zeta_\nu^2 \le \zeta_\nu/2$, that is, if $\zeta_\nu \le 1/2$.


3 MULTIGRID METHOD 
3.1 Definition of the multigrid iteration 


In the previous section, the two-grid method proved to be
a very fast iteration. However, in the more interesting case
of multidimensional boundary value problems, the two-grid
iteration is impractical because of the exact solution of
$L_{\ell-1} v_{\ell-1} = d_{\ell-1}$ required in (22b3). The solution $v_{\ell-1}$ is
needed to compute the correction $\tilde v_\ell = p\,v_{\ell-1}$. Since $\tilde v_\ell$ is
only an approximation to $v_\ell = L_\ell^{-1} d_\ell$, there is no need for
the exact computation of $\tilde v_\ell$ and hence of $v_{\ell-1}$. It suffices to
calculate an approximate solution $\tilde v_{\ell-1}$ of

$$L_{\ell-1} v_{\ell-1} = d_{\ell-1} \qquad (26)$$

For example, the approximation $\tilde v_{\ell-1}$ can be obtained by
some iterative process:

$$v_{\ell-1}^0 \mapsto v_{\ell-1}^1 \mapsto \cdots \mapsto v_{\ell-1}^\gamma =: \tilde v_{\ell-1} \qquad (\gamma\text{: number of iterations})$$


Note that the coarse grid system (26) is of the same form
as the original problem $L_\ell u_\ell = f_\ell$ (only $\ell$ is replaced by
$\ell - 1$). Hence, the two-grid iteration of Section 2 (with
levels $\{\ell-1, \ell-2\}$ instead of $\{\ell, \ell-1\}$) can be used as
an iterative solver for system (26) provided that $\ell - 1 > 0$.
This combination yields a three-grid method involving the
levels $\ell$, $\ell-1$, $\ell-2$. The exact solution of (26) is replaced
by $\gamma$ steps of the two-grid iteration at the levels $\ell-1$,
$\ell-2$ involving the solution of new auxiliary equations
$L_{\ell-2} v_{\ell-2} = d_{\ell-2}$ for $\gamma$ different right-hand sides $d_{\ell-2}$.

If $\ell - 2 > 0$, the exact solution of $L_{\ell-2} v_{\ell-2} = d_{\ell-2}$ can
again be approximated by a two-grid iteration at the levels
$\ell-2$ and $\ell-3$. The resulting algorithm would be a four-grid
method. This process can be repeated until all $\ell + 1$
levels $\ell, \ell-1, \ell-2, \ldots, 1, 0$ are involved. Equations at
level 0 must be solved exactly or approximated by some
other iteration. Level 0 corresponds to the coarsest grid
and therefore to the smallest system of equations. In our
model example (9), there is only one ($n_0 = 2^{0+1} - 1 = 1$)
unknown $u_0(1/2)$ at level 0. The resulting $(\ell+1)$-level
iteration, which is now called a multigrid iteration, can
easily be described by the following recursive program.


Multigrid iteration MGM$^{(\nu)}$ for solving $L_\ell u_\ell = f_\ell$

    function MGM$^{(\nu)}$(k, u, f);
    begin if k = 0 then MGM := $L_0^{-1}$ * f else          (a)
      begin u := $\mathcal{S}_k^\nu$(u, f);                            (b)
            d := r * ($L_k$ * u - f);                      (c1)
            v := 0;                                      (c2)
            for j := 1 to $\gamma$
              do v := MGM$^{(\nu)}$(k - 1, v, d);             (c3)
            MGM := u - p * v                             (c4)
      end end;
                                                              (27)


The meaning of the parameters $k$, $u$, $f$ is the same as
for the procedure TGM from (22'). One call of MGM
performs one iteration of the multigrid method (at level
$k$). Comparison of procedure TGM with MGM shows
that the only modification is the replacement of (22'b3) by
(27c2-c3). Instead of solving the coarse grid equation (26)
exactly, one applies $\gamma$ iterations of the multigrid iteration
at level $\ell - 1$ with right-hand side $d$. One call of MGM
at level $\ell$ gives rise to $\gamma$ calls of MGM at level $\ell - 1$
involving a further $\gamma^2$ calls at level $\ell - 2$, and so on.
The recursive process ends at level 0, where the auxiliary
problems $L_0 v_0 = d_0$ are solved exactly (cf. (27a)).
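
The recursive structure of (27) translates almost line by line into Python. The following sketch assumes the 1D model problem of Section 1.2.5 with $h_k = 2^{-1-k}$, the damped Jacobi smoother (13), and the transfers (10) and (12); the function names and the default values `nu=2`, `gamma=1` are illustrative assumptions.

```python
def apply_L(u, h):
    """Apply L_k = h^-2 * tridiag{-1, 2, -1} with zero boundary values."""
    n = len(u)
    return [(2.0 * u[i] - (u[i - 1] if i > 0 else 0.0)
             - (u[i + 1] if i < n - 1 else 0.0)) / h ** 2 for i in range(n)]


def smooth(u, f, h):
    """One damped Jacobi step (13)."""
    return [ui - 0.25 * h ** 2 * (a - b) for ui, a, b in zip(u, apply_L(u, h), f)]


def prolong(v):
    """Piecewise linear interpolation (10)."""
    padded = [0.0] + list(v) + [0.0]
    fine = []
    for j in range(len(v) + 1):
        fine.append(0.5 * (padded[j] + padded[j + 1]))
        if j < len(v):
            fine.append(v[j])
    return fine


def restrict(d):
    """Weighted mean value (12)."""
    return [0.25 * (d[2 * j] + 2.0 * d[2 * j + 1] + d[2 * j + 2])
            for j in range((len(d) - 1) // 2)]


def mgm(k, u, f, nu=2, gamma=1):
    """One multigrid step at level k; gamma = 1: V-cycle, gamma = 2: W-cycle."""
    h = 2.0 ** (-1 - k)
    if k == 0:
        return [f[0] * h ** 2 / 2.0]               # (a): L_0 is the 1x1 matrix 2 h_0^-2
    for _ in range(nu):
        u = smooth(u, f, h)                        # (b): nu smoothing steps
    d = restrict([a - b for a, b in zip(apply_L(u, h), f)])   # (c1)
    v = [0.0] * len(d)                             # (c2)
    for _ in range(gamma):
        v = mgm(k - 1, v, d, nu, gamma)            # (c3): gamma recursive calls
    return [ui - pv for ui, pv in zip(u, prolong(v))]         # (c4)
```

A typical use is to call `mgm(ell, u, f)` repeatedly at the finest level `ell`, starting from `u = [0.0] * (2 ** (ell + 1) - 1)`, and to monitor the error or the defect between calls.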

We shall see that $\gamma = 1$ or $\gamma = 2$ are appropriate values
of the iteration number $\gamma$. In the special case of $\gamma = 1$ (the
so-called V-cycle), the multigrid algorithm MGM can be
written equivalently as

    function VCycle($\ell$, $u_\ell$, $f_\ell$);
    begin for k := $\ell$ step -1 until 1 do
          begin $u_k$ := $\mathcal{S}_k^\nu$($u_k$, $f_k$);
                $f_{k-1}$ := r * ($L_k$ * $u_k$ - $f_k$);
                $u_{k-1}$ := 0
          end;
          $u_0$ := $L_0^{-1}$ * $f_0$;
          for k := 1 step 1 until $\ell$ do $u_k$ := $u_k$ - p * $u_{k-1}$;
          VCycle := $u_\ell$
    end;                                                      (28)


The sequence of operations during one step of the multigrid
iteration (28) (i.e. for $\gamma = 1$) is depicted in the left part
of Figures 5 and 6. The stave symbolizes the scale of levels.
In the case when $\gamma = 2$, the multigrid iteration cannot
be represented in such a simple form as (28). The right part
of Figure 6 shows the details of one multigrid step at level
4 involving 2 (4, 8) iteration steps at level 3 (2, 1, resp.).
Owing to the form of Figure 5, the iteration with $\gamma = 1$ is
also called a V-cycle, while the name W-cycle is used for
$\gamma = 2$.

It turns out that the properties of the multigrid iteration
are almost the same as those of the two-grid iteration.
The contraction numbers of both methods are closely
related. The bounds of the multigrid contraction numbers
(and of the convergence rates) are not only independent
of the step sizes $h_\ell, h_{\ell-1}, \ldots, h_0$ but also independent
of the total number of levels involved in the
iteration.


Remark 2. Analogously to the two-grid method
TGM$^{(\nu_1,\nu_2)}$ from (23) involving pre- and postsmoothing, a
multigrid method MGM$^{(\nu_1,\nu_2)}$ can be constructed.


3.2 Example 


A simple test example is the Poisson equation
$-\Delta u := -u_{xx} - u_{yy} = f$ in $\Omega = (0,1) \times (0,1)$ discretized by the
five-point difference scheme

$$-\Delta_h u = h^{-2}\,[4u(x,y) - u(x-h,y) - u(x+h,y) - u(x,y-h) - u(x,y+h)]$$

with Dirichlet values $u(x,y) = x^2 + y^2$. The coarsest possible
mesh size is $h_0 := 1/2$, the further mesh sizes are
$h_k = 2^{-1-k}$. The table shows the iteration errors
$e_m := \|u_\ell - u_\ell^m\|_\infty$ after $m$ steps of the multigrid iteration
($u_\ell^0 = 0$, W-cycle, 2 Gauss-Seidel pre-smoothing steps)
at level $\ell = 7$ (corresponding to $h_7 = 1/256$ and 65025
degrees of freedom).

Figure 5. One V-cycle ($\gamma = 1$) and one W-cycle ($\gamma = 2$) for $\ell = 2$.
S: smoothing step; R: restriction of the defect; E: exact solving
at level 0; P: correction $u_k := u_k - p\,u_{k-1}$.

Figure 6. One V-cycle ($\gamma = 1$; left) and one W-cycle ($\gamma = 2$;
right) for $\ell = 4$.


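    $m$    $e_m$                      ratio

    0    $1.984 \cdot 10^{0}$      -
    1    $3.038 \cdot 10^{-1}$     0.1531
    2    $1.605 \cdot 10^{-2}$     0.0528
    3    $9.017 \cdot 10^{-4}$     0.0562
    4    $5.219 \cdot 10^{-5}$     0.0579
    5    $3.102 \cdot 10^{-6}$     0.0594
    6    $1.884 \cdot 10^{-7}$     0.0607
    7    $1.166 \cdot 10^{-8}$     0.0619
    8    $7.713 \cdot 10^{-10}$    0.0662
    9    $5.218 \cdot 10^{-11}$    0.0677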


The column 'ratio' shows the error reduction $e_m/e_{m-1}$, corresponding
to a convergence rate of 0.067. Similar rates are observed
for other step sizes. The starting value $u_\ell^0 = 0$ is chosen
here only for demonstration. Instead it is recommended to
use the nested iteration from Section 6.

The results from above correspond to the W-cycle. The
V-cycle shows a somewhat slower convergence rate of
about $1.8 \cdot 10^{-1}$ (the error for $m = 9$ is $e_9 = 4.98 \cdot 10^{-7}$). On
the other hand, the V-cycle requires less computational
work (see Remark 3). The choice $\gamma = 3$ is quite impractical.
Although the computational work increases a lot, $\gamma = 3$
yields almost the same results as the W-cycle, for example,
$e_9 = 4.793 \cdot 10^{-11}$.


3.3 Computational work 


Let $n_k = \dim U_k$ be the number of degrees of freedom
at level $k$. Assuming $h_{k-1} \approx 2h_k$ and $\Omega \subset \mathbb{R}^d$, we have
$n_k \approx 2^d n_{k-1}$. Owing to recursivity, one call of MGM($\ell$, $\cdot$, $\cdot$)
involves $\gamma^{\ell-k}$ calls of MGM($k$, $\cdot$, $\cdot$). As long as $\gamma 2^{-d} < 1$,
the number of arithmetical operations spent at level $k$
decreases exponentially with decreasing level. Indeed, even
$\gamma 2^{-d} \le 1/2$ holds because of $d \ge 2$ ($d = 1$ is uninteresting)
and $\gamma \le 2$ (V/W-cycle). Therefore, linear complexity
holds.
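
The geometric sum behind this argument can be made explicit. The sketch below assumes that the work at level $k$ is proportional to $n_k$, that $n_k = 2^d n_{k-1}$ holds exactly, and that the level-0 solve is accounted for separately; the function names are hypothetical.

```python
def calls_per_level(ell, gamma):
    """Number of calls of MGM(k, ., .) caused by one call of MGM(ell, ., .)."""
    return {k: gamma ** (ell - k) for k in range(ell, -1, -1)}


def relative_work(ell, gamma, d):
    """Work of one multigrid step on levels ell, ..., 1 in units of the work
    of one smoothing/transfer pass on the finest level.

    Level k is visited gamma**(ell - k) times and is 2**(d * (ell - k))
    times smaller than level ell, so it contributes (gamma * 2**-d)**(ell - k).
    """
    q = gamma * 2.0 ** (-d)
    return sum(q ** (ell - k) for k in range(1, ell + 1))


for d in (2, 3):
    for gamma in (1, 2):
        print(d, gamma, relative_work(20, gamma, d), 1.0 / (1.0 - gamma * 2.0 ** (-d)))
```

The sums stay below the limits $1/(1 - \gamma 2^{-d})$, that is, 4/3 and 2 for $d = 2$, and 8/7 and 4/3 for $d = 3$, independently of the number of levels.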


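Remark 3. Let $n_k \approx 2^d n_{k-1}$. The cost of one call of
MGM($\ell$, $\cdot$, $\cdot$) is bounded by $C_\ell n_\ell$ arithmetical operations,
where for $d = 2$

$$C_\ell \le \begin{cases} \tfrac43\,[\nu C_S + C_D + C_C] + O(4^{-\ell}) & \text{for } \gamma = 1 \text{ (V-cycle)} \\ 2\,[\nu C_S + C_D + C_C] + O(2^{-\ell}) & \text{for } \gamma = 2 \text{ (W-cycle)} \end{cases} \qquad (29)$$

where $C_S n_k$, $C_D n_k$, $C_C n_k$, and $C_0 n_0$ are the costs for
$u_k \mapsto \mathcal{S}_k(u_k, f_k)$, $(u_k, f_k) \mapsto r\,(L_k u_k - f_k)$, $(u_k, v_{k-1}) \mapsto
u_k - p\,v_{k-1}$, and $f_0 \mapsto L_0^{-1} f_0$. For $d = 3$, the factors 4/3
and 2 in (29) reduce to 8/7 and 4/3, respectively.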


3.4 Algebraic multigrid methods 


So far we have assumed that there is a hierarchy of discretizations
(cf. Section 1.1.6). If only the final discretization is
given or if, for some other reason, the user does not provide
coarser systems, this construction can be performed as part
of the 'algebraic multigrid method' (AMG). There are different
approaches to selecting a coarse grid and to
defining the prolongations and restrictions. The name 'algebraic
multigrid method' indicates that only the algebraic
system of equations is used as input (not necessarily geometric
data and descriptions about the nature of the PDE).
As a consequence, the resulting algorithm is more 'black-box'-like.

We refer the interested reader to Stüben (1983), Braess
(1995), Mandel, Brezina and Vanek (1999), Haase et al.
(2001), and the literature given therein.


4 APPLICATION TO FINITE ELEMENT 
EQUATIONS 


While the introductory example corresponds to a difference 
scheme, we now discuss the multigrid method in the case 
of a finite element discretization. The multigrid ingredients 
from Section 1.2 are defined in a canonical way, provided 
that the finite element spaces are nested as explained below. 
Otherwise, hints are given in Section 4.5. 

We assume that the boundary value problem is formulated
in the weak variational form: Find $u \in H$ such that

$$a(u, v) = f(v) \quad \text{for all } v \in H \qquad (30)$$

where the 'energy space' $H$ may include the required
homogeneous Dirichlet conditions (e.g. $H = H_0^1(\Omega)$). In
the case of scalar functions $u$, the bilinear form $a(u, v)$
may be


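$$a(u, v) = \int_\Omega \Big[ \sum_{\alpha,\beta=1}^{d} c_{\alpha\beta} \frac{\partial u}{\partial x_\alpha} \frac{\partial v}{\partial x_\beta} + \sum_{\alpha=1}^{d} c_{\alpha 0} \frac{\partial u}{\partial x_\alpha}\,v + \sum_{\alpha=1}^{d} c_{0\alpha}\,u\,\frac{\partial v}{\partial x_\alpha} + c_{00}\,u v \Big]\,dx$$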


The functional $f$ in (30) is $f(v) = \int_\Omega f v\,dx$. In the case of
inhomogeneous Neumann conditions, it contains a further
term $\int_\Gamma g v\,d\Gamma$ ($\Gamma = \partial\Omega$ denotes the boundary of $\Omega$) (see
Chapter 4, this Volume, Chapter 2, Volume 2).


4.1 Finite element problem 


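The FE space $H^{FEM}$ is a finite-dimensional subspace of $H$
and the FE problem reads:

$$\text{Find } u^{FEM} \in H^{FEM} \text{ with } a(u^{FEM}, v) = f(v) \quad \text{for all } v \in H^{FEM} \qquad (31)$$

The usual approach in the 2D case is to define $H^{FEM}$
by means of piecewise linear (quadratic, ...) elements
corresponding to a triangulation $\mathcal{T}$, which is a set of triangles
such that $\bigcup_{t \in \mathcal{T}} \bar t = \bar\Omega$. In 3D, the 'triangulation' $\mathcal{T}$ contains
tetrahedra and so on.

The functions $u \in H^{FEM}$ are represented by means of the
nodal basis, that is, there are a set $\mathcal{I}$ of indices $\alpha$ associated
with 'nodal points' $x_\alpha \in \mathbb{R}^d$ and a basis

$$B = \{\phi_\alpha : \alpha \in \mathcal{I}\} \subset H^{FEM}$$

with the interpolation property $\phi_\alpha(x_\beta) = \delta_{\alpha\beta}$ for all $\alpha, \beta \in \mathcal{I}$.
Setting $u_\alpha := u(x_\alpha)$, we obtain

$$u = \sum_{\alpha \in \mathcal{I}} u_\alpha \phi_\alpha \quad \text{for any } u \in H^{FEM} \qquad (32)$$

The coefficients $u_\alpha$ form the coefficient vector $\mathbf{u} = (u_\alpha)_{\alpha \in \mathcal{I}} \in U$.
Relation (32) gives rise to the interpolation $\mathbf{u} \mapsto u := P\mathbf{u} \in H^{FEM}$.
$P$ is a bijection from $U$ onto $H^{FEM}$.

The stiffness matrix $L$ is given by the entries $L_{\alpha\beta} = a(\phi_\beta, \phi_\alpha)$,
while the right-hand side vector is $\mathbf{f}$ with $f_\alpha = f(\phi_\alpha)$.
Altogether, we obtain the finite element equation
$L\mathbf{u} = \mathbf{f}$.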


4.2 Nested finite element spaces 


In order to get a family of finite element problems $L_k u_k = f_k$,
we may assume a sequence of nested subspaces, that is,
there are finite element spaces $H_k$ ($0 \le k \le \ell$) replacing
$H^{FEM}$ in (31) such that

$$H_0 \subset H_1 \subset \cdots \subset H_\ell \subset H \qquad (33)$$

This setting corresponds to conforming finite element
methods. The nonconforming case will be considered in
Section 4.6. But even in the conforming case (i.e. $H_\ell \subset H$
for all levels $\ell$), one may use subspaces with $H_{\ell-1} \not\subset H_\ell$
(see Section 4.5).

The easiest way to construct these nested subspaces is
by repeated refinement of the triangulation. Let $\mathcal{T}_0$ be the
coarsest triangulation. The refined triangulations $\mathcal{T}_k$ must
satisfy the following:

    each triangle $t \in \mathcal{T}_k$ is the union of triangles $t_1', t_2', \ldots$
    from $\mathcal{T}_{k+1}$ ($k = 0, \ldots, \ell-1$)

Then, elements being piecewise linear (quadratic) on $t \in \mathcal{T}_k$
are piecewise linear (quadratic) on $t' \in \mathcal{T}_{k+1}$; hence, $u \in H_k$
belongs also to $H_{k+1}$, that is, $H_k \subset H_{k+1}$ as required in
(33).

We introduce the following notation: $u_k, f_k \in U_k$ for
vectors at level $k$, while $L_k$ is the stiffness matrix at level
$k$. The nodal basis functions are $\phi_\alpha^k$ ($\alpha \in \mathcal{I}_k$, $k \in \{0, \ldots, \ell\}$),
where $\mathcal{I}_k$ is the index set of level $k$. The interpolation (32)
is now denoted by $P_k: U_k \to H_k$.


4.3 Canonical prolongations and restrictions for 
FE equations 


The multigrid prolongation $p: U_{k-1} \to U_k$ (see (5)) is
uniquely defined as follows. Let $u_{k-1} \in U_{k-1}$ be a given
coefficient vector and consider the corresponding finite element
function $u^{k-1} = P_{k-1} u_{k-1} \in H_{k-1}$. Since $H_{k-1} \subset H_k$, $u^{k-1}$
allows a basis representation $u^{k-1} = P_k u_k$ (cf. (32)) by
means of a unique coefficient vector $u_k \in U_k$. Define $p\,u_{k-1}$
by $u_k$. Hence, the formal definition of $p$ is

$$p = P_k^{-1} P_{k-1} \qquad (34)$$

The canonical restriction is $r = p^*$, where $p$ is the
canonical prolongation from above, that is,

$$\langle r f_k, v_{k-1} \rangle_{k-1} = \langle f_k, p\,v_{k-1} \rangle_k \quad \text{for all } v_{k-1} \in U_{k-1} \qquad (35)$$

Here, $\langle \cdot, \cdot \rangle_k$ denotes the Euclidean scalar product of $U_k$.


4.4 Coarse grid matrix and coarse grid correction


With a view to the next section, we state the characteristic
relation between $p$, $r$, and the stiffness matrices $L_k$, $L_{k-1}$.


Remark 4. Let $L_k$ and $L_{k-1}$ be the finite element stiffness
matrices of Section 4.1 and let $p$ and $r$ be canonical. Then
$L_{k-1}$ coincides with the Galerkin product

$$L_{k-1} = r\,L_k\,p \qquad (36)$$
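
The Galerkin relation can be verified directly in a small case. The sketch below assumes piecewise linear elements for $-u''$ on a uniform 1D mesh, for which the stiffness matrix is $h^{-1}\,\mathrm{tridiag}\{-1, 2, -1\}$, the canonical prolongation is linear interpolation, and $r$ is its transpose; this 1D finite element setting and all function names are assumptions made for illustration and are not part of the text.

```python
def stiffness(n, h):
    """Stiffness matrix of 1D linear elements for -u'': h^-1 * tridiag{-1, 2, -1}."""
    return [[(2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0) / h
             for j in range(n)] for i in range(n)]


def prolongation(m):
    """Canonical prolongation for nested linear elements as a (2m+1) x m matrix."""
    p = [[0.0] * m for _ in range(2 * m + 1)]
    for j in range(m):
        p[2 * j][j] = 0.5        # fine node left of the coarse node
        p[2 * j + 1][j] = 1.0    # fine node coinciding with the coarse node
        p[2 * j + 2][j] = 0.5    # fine node right of the coarse node
    return p


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


m = 3                                   # coarse unknowns, mesh size 1/4
fine_h = 1.0 / (2 * (m + 1))            # fine mesh size 1/8, 2m + 1 = 7 unknowns
p = prolongation(m)
r = transpose(p)                        # canonical restriction r = p*
galerkin = matmul(matmul(r, stiffness(2 * m + 1, fine_h)), p)
coarse = stiffness(m, 2.0 * fine_h)
print(galerkin == coarse)
```

Note that in this finite element setting the restriction is the plain transpose of $p$, whereas the difference scheme of Section 1.2.5 uses the additional factor 1/2 in (12).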


The coarse grid correction $C_k(u_k, f_k) := u_k - p\,L_{k-1}^{-1}\,r\,(L_k u_k - f_k)$
in (21) can be reformulated in terms of the
finite element function $u^k = P_k u_k \in H_k$ as follows.


Proposition 1. Assume $H_{k-1} \subset H_k$ and let $p$, $r$ be the
canonical mappings from Section 4.3. Let $a(\cdot, \cdot)$ be the bilinear
form of the underlying variational formulation. Then,
the coarse grid correction $u_k \mapsto u_k - p\,v_{k-1}$ from (27c4) is
equivalent to the mapping

$$u^k \mapsto u^k - v^{k-1} \qquad (37a)$$

where $v^{k-1} \in H_{k-1}$ is the finite element solution of

$$a(v^{k-1}, w^{k-1}) = a(u^k, w^{k-1}) - f(w^{k-1}) \quad \text{for all } w^{k-1} \in H_{k-1} \qquad (37b)$$

The solvability of (37b) is equivalent to the regularity of
$L_{k-1}$.


The corresponding representation for nonconforming
finite elements is given in (43).


4.5 Nonnested finite element spaces 


So far, we have assumed the finite element spaces to be
nested: $H_0 \subset H_1 \subset \cdots \subset H_\ell$. If $H_k$ are standard finite element
spaces over a triangulation $\mathcal{T}_k$, the inclusion $H_{k-1} \subset H_k$
requires that all coarse triangles $t \in \mathcal{T}_{k-1}$ must be the
union of some fine triangles $t_1, \ldots, t_s \in \mathcal{T}_k$. This condition
may be violated close to a curved boundary or near
a curved interface. It may even happen that $H_{k-1}$ and $H_k$
are completely unrelated except that $\dim H_{k-1} < \dim H_k$
expresses that $H_{k-1}$ gives rise to a coarser discretization.
In these cases, the definition (34) of the canonical prolongation
cannot be applied.

The cheapest remedy is the definition of $p$ by means
of the finite element interpolation. Let $u_{k-1} \in U_{k-1}$ be the
coefficient vector corresponding to the nodal basis of $H_{k-1}$,
that is, $(P_{k-1} u_{k-1})(x_\alpha) = u_{k-1,\alpha}$ for $x_\alpha \in \Omega_{k-1}$. Here, $\Omega_k$
is the set of nodal points of the triangulation $\mathcal{T}_k$. The desired
vector $p\,u_{k-1} \in U_k$ has to represent the values $P_k\,p\,u_{k-1}$ on
$\Omega_k$. Even when $P_k\,p\,u_{k-1}$ cannot coincide with $P_{k-1} u_{k-1}$
in $H$, it can interpolate $P_{k-1} u_{k-1}$:

$$(P_k\,p\,u_{k-1})(x_\alpha) = (P_{k-1} u_{k-1})(x_\alpha) \quad \text{for all } x_\alpha \in \Omega_k \qquad (38)$$

The next remark ensures that definition (38) is a true
generalization of the canonical choice.


Remark 5. If $H_{k-1} \subset H_k$, definition (38) yields the
canonical prolongation $p$.


As soon as $p$ is defined, we can define the restriction by
(35), that is, by $r = p^*$.


4.6 Nonconforming finite element discretizations 


In the following, we give only the construction of the 
discretization and multigrid ingredients. For more details, 
we refer to Braess, Dryja and Hackbusch (1999). 


4.6.1 Nonconforming discretization 


Let $H_k \subset L^2(\Omega)$ be a family of (nonconforming) finite element
spaces, that is, we do not assume $H_k \subset H$. Moreover,
the spaces $H_k$ are not supposed to be nested: $H_{k-1} \not\subset H_k$.
Instead of the bilinear form $a(\cdot, \cdot)$, a mesh-dependent bilinear
form $a_k(\cdot, \cdot)$ on $H_k \times H_k$ is used. For $f \in H^0$, the
variational problem is discretized by

$$u^k \in H_k \text{ with } a_k(u^k, v^k) = f(v^k) \quad \text{for all } v^k \in H_k \qquad (39)$$

Again the isomorphism between $U_k$ and $H_k$ is denoted by
$P_k$ (cf. Section 4.3).


4.6.2 Multi-grid prolongation 


The canonical prolongation $p = P_k^{-1} \circ P_{k-1}$ given by (34) is
based on the inclusion $H_{k-1} \subset H_k$. Since $H_{k-1} \not\subset H_k$, the
prolongation $p$ must be defined differently. The inclusion
$H_{k-1} \subset H_k$ is to be replaced by a suitable (linear) mapping

$$\iota: H_{k-1} \to H_k \qquad (40)$$

Once $\iota$ is constructed, we are able to define the canonical
prolongation and restriction by

$$p := P_k^{-1} \circ \iota \circ P_{k-1} \quad \text{and} \quad r := p^* = P_{k-1}^* \circ \iota^* \circ (P_k^*)^{-1} \qquad (41)$$

Although the algorithm needs only the above mapping
$\iota: H_{k-1} \to H_k$, it is easier to define $\iota$ on a larger space
$\Sigma \supset H_{k-1} + H_k$ such that $\iota$ restricted to $H_k$ is the identity.
Since we only require $\Sigma \subset L^2(\Omega)$, no global smoothness
is necessary.

Next, we need an auxiliary space $S$, connected with $\Sigma$
and $H_k$ via the mappings $\sigma$ and $\pi$, as shown in the following
commutative diagram:

                    sigma
          Sigma  ---------->  S
            ^                 |
            | inclusion       | pi
            |                 v
          H_{k-1} -------->  H_k
            ^       iota      ^
            | P_{k-1}         | P_k
            |                 |
          U_{k-1} -------->  U_k
                     p

$\pi: S \to H_k$ is required to be injective. The desired mapping
$\iota$ (more precisely, its extension to $\Sigma$) is the product

$$\iota = \pi \circ \sigma: \Sigma \to H_k \qquad (42)$$


For the simplest nonconforming finite element, the
Crouzeix-Raviart element, we specify the spaces and mappings
from above in the following example.


Example 1. Let $\mathcal{T}_{k-1}$ be the coarse triangulation of the
domain $\Omega$, while $\mathcal{T}_k$ is obtained by regular halving of
all triangle sides. $H_k$ is the space of all piecewise linear
functions that are continuous at the midpoints of edges
in $\mathcal{T}_k$. Define the nodal point set $\Omega_k = \{x_\alpha : \alpha \in \mathcal{I}_k\}$ by
all midpoints of edges in $\mathcal{T}_k$ (except boundary points in
the case of Dirichlet conditions). For all $x_\alpha \in \Omega_k$, basis
functions $b_\alpha^k \in H_k$ are defined by $b_\alpha^k(x_\beta) = \delta_{\alpha\beta}$ ($\alpha, \beta \in \mathcal{I}_k$).
Then, $U_k$ is the coefficient space that is mapped by
$P_k: u_k = (u_{k,\alpha})_{\alpha \in \mathcal{I}_k} \mapsto u^k = \sum_{\alpha \in \mathcal{I}_k} u_{k,\alpha} b_\alpha^k$ onto $H_k$. Similarly, $H_{k-1}$,
$U_{k-1}$, and the isomorphism $P_{k-1}$ are defined.

An appropriate space $\Sigma$ is the space of piecewise linear
elements with respect to the fine triangulation $\mathcal{T}_k$ that may
be discontinuous. Obviously, $H_{k-1} + H_k \subset \Sigma$.

We set $S := U_k$, $\pi := P_k$, and define $\sigma$ as follows: Every
nodal point $x_\alpha \in \Omega_k$ is the midpoint of the common side of
adjacent triangles $t, t' \in \mathcal{T}_k$. We define the image $\sigma v$ by

$$(\sigma v)_\alpha = \tfrac12\,[v|_t(x_\alpha) + v|_{t'}(x_\alpha)] \quad \text{for all } x_\alpha \in \Omega_k \text{ and } v \in \Sigma$$

Here, the linear function $v|_t$ is understood to be extended
to the closure of $t$.
The multigrid prolongation is $p = \sigma \circ P_{k-1}$.


4.6.3 Coarse grid correction 


Given $p$ from (41) and $r := p^*$, the coarse grid correction
takes the standard form (21). Its FE interpretation is as
follows:

Let an FE approximation $\tilde u^k \in H_k$ be given. Its defect
$d^k \in H_k$ is defined by

$$(d^k, w^k)_{L^2(\Omega)} = a_k(\tilde u^k, w^k) - f(w^k) \quad \text{for all } w^k \in H_k$$

Using (39) and the error $e^k := \tilde u^k - u^k$, one obtains a
characterization of $d^k$ by

$$(d^k, w^k)_{L^2(\Omega)} = a_k(e^k, w^k) \quad \text{for all } w^k \in H_k$$

Then the correction $e^{k-1} \in H_{k-1}$ is determined as the solution
of the FE coarse grid equation

$$a_{k-1}(e^{k-1}, w^{k-1}) = a_k(e^k, \iota w^{k-1}) \quad \text{for all } w^{k-1} \in H_{k-1} \tag{43}$$

Here $\iota$ is the mapping specified in (40). It is required for
converting the function $w^{k-1}$ from $H_{k-1}$ into a function in
$H_k$. The correction yields the new approximation
$u^{k,\mathrm{new}} := \tilde u^k - \iota e^{k-1}$.


5 ADDITIVE VARIANT 
5.1 The additive multigrid algorithm 


If one denotes the coarse grid correction (21) by
$C_k(u_k, f_k) := u_k - p L_{k-1}^{-1} r (L_k u_k - f_k)$, the two-grid
iteration is the product $C_k \circ S_k^\nu$, that is,
$TGM(k, u_k, f_k) = C_k(S_k^\nu(u_k, f_k), f_k)$. Instead of the product, one can form the
sum of the correction $\delta_{k,k} := u_k - S_k^\nu(u_k, f_k)$ of the smoothing
procedure and the coarse grid correction
$\delta_{k,k-1} := p L_{k-1}^{-1} r (L_k u_k - f_k)$. Damping the sum of these terms by
a factor $\vartheta$, one obtains the iteration
$u_k^j \mapsto u_k^{j+1} := u_k^j - \vartheta(\delta_{k,k} + \delta_{k,k-1})$.

In the multigrid case, one can try to separate the computations
at all levels $0, \ldots, \ell$. We give a description of this
algorithm, which looks rather similar to the multiplicative
algorithm (27) with $\gamma = 1$. The following additive multigrid
iteration $AMGM^{(\nu)}$ for solving $L_\ell u_\ell = f_\ell$ uses a damping
factor $\vartheta$, which is irrelevant if the iteration is embedded
into a cg-like acceleration method.


function AMGM(k, u, f);
begin if k = 0 then AMGM := \vartheta * L_0^{-1} * f else        (44a)
      begin d := r * (f - L_k * u);                              (44b)
            u := u + \vartheta * (S_k^\nu(u, f) - u);            (44c)
            v := 0; v := AMGM(k - 1, v, d);                      (44d)
            AMGM := u + p * v                                    (44e)
      end end;


Since (44b) does not influence the smoothing part (44c), 
both parts can be performed in parallel. The results of both 
parts are joined in (44e). 

For the standard (multiplicative) two-grid method, we
know from Section 2.3 that the convergence improves in
a particular way when the number $\nu$ of smoothing steps
increases. The same holds for general multigrid methods.
However, this behavior is not true for the additive variant
as shown by Bastian, Hackbusch and Wittum (1998).


5.2 Interpretation as subspace iteration 


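Similar to algorithm (27), one can resolve the recursivity
in (44e). For this purpose, we write explicitly $p = p_{k,k-1}$
and $r = r_{k-1,k}$ (as in (5), (6)) and compose these mappings
to obtain

$$p_{\ell,k} := p_{\ell,\ell-1} \circ p_{\ell-1,\ell-2} \circ \cdots \circ p_{k+1,k}$$

$$r_{k,\ell} := r_{k,k+1} \circ \cdots \circ r_{\ell-2,\ell-1} \circ r_{\ell-1,\ell} \quad \text{for } k \le \ell$$

where $p_{\ell,\ell} = I$, $r_{\ell,\ell} = I$ in the case of the empty product.
From the given right-hand side $f_\ell$ one defines
$f_k := r_{k,\ell} f_\ell$ for $k = 0, \ldots, \ell$. Then the additive multigrid iteration
$AMGM(\ell, u_\ell, f_\ell)$ described in (44) is given by

$$AMGM(\ell, u_\ell, f_\ell) = u_\ell + \vartheta \sum_{k=0}^{\ell} p_{\ell,k}\, \delta v_k \tag{45}$$

where the corrections $\delta v_k$ at level $k$ are defined by

$$\delta v_\ell := S_\ell^\nu(u_\ell, f_\ell) - u_\ell \quad \text{for } k = \ell$$
$$\delta v_k := S_k^\nu(0, f_k) \quad \text{for } k = 1, \ldots, \ell - 1$$
$$\delta v_0 := L_0^{-1} f_0 \quad \text{for } k = 0$$

The case $k = \ell$ can easily be seen from (44c). Since the
lower levels use the starting value $v = 0$, $S_k^\nu(u_k, f_k) - u_k$
simplifies to $S_k^\nu(0, f_k)$.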
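The equivalence of the recursive form (44) and the sum (45) can be checked numerically. The following is a minimal sketch, not the author's implementation: it assumes a 1D model problem (three-point Laplacian), linear interpolation for p, full weighting for r, and damped Jacobi as the smoother; all function names are hypothetical. It also assumes that the coarse right-hand sides are the restricted defect of the current iterate, as in (44b), which coincides with the restricted right-hand side when the iterate is zero.

```python
import numpy as np

def laplace_1d(n):
    """Three-point stencil for -u'' on (0, 1) with n interior points."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def prolongation(n_coarse):
    """Linear interpolation from n_coarse to 2*n_coarse + 1 interior points."""
    p = np.zeros((2 * n_coarse + 1, n_coarse))
    for j in range(n_coarse):
        p[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    return p

def smooth(L, u, f, nu, omega=2.0 / 3.0):
    """nu steps of damped Jacobi, standing in for S_k^nu(u, f)."""
    for _ in range(nu):
        u = u + omega * (f - L @ u) / np.diag(L)
    return u

def build_hierarchy(levels):
    """Level k has 2**(k+1) - 1 unknowns; Ps[k] maps level k-1 to level k."""
    Ls = [laplace_1d(2 ** (k + 1) - 1) for k in range(levels + 1)]
    Ps = [None] + [prolongation(2 ** k - 1) for k in range(1, levels + 1)]
    return Ls, Ps

def amgm(k, u, f, Ls, Ps, theta, nu=1):
    """Recursive additive multigrid step following (44a)-(44e)."""
    if k == 0:
        return theta * np.linalg.solve(Ls[0], f)              # (44a)
    d = 0.5 * Ps[k].T @ (f - Ls[k] @ u)                       # (44b), r = p^T / 2
    u = u + theta * (smooth(Ls[k], u, f, nu) - u)             # (44c)
    v = amgm(k - 1, np.zeros(Ls[k - 1].shape[0]), d, Ls, Ps, theta, nu)  # (44d)
    return u + Ps[k] @ v                                      # (44e)

def amgm_sum(l, u, f, Ls, Ps, theta, nu=1):
    """Non-recursive form in the spirit of (45); requires l >= 1."""
    corrections = [smooth(Ls[l], u, f, nu) - u]               # dv_l
    rhs = f - Ls[l] @ u                                       # defect on the finest level
    for k in range(l - 1, -1, -1):
        rhs = 0.5 * Ps[k + 1].T @ rhs                         # restrict to level k
        if k > 0:
            corrections.append(smooth(Ls[k], np.zeros_like(rhs), rhs, nu))
        else:
            corrections.append(np.linalg.solve(Ls[0], rhs))   # dv_0
    acc = corrections[-1]
    for k in range(1, l + 1):
        acc = Ps[k] @ acc + corrections[l - k]                # add dv_k after prolongation
    return u + theta * acc
```

Since the corrections on the different levels in `amgm_sum` depend only on the restricted defect and not on each other, they could be computed concurrently, which is the point (i) made in the text.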

The interest in subspace iterations has two different reasons.
(i) The computation of the corrections $\delta v_k$ can be
performed in parallel. (ii) The interpretation as subspace
iteration allows quite different convergence proofs (in particular
for the V-cycle) than the standard multiplicative
version. Although the resulting statements are weaker, they
also require weaker assumptions (see Bramble and Zhang,
1993).


6 NESTED ITERATION 
6.1 Algorithm 


6.1.1 Starting and terminating an iteration 


The natural approach is to start with a more or less accurate
initial value $u_\ell^0$ and to perform several iteration steps:

\tilde u_\ell := u_\ell^0; for j := 1 to i do \tilde u_\ell := MGM(\ell, \tilde u_\ell, f_\ell)        (46)

The error of $\tilde u_\ell$ satisfies

$$\|\tilde u_\ell - u_\ell\| \le \zeta^i \|u_\ell^0 - u_\ell\| \tag{47}$$

where $\zeta$ is the contraction number of the iteration (cf.
Section 2.3) and $u_\ell$ the solution of $L_\ell u_\ell = f_\ell$. In particular,
the simplest choice $u_\ell^0 = 0$ yields an estimate of the relative
error

$$\frac{\|\tilde u_\ell - u_\ell\|}{\|u_\ell\|} \le \zeta^i \quad (\text{if } u_\ell^0 = 0) \tag{48}$$

In order to obtain a fixed (relative) error $\varepsilon$, one needs
$i \ge \log(\varepsilon)/\log(\zeta) = O(|\log(\varepsilon)|)$ iterations, where we exploit
the fact that the contraction number $\zeta$ of the multigrid
iteration is $\ell$-independent. Usually, $\varepsilon$ is not explicitly given
and one has to judge a suitable value. Except in special
cases, it is useless to take $\varepsilon$ smaller than the discretization
error $\varepsilon_{disc}$ (i.e. the difference between $u_\ell$ and the continuous
solution $u$). Often, the quantitative size of $\varepsilon_{disc}$ is not
known a priori, but only its asymptotic behavior $O(h_\ell^\kappa)$ ($\kappa$:
consistency order). From $\varepsilon := \varepsilon_{disc} = O(h_\ell^\kappa)$ one concludes
that $i = O(|\log(h_\ell)|)$ iterations are required to obtain an
iterate $\tilde u_\ell$ with an error of the size of the discretization
error. The corresponding number of arithmetical operations
is $O(n_\ell |\log(h_\ell)|) = O(n_\ell \log n_\ell)$.
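The iteration count implied by the estimate above can be computed directly. The sketch below is only an illustration; the function name and the sample values of the contraction number and tolerance are hypothetical and not taken from the text.

```python
import math

def iterations_needed(zeta, eps):
    """Smallest integer i >= 0 with zeta**i <= eps, for 0 < zeta < 1 and 0 < eps < 1."""
    i = max(0, math.ceil(math.log(eps) / math.log(zeta)))
    # The two loops guard against rounding errors in the logarithms.
    while zeta ** i > eps:
        i += 1
    while i > 0 and zeta ** (i - 1) <= eps:
        i -= 1
    return i
```

For instance, with a contraction number of 0.25 and a tolerance equal to h^2 for h = 1/64, the count grows like |log h|, in line with the operation count stated above.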


6.1.2 Basic algorithm 


The nested iteration described below (also called 'full
multigrid method') has several advantages:

• Although no a priori knowledge of the discretization
  error is needed, the nested iteration produces approximations
  $\tilde u_\ell$ with error $O(\varepsilon_{disc})$.

• The nested iteration is cheaper than the simple approach
  (46). An approximation with error $O(\varepsilon_{disc})$ is
  calculated by $O(n_\ell)$ operations.

• Besides $\tilde u_\ell$, the solutions $\tilde u_{\ell-1}, \tilde u_{\ell-2}, \ldots$ corresponding
  to the coarser grids are also approximated and are at
  one's disposal.

In principle, the nested iteration can be combined with
any iterative process. The idea is to provide a good starting
guess $u_\ell^0$ by means of iterating on a coarser grid.

In this section, the index $\ell$ characterizes the level of the
finest grid, that is, $\ell$ is the maximal level number, while the
index $k \in \{0, 1, \ldots, \ell\}$ is used for the intermediate levels.
We assume that the discrete equations $L_k u_k = f_k$ are given
for all levels.


A program-like formulation of the nested iteration reads
as follows:

Nested Iteration

\tilde u_0 := u_0 = L_0^{-1} f_0;                                        (49a)
for k := 1 to \ell do
begin \tilde u_k := \tilde p \tilde u_{k-1};                             (49b)
      for j := 1 to i do \tilde u_k := MGM(k, \tilde u_k, f_k)           (49c)
end;
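The nested iteration (49) can be sketched in a few lines. The following is an illustration under stated assumptions, not the setting of the text: a 1D model problem -u'' = pi^2 sin(pi x) with three-point stencil, a V-cycle (gamma = 1) with damped Jacobi smoothing as the multigrid iteration, and plain linear interpolation in place of a higher-order starting interpolation. All function names are hypothetical.

```python
import numpy as np

def laplace_1d(n):
    """Three-point stencil for -u'' on (0, 1) with n interior points."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def prolongation(n_coarse):
    """Linear interpolation from n_coarse to 2*n_coarse + 1 interior points."""
    p = np.zeros((2 * n_coarse + 1, n_coarse))
    for j in range(n_coarse):
        p[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    return p

def smooth(L, u, f, nu, omega=2.0 / 3.0):
    """nu steps of damped Jacobi."""
    for _ in range(nu):
        u = u + omega * (f - L @ u) / np.diag(L)
    return u

def mgm(k, u, f, Ls, Ps, nu=2):
    """One multiplicative V-cycle with nu pre- and nu post-smoothing steps."""
    if k == 0:
        return np.linalg.solve(Ls[0], f)
    u = smooth(Ls[k], u, f, nu)
    d = 0.5 * Ps[k].T @ (Ls[k] @ u - f)          # restricted defect
    v = mgm(k - 1, np.zeros_like(d), d, Ls, Ps, nu)
    return smooth(Ls[k], u - Ps[k] @ v, f, nu)   # correction, then post-smoothing

def nested_iteration(fs, Ls, Ps, i=1):
    """Returns the approximations on all levels 0, ..., len(Ls) - 1."""
    u = np.linalg.solve(Ls[0], fs[0])            # (49a)
    approximations = [u]
    for k in range(1, len(Ls)):
        u = Ps[k] @ u                            # (49b), starting value from level k-1
        for _ in range(i):                       # (49c)
            u = mgm(k, u, fs[k], Ls, Ps)
        approximations.append(u)
    return approximations

def model_problem(levels):
    """Operators, prolongations, grids, and right-hand sides on levels 0..levels."""
    Ls, Ps, xs, fs = [], [None], [], []
    for k in range(levels + 1):
        n = 2 ** (k + 1) - 1
        x = np.arange(1, n + 1) / (n + 1)
        Ls.append(laplace_1d(n))
        xs.append(x)
        fs.append(np.pi ** 2 * np.sin(np.pi * x))
        if k > 0:
            Ps.append(prolongation(2 ** k - 1))
    return Ls, Ps, xs, fs
```

A call such as `nested_iteration(fs, Ls, Ps, i=1)` with the output of `model_problem(5)` yields approximations on the grids h = 1/2, ..., 1/64; comparing the last one with `np.sin(np.pi * xs[-1])` gives the total error on the finest grid.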


6.1.3 Implementational details 


Starting Value at Level 0

The exact solution of $L_0 u_0 = f_0$ is not necessary. One may
replace (49a) by $\tilde u_0 \approx L_0^{-1} f_0$, provided that $\|\tilde u_0 - u_0\|$ is
small enough.

Prolongation $\tilde p$

The starting value in (49b) is obtained from the coarse
grid approximation $\tilde u_{k-1}$ by means of some interpolation $\tilde p$.
From the programming point of view, the simplest choice
is $\tilde p = p$ from (5). However, interpolations $\tilde p$ of higher
order than $p$ may be taken into consideration, too (see
Remark 6).

If an asymptotic expansion is valid, Richardson's extrapolation
can be applied to compute a fairly accurate value $\tilde u_k$
from $\tilde u_{k-1}$ and $\tilde u_{k-2}$ (details in Section 5.4 of Hackbusch,
1985).

Iterations per Level

At each level, $i$ iterations are performed. An appropriate
choice of $i$ is discussed in Section 6.2. The same value $i$
can be chosen for all levels, since the contraction numbers
of the multigrid iteration are bounded independently of the
level $0 \le k \le \ell$. Since most of the computational work
is spent at level $\ell$, it can be advantageous to choose
$i_\ell \le i_{\ell-1} \le i_{\ell-2} \le \cdots \le i_1$ (cf. Remark 8).


Adaptive Choice of the Finest Level

The nested iteration can be interpreted in two different
ways.

From level $\ell$ down to 0. The finest level $\ell$ together
with the stiffness matrix $L_\ell$ and the right-hand side $f_\ell$ is
given. Then all levels $k < \ell$ are only introduced to support
the multigrid process. The right-hand side $f_k$ should be
computed by $f_k := r f_{k+1}$ for $k = \ell - 1, \ldots, 0$.

From level 0 to $\ell$. The nested iteration becomes a part
of the discretization process if we choose the finer levels
adaptively. Then the newest information (a posteriori error
estimates, comparison of $\tilde u_k$ and $\tilde u_{k-1}$, etc.) can be used to
decide whether a further refinement is needed.
In the second case, the loop $k = 1, \ldots, \ell$ in (49) should
be written as a while-statement: 'k := k + 1, while the
discretization is not fine enough'.


6.2 Analysis of the nested iteration 


The nested iteration (49) requires the specification of an
iteration number $i$. The following analysis suggests how to
choose $i$.

Let $\zeta_k$ be the contraction number of the multigrid iteration
employed at level $k$:

$$\|u_k^{j+1} - u_k\| \le \zeta_k \|u_k^j - u_k\| \tag{50}$$

As pointed out before, the numbers $\zeta_k$ are uniformly
bounded by some $\zeta < 1$. Set

$$\zeta := \max\{\zeta_k : 1 \le k \le \ell\} \tag{51}$$

where $\ell$ is the maximum level from (49). The discretization
error is the (suitably defined) difference between the
discrete solution $u_k = L_k^{-1} f_k$ and the continuous solution
$u$. The difference between $u_k$ and $u_{k-1} = L_{k-1}^{-1} f_{k-1}$ may be
called the relative discretization error (error of $u_{k-1}$ relative
to $u_k$). We suppose an a priori estimate with a consistency
order $\kappa$,

$$\|\tilde p\, u_{k-1} - u_k\| \le C_1 h_k^\kappa \quad \text{for } 1 \le k \le \ell \tag{52}$$

Note that the exponent $\kappa$ in (52) depends on the consistency
order of the discretization and on the interpolation order of
$\tilde p$. Therefore, we are led to


Remark 6. The interpolation order of $\tilde p$ should at least
be equal to the consistency order of the discretization.

Consider the standard case of a second order discretization
($\kappa = 2$) of a second order differential equation ($2m = 2$).
By Remark 6, $\tilde p$ should be at least piecewise linear. Hence,
$\tilde p$ may be the standard prolongation $p$.

To indicate the levels involved, we write $\tilde p = \tilde p_{k,k-1}$.
We define the constants

$$C_{20} := \max\{\|\tilde p_{k,k-1}\| : 1 \le k \le \ell\} \tag{53a}$$
$$C_{21} := \max\{(h_{k-1}/h_k)^\kappa : 1 \le k \le \ell\} \tag{53b}$$
$$C_2 := C_{20} C_{21} \tag{53c}$$

One can show $C_{20} = 1$ for the most frequent choices
of $\tilde p$. Moreover, $C_{21} = 2^\kappa$ holds for the usual sequence
$h_k = h_0/2^k$. Hence, the value of $C_2$ is explicitly known:
$C_2 = 2^\kappa$.

Theorem 2 (Error analysis) Assume (52) and $C_2 \zeta^i < 1$
with $C_2$ from (53a-c), $\zeta$ from (51), and $i$ from (49).
Set $C_3(\zeta, i) := \zeta^i/(1 - C_2 \zeta^i)$. Then the nested iteration (49)
with $i$ steps of the multigrid iteration per level results in $\tilde u_k$
($0 \le k \le \ell$) satisfying the error estimate

$$\|\tilde u_k - u_k\| \le C_3(\zeta, i)\, C_1 h_k^\kappa \quad \text{for all } k = 0, \ldots, \ell \text{ with } u_k = L_k^{-1} f_k \tag{54}$$
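The estimate (54) follows from a short induction over the levels. The following sketch is meant as a reading aid rather than a complete proof. It reads (52) as a bound on $\|\tilde p\, u_{k-1} - u_k\|$, writes $\tilde u_k$ for the result of (49c) and $C_3$ for $C_3(\zeta, i)$, and assumes that the norms on consecutive levels are compatible in the sense used in (53a).

```latex
% Level k = 0: \tilde u_0 = u_0 by (49a), so (54) holds trivially.
% Step k-1 -> k: i contraction steps (50)-(51), triangle inequality,
% then (53a), the induction hypothesis, (52), and (53b)-(53c).
\begin{align*}
\|\tilde u_k - u_k\|
  &\le \zeta^{i}\,\|\tilde p\,\tilde u_{k-1} - u_k\| \\
  &\le \zeta^{i}\bigl(\|\tilde p\,(\tilde u_{k-1} - u_{k-1})\|
        + \|\tilde p\,u_{k-1} - u_k\|\bigr) \\
  &\le \zeta^{i}\bigl(C_{20}\,C_3\,C_1 h_{k-1}^{\kappa} + C_1 h_k^{\kappa}\bigr) \\
  &\le \zeta^{i}\,(C_2 C_3 + 1)\,C_1 h_k^{\kappa}
   = \frac{\zeta^{i}}{1 - C_2\zeta^{i}}\,C_1 h_k^{\kappa}
   = C_3\,C_1 h_k^{\kappa}.
\end{align*}
```

The last equality uses $C_2 C_3 + 1 = 1/(1 - C_2 \zeta^i)$, which is where the condition $C_2 \zeta^i < 1$ enters.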


Theorem 2 ensures that the errors at all levels
$k = 0, 1, \ldots, \ell$ differ from the (bound of the) relative
discretization error $C_1 h_k^\kappa$ only by a factor $C_3(\zeta, i)$, which is
explicitly known. The only condition on $i$ is $C_2 \zeta^i < 1$.
The nested iteration is as cheap as possible if $i = 1$ satisfies
$C_2 \zeta^i < 1$. With $C_2 = 2^\kappa$ from above, this condition
becomes $2^\kappa \zeta < 1$. Assuming in addition the standard case
of $\kappa = 2$, we obtain

Remark 7. Assume $C_2 = 4$ and $\zeta < 1/4$. Then Theorem 2
holds for $i = 1$. Note that the rates observed in
Section 3.2 are far below the critical bound 1/4.


Remark 8 (computational work) Let $W_k$ be the work for
one multigrid iteration at level $k$. Assuming $h_{k-1} \approx 2h_k$
and $\Omega \subset R^d$, the dimensions $n_k$ of the system at level
$k$ should satisfy $n_k \approx 2^d n_{k-1}$. Since the work for $\tilde p$ is
less dominant, the cost of the nested iteration (49) is
about $i \cdot W_\ell/(1 - 2^{-d})$. In the 2D case, the value becomes
$(4/3)\, i\, W_\ell$. Obviously, the complete nested iteration is only
insignificantly more expensive than the work spent on level
$\ell$ alone.
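The factor 1/(1 - 2^{-d}) is simply the limit of a geometric sum. The following sketch evaluates the finite sum under the assumption, not stated explicitly above, that the work per multigrid iteration is proportional to the number of unknowns, so that each coarser level costs 2^{-d} times the next finer one. The function name is hypothetical.

```python
def nested_iteration_work(levels, d, i=1, work_finest=1.0):
    """Total work of i iterations on each of the levels 1..levels.

    Assumes W_k = work_finest * 2**(-d * (levels - k)); the solve on
    level 0 and the interpolation of starting values are neglected.
    """
    return sum(i * work_finest * 2.0 ** (-d * (levels - k))
               for k in range(1, levels + 1))
```

With `d=2` the value stays below 4/3 of the finest-level work for any number of levels, and with `d=3` below 8/7.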


6.3 Numerical example 


We apply the nested iteration (49) to the five-point scheme
discretization of

$$-\Delta u = f := -\Delta(\exp(x + y^2)) \text{ in } \Omega = (0, 1) \times (0, 1), \qquad u = \varphi := \exp(x + y^2) \text{ on } \Gamma = \partial\Omega \tag{55}$$

The multigrid iteration MGM in (49) uses $\nu = \gamma = 2$.
The results of the nested iteration are shown, where $\tilde p$ is
the cubic interpolation in the interior and the quadratic
interpolation near the boundary. Let $u_\ell^{exact}$ be the exact
solution $\exp(x + y^2)$ restricted to the grid, while $u_\ell$ is the
exact discrete solution. We have to distinguish between
the iteration errors $\|\tilde u_\ell^i - u_\ell\|$, the total error $\|\tilde u_\ell^i - u_\ell^{exact}\|$
with $i$ from (49c), and the discretization error $\|u_\ell - u_\ell^{exact}\|$.
The table shows the maximum norm of the total errors
$\|\tilde u_\ell^i - u_\ell^{exact}\|$ for $i = 1$ and $i = 2$.

level   h       i = 1           i = 2           i = inf
0       1/2     7.9944658e-2    7.9944658e-2    7.9944658e-2
1       1/4     3.9908756e-2    2.9215605e-2    2.8969488e-2
2       1/8     1.5788721e-2    8.1023136e-3    8.0307789e-3
3       1/16    3.2919346e-3    2.0768391e-3    2.0729855e-3
4       1/32    5.7591549e-4    5.2253758e-4    5.247399e-4
5       1/64    1.3291689e-4    1.3093946e-4    1.3093956e-4

The last column ($\tilde u_\ell^\infty = u_\ell$) contains the discretization
error that should be in balance with the iteration error.
Obviously, the choice $i = 1$ is sufficient. $i = 2$ needs
double the work and cannot improve the total error
substantially.


6.4 Use of acceleration by conjugate gradient 
methods 


Given an iteration, it is often recommended to improve the
convergence speed by the method of conjugate gradients
(in the positive definite case) or by variants that apply to
more general cases (see Hackbusch, 1994, Section 9). Here,
two remarks are of interest.

If the matrix $L_\ell$ in $L_\ell u_\ell = f_\ell$ is symmetric and positive
definite, one should use a symmetric multigrid variant,
that is, $MGM^{(\nu_1, \nu_2)}$ from Remark 2 with $\nu_1 = \nu_2$ and a
symmetric smoothing iteration (it is even sufficient that
presmoothing is adjoint to postsmoothing). Then the standard
conjugate gradient (cg) method can be applied. However,
the use of cg is recommended only if the rate of the
multigrid convergence is not sufficiently fast; otherwise,
the overhead for the cg-method does not pay.


7 NONLINEAR EQUATIONS 


In the following, we consider the nonlinear elliptic problem
$\mathcal{L}(u) = 0$, where $\mathcal{L}$ is a nonlinear operator (e.g.
$\mathcal{L}(u) = \mathrm{div}\, p(u)\, \mathrm{grad}\, u - f$ or $\mathcal{L}(u) = \Delta u + u u_x - f$). We
assume that $\mathcal{L}(u) = 0$ is discretized with respect
to a hierarchy of grids:

$$\mathcal{L}_k(u_k) = f_k \quad \text{for } k = 0, \ldots, \ell \tag{56}$$

Even if one is only interested in the solution of $\mathcal{L}_\ell(u_\ell) = 0$,
the multigrid approach in Section 7.2 leads to problems
(56) with small right-hand sides $f_k$. The smallness of $f_k$
ensures that (56) has a unique local solution (we do not
require global uniqueness of the solutions).


7.1 Newton’s method and linear multigrid 


Newton's method requires the derivative (Jacobi matrix) of
$\mathcal{L}_k$, which we denote by $L_k(u_k) = (\partial/\partial u_k)\mathcal{L}_k(u_k)$. Then the
Newton iteration $u_k^{m+1} = u_k^m - [L_k(u_k^m)]^{-1} (\mathcal{L}_k(u_k^m) - f_k)$
requires the solution of the linear system $L_k(u_k^m)\delta_k = d_k$
for the defect $d_k := \mathcal{L}_k(u_k^m) - f_k$. The latter task can be
performed by the multigrid iteration from above.
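The structure of the Newton iteration can be illustrated on a small model problem. The sketch below is an assumption-laden example, not taken from the text: it uses the hypothetical 1D nonlinearity N(u) = L u + u^3 with the three-point Laplacian L, and it replaces the linear multigrid solver by a direct solve (`np.linalg.solve`) as a placeholder.

```python
import numpy as np

def laplace_1d(n):
    """Three-point stencil for -u'' on (0, 1) with n interior points."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def newton(L, f, u0, steps=10, tol=1e-10):
    """Newton iteration for the model problem N(u) = L u + u**3 = f."""
    u = u0.copy()
    for _ in range(steps):
        d = L @ u + u ** 3 - f              # defect d = N(u) - f
        if np.max(np.abs(d)) < tol:
            break
        J = L + np.diag(3.0 * u ** 2)       # Jacobi matrix of N at u
        # Placeholder: in the setting of the text, this linear system
        # would be solved approximately by the linear multigrid iteration.
        u = u - np.linalg.solve(J, d)
    return u
```

In practice only a few multigrid cycles per Newton step are needed, since the linear systems have to be solved only to the accuracy of the current Newton iterate; the direct solve above hides this trade-off.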


7.2 Nonlinear multigrid iteration 


The approach from Section 7.1 requires the computation of
the Jacobi matrix $L_k(u_k)$. This can be avoided by applying
the nonlinear multigrid iteration NMGM from (58) below.
Since NMGM uses approximations $\tilde u_k$ of $\mathcal{L}_k(u_k) = 0$ for
$k \le \ell - 1$, we start with the nonlinear nested iteration,
which produces $\tilde u_k$ as well as their defects $f_k := \mathcal{L}_k(\tilde u_k)$.

solve \mathcal{L}_0(\tilde u_0) = f_0 approximately;                 (e.g. by Newton's method)
for k := 1 to \ell do
begin f_{k-1} := \mathcal{L}_{k-1}(\tilde u_{k-1});                  (defect of \tilde u_{k-1})
      \tilde u_k := \tilde p \tilde u_{k-1};                         (start at level k as in (49b))
      for j := 1 to i do \tilde u_k := NMGM(k, \tilde u_k, f_k)      (as in (49c))
end;                                                                 (57)

Now we define the iteration NMGM, which uses $\tilde u_{k-1}$,
$f_{k-1}$ as reference point (since in the linear case $\tilde u_{k-1} = 0$
is the reference point, we do not see $\tilde u_{k-1}$ in the linear
multigrid method).

function NMGM(k, u, f);
begin if k = 0 then NMGM := \tilde u_0 else          (\tilde u_0 approximation to \mathcal{L}_0(\tilde u_0) = f)
      begin for j := 1 to \nu do u := S_k(u, f);      (pre-smoothing)
            d := r(\mathcal{L}_k(u) - f);             (restriction of defect)
            \varepsilon := \varepsilon(d);            (small positive factor)
            \delta := f_{k-1} - \varepsilon * d;      (right-hand side at level k - 1)
            v := \tilde u_{k-1};                      (starting value for correction)
            for j := 1 to \gamma do v := NMGM(k - 1, v, \delta);
            NMGM := u + p(v - \tilde u_{k-1})/\varepsilon    (coarse grid correction)
      end end;                                        (58)
Here, $S_k(u_k, f_k)$ is a nonlinear smoothing iteration for
$\mathcal{L}_k(u_k) = f_k$. For instance, the analogue of the Richardson
iteration (4) is $S_k(u_k, f_k) = u_k - \omega_k(\mathcal{L}_k(u_k) - f_k)$, where
$\omega_k \approx 1/\|L_k(u_k)\|$. The factor $\varepsilon(d)$ may, for example, be
chosen as $\sigma/\|d\|$ with a small number $\sigma$. The smallness of
$\varepsilon$ guarantees that $\mathcal{L}_{k-1}(v_{k-1}) = f_{k-1} - \varepsilon * d$ has a unique
local solution close to $\tilde u_{k-1}$. Note that $f_{k-1}$, $\tilde u_{k-1}$ are
produced by (57).

Note that (58) is a true generalization of the linear
iteration MGM: If NMGM is applied to a linear problem
(i.e. $\mathcal{L}_k(u_k) := L_k u_k - f_k$), it produces the same iterates as
MGM independently of the choice of the reference values.

If $L_k(u_k)$ is Lipschitz continuous and if further technical
conditions are fulfilled, one can show that the asymptotic
convergence rate of NMGM coincides with the rate of the
linear iteration MGM applied to the linearized problem with
the matrices $L_k := L_k(u_k^*)$. Also, other specifications
of $\tilde u_{k-1}, f_{k-1}, \varepsilon$ are possible (see Section 9 of Hackbusch,
1985).

If, for some reason, the nested iteration is not used,
the value $\tilde u_{k-1}$ can be replaced by $r u_k$, while
$f_{k-1} := \mathcal{L}_{k-1}(\tilde u_{k-1})$ (FAS: 'full approximation storage method'
from Brandt (1977)).


8 EIGENVALUE PROBLEMS 


The (continuous) eigenvalue problem reads $Lu = \lambda u$,
where $u$ satisfies homogeneous boundary values. The
hierarchy of discrete eigenvalue problems is

$$L_k u_k = \lambda u_k \quad \text{for } k = 0, \ldots, \ell$$

(possibly with $I$ replaced by the mass matrix $M_k$). Again,
the two-grid method consists of smoothing and coarse grid
correction based on the defect $d_k = L_k u_k - \lambda u_k$. However,
the computation of a correction $\delta u_k$ from $(L_k - \lambda I)\delta u_k = d_k$
is problematic, since $L_k - \lambda I$ becomes singular for
the eigenvalue $\lambda$. Nevertheless, $(L_k - \lambda I)\delta u_k = d_k$ is solvable,
since the right-hand side $d_k$ belongs to the image
space. Furthermore, the nonuniqueness of $\delta u_k$ is harmless,
since the kernel lies in the eigenspace. These statements
are only approximately true for the restricted coarse grid
equation $(L_{k-1} - \lambda I)\delta v_{k-1} = d_{k-1}$. Therefore, certain
projections are necessary. The complete multigrid algorithms
can be found in Section 12 of Hackbusch (1985).

If $L_k$ is not symmetric, the right- and left-eigenvectors
can be computed simultaneously. It is advantageous to compute
a group of eigenvectors by combining the multigrid
approach with the Ritz method (see Chapter 19, this Volume).


9 APPLICATIONS TO THE BOUNDARY 
ELEMENT METHOD (BEM) 


There are two quite different groups of boundary element
method (BEM) problems that can be solved by means of
multigrid methods. Integral equations of the second kind
are treated in Section 9.1, while integral equations with
hypersingular kernel are mentioned in Section 9.2. In both
cases, the arising matrices $K_k$ are fully populated. Except
on the coarsest grid, the only operation needed is the
matrix-vector multiplication $u_k \mapsto K_k u_k$. Its naive implementation
requires $O(n_k^2)$ arithmetical operations. Therefore, one
should use the fast multiplication described in the author's
contribution in Chapter 21, this Volume (see also
Chapter 12, this Volume).


9.1 Application to integral equations of the 
second kind 


Fredholm integral equations of the second kind have the form
$\lambda u = Ku + f$ ($\lambda \ne 0$, $f$ given) with

$$(Ku)(x) := \int_D \kappa(x, y)\, u(y)\, dy \quad \text{for } x \in D$$

where the kernel $\kappa$ of the integral operator $K$ is given. The
Picard iteration $u \mapsto (1/\lambda)(Ku + f)$ converges only if
$|\lambda| > \rho(K)$, but in many important applications the Picard
iteration has a smoothing effect: nonsmooth functions $e$ are
mapped into a smooth function $(1/\lambda)Ke$ by only one step.
This enables the following multigrid iteration of the second
kind, which makes use of a hierarchy $\lambda u_k = K_k u_k + f_k$ of
discrete equations.


function MGM(k, u, f);                     (MGM solving \lambda u_k = K_k u_k + f_k)
begin if k = 0 then MGM := (\lambda I - K_0)^{-1} f else
      begin u := (1/\lambda) * (K_k * u + f);               (Picard iteration)
            d := r * (\lambda u - K_k u - f);               (restriction of defect)
            v := 0;                                         (start value for correction)
            for j := 1 to 2 do v := MGM(k - 1, v, d);       (W-cycle)
            MGM := u - p v                                  (coarse grid correction)
      end end;                                              (59)

Because of the strong smoothing effect, MGM has convergence
rates $O(h_k^\alpha)$ with $\alpha > 0$. Hence, with increasing
dimension (decreasing step size $h_k$) the iteration becomes
faster! The value $\alpha$ depends on the smoothness of the image
of $K$, on the order of the interpolation $p$, and so on.

Note that we need only the multiplication by the matrices
$K_k$. It is not required that $K$ is an integral operator with
explicitly known kernel $\kappa$. Iteration (59) applied to a fixed
point equation $\lambda u = Ku + f$ has the same properties, provided
that $K$ shows the smoothing effect. For further details
we refer to Section 12 in Hackbusch (1985) and Section 5
in Hackbusch (1995).

The nonlinear fixed point equation $\lambda u = K(u)$ can be
solved by an analogous version of the nonlinear multigrid
method from Section 7.
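The Picard iteration that serves as the smoother in the multigrid method of the second kind can be tried out on a small discrete example. The sketch below is only an illustration under assumptions that are not part of the text: the kernel exp(-|x - y|) on (0, 1), its midpoint-rule discretization, and the value of lambda are all hypothetical choices made so that lambda exceeds the row-sum norm, and hence the spectral radius, of the discrete operator.

```python
import numpy as np

def nystrom_matrix(n):
    """Midpoint-rule discretization of (Ku)(x) = int_0^1 exp(-|x - y|) u(y) dy."""
    x = (np.arange(n) + 0.5) / n
    K = np.exp(-np.abs(x[:, None] - x[None, :])) / n
    return K, x

def picard(K, f, lam, steps):
    """Picard iteration u -> (K u + f) / lam for lam * u = K u + f, started at u = 0."""
    u = np.zeros_like(f)
    for _ in range(steps):
        u = (K @ u + f) / lam
    return u
```

Each Picard step needs only one matrix-vector multiplication with the fully populated matrix, which is why a fast multiplication is the decisive ingredient.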


9.2 Application to integral equations with 
hypersingular kernel 


Boundary value problems $Lu = 0$ with inhomogeneous
Neumann conditions can be formulated by means of hypersingular
integral operators. For $L = \Delta$ and $\partial u/\partial n = \varphi$ on
$\Gamma = \partial\Omega \subset R^3$, the solution $u$ is given by the double-layer
potential $u(x) = \int_\Gamma f(y)\,(\partial/\partial n_y)\, s(x, y)\, d\Gamma_y$ ($x \in \Omega$) with
$s(x, y) = 1/[4\pi|x - y|]$, provided that $f$ satisfies

$$\int_\Gamma f(y)\, \frac{\partial^2 s(x, y)}{\partial n_x\, \partial n_y}\, d\Gamma_y = \varphi(x) \quad \text{for } x \in \Gamma.$$

Since $\partial^2 s/\partial n_x \partial n_y$ has a nonintegrable singularity, the integral
must be understood in the sense of Hadamard. The variational
formulation uses the energy space $H = H^{1/2}(\Gamma)$.
Using piecewise linear elements on $\Gamma$, we arrive at the
setting (31) with the symmetric bilinear form

$$a(u, v) = \frac{1}{2} \int_\Gamma \int_\Gamma [u(x) - u(y)]\,[v(x) - v(y)]\, \frac{\partial^2 s(x, y)}{\partial n_x\, \partial n_y}\, d\Gamma_x\, d\Gamma_y$$

(cf. Section 8.3 of Hackbusch, 1995). Since $a(\cdot, \cdot)$ is elliptic
with respect to the energy space specified above, the
discrete problems $A_k f_k = \varphi_k$ can be solved in the same
way as FEM systems from Section 4. In particular, $p$
and $r$ should be the canonical transfer mappings from
Section 4.3. The only difference is that $A_k$ is fully populated,
whereas FEM matrices are sparse. This is the reason
why the fast panel clustering techniques should be
implemented.


9.3 Application to first kind integral equations 
with weakly singular kernel 


In the case of the single layer potential equation $Ku = f$
with $Ku = \int_\Gamma s(x, y)\, u(y)\, d\Gamma_y$ (an example of an integral
equation of the first kind with weakly singular kernel)
the standard smoothing procedure is not applicable. The
reason is that the integral operator $K$ has the negative
order $-1$ (while the hypersingular integral operator from
Section 9.2 has the positive order $+1$). As a consequence,
the low eigenvalues of $K$ are associated with the oscillatory
eigenfunctions, while the high eigenvalues belong to
the smooth eigenfunctions. Therefore, a Richardson-like
iteration reducing the high frequencies is not a smoothing
procedure.

As a remedy (cf. Bramble, Leyk and Pasciak, 1993),
the smoothing iteration must use a preconditioning of the
equation by an operator $D$ of order $>1$: $DKu = Df$ or
$KDv = f$ or $D_1 K D_2 w = D_1 f$.


1 INTRODUCTION


The main background of the so-called 'panel clustering
technique' is the efficient numerical treatment of integral
equations. Therefore, we first refer the reader to the
boundary element method (BEM) and the respective integral
equations (see Section 1.2). The discrete problem is
described by a fully populated $n \times n$ matrix. The naive
approach requires a storage of the size $n^2$ and the standard
matrix-vector multiplication needs $O(n^2)$ arithmetical
operations. In order to realize the advantages of BEM
compared with the finite element method (FEM), it is
essential to reduce the order $O(n^2)$ of the cost to almost
$O(n)$.
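The quadratic cost of the naive approach is easy to make concrete. The following sketch is a crude stand-in for a BEM matrix, under assumptions that are not part of the text: it evaluates the point kernel 1/(4 pi |x - y|) between given points instead of integrating over panels, and it simply sets the singular diagonal to zero. The function name is hypothetical.

```python
import numpy as np

def dense_kernel_matrix(points):
    """Fully populated matrix with entries 1/(4*pi*|x_i - x_j|), zero on the diagonal."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    K = np.zeros_like(dist)
    off_diagonal = dist > 0
    K[off_diagonal] = 1.0 / (4.0 * np.pi * dist[off_diagonal])
    return K

def random_sphere_points(n, seed=0):
    """n pseudo-random points on the unit sphere (illustrative geometry only)."""
    rng = np.random.default_rng(seed)
    p = rng.standard_normal((n, 3))
    return p / np.linalg.norm(p, axis=1, keepdims=True)
```

Storing `K` takes 8 n^2 bytes in double precision and `K @ u` takes about 2 n^2 floating-point operations, so doubling n quadruples both; this is the cost that the panel clustering technique is meant to avoid.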

The panel clustering technique described in Section 2
allows reduction in the storage and matrix-vector costs
from $O(n^2)$ to $O(n \log^d n)$. The reduction in the memory
is, in particular, important for 3D applications when
$O(n \log^d n)$ data can easily be stored, while $O(n^2)$ exceeds
the memory bounds. The reduction in the cost for the
matrix-vector multiplication is important as well, since
this is the essential operation in usual iterative methods
for solving the system of linear equations. The essential
ingredients of the panel clustering technique are (i) the
far-field expansion (Section 2.1) and (ii) the panel cluster tree
(Section 2.2). The chapter is concluded with some hints
concerning implementational details (Section 2.7).

Encyclopedia of Computational Mechanics, Edited by Erwin
Stein, René de Borst and Thomas J.R. Hughes. Volume 1: Fundamentals.
© 2004 John Wiley & Sons, Ltd. ISBN: 0-470-84699-2.

Section 3 presents a second variant of the panel clustering
technique. This variant can be generalized to the technique
of hierarchical matrices ($\mathcal{H}$-matrices), which is described
in Section 4. Again, the $\mathcal{H}$-matrix structure can be used to
represent fully populated matrices. This technique allows not
only the matrix-vector multiplication but also matrix operations
like matrix-plus-matrix, matrix-times-matrix, and even
matrix-inversion.


1.1 Notations 


We have already used the Landau symbol $O(f(n))$, which
means that the quantity is bounded by $C * f(n)$ as $n \to \infty$
for some positive constant $C$. For an index set $I$, the set
$R^I$ denotes the set of (real) vectors $a = (a_i)_{i \in I}$ indexed by
means of $I$. Similarly, the notation $R^{I \times J}$ is used for the set
of matrices $A = (a_{ij})_{i \in I, j \in J}$.


Item                                  Explanation                                        References
A, B, ...                             Matrices of size n x n                             (11)
b                                     Block, vertex of T_2                               Section 3.1.1
b_j                                   BEM basis function                                 (10)
d                                     Spatial dimension of R^d                           (1)
diam, dist                            Diameter and distance of clusters                  (18), (38)
I                                     Index set for the matrix entries                   Section 4.1
I_m                                   Index set in the representation of \tilde\kappa    (15)
J_\tau^\iota, \kappa_\iota^\tau       Far-field coefficients                             Section 2.4.3
K                                     Integral operator                                  (4)
n                                     Problem dimension, matrix size                     (10), Section 4.1
P                                     Set of panels (boundary elements)                  Section 1.2.3
s(x, y)                               Fundamental solution                               Section 1.2.1
t                                     Triangle (panel) t \in P                           Section 1.2.3
S, S_2, S_I, S_{I x I}(\tau)          Set of sons                                        Section 2.2, Section 3.1.1, Section 4.2
T, T_2, T_I, T_{I x I}                Cluster tree, tree of blocks, block cluster tree   Section 2.2, Section 3.1.1, Section 4.2
u                                     Coefficient vector from R^n                        (10)
V_h                                   Boundary element space                             Section 1.2.3
x, y, z                               Points in R^d                                      (2)
z_\tau                                Center of cluster \tau                             Section 2.7.2
\Gamma                                Surface contained in R^d                           Section 1.2.2
\eta                                  Parameter in admissibility condition               (18), (38)
\kappa(x, y)                          Kernel of integral operator                        (4)
\tilde\kappa(x, y), \tilde\kappa_b(x, y)   Far-field approximation of \kappa             (15), (40)
\xi_i                                 Collocation point                                  Section 1.2.3
\tau (also \tau', \sigma, \sigma')    Cluster, vertex of the tree T                      Section 2.2
\Phi_\iota, \Phi_\iota^\tau           Expansion functions                                (15)
\int_\Gamma ... d\Gamma_x             Surface integration                                (4)
#S                                    Cardinality of the set S, that is, number of elements


1.2 The boundary element method (BEM) 


1.2.1 The problem to be solved 


There are several important applications where an elliptic
boundary value problem with a vanishing source term is to
be solved,

$$Lu = 0 \quad \text{in } \Omega \subset R^d \tag{1}$$

Here $\Omega$ may be a bounded or an unbounded domain. Since
$L$ is assumed to have constant coefficients, the fundamental
solution $s(x, y)$ is known explicitly. It satisfies $L_x s(x, y) = \delta(x - y)$,
where $L_x = L$ is applied to the $x$-argument and $\delta$
is the Dirac function. In the case of $Lu = f \ne 0$, a further
integral over $\Omega$ appears, which can be treated efficiently
by means of the hierarchical matrices from Section 4.
Examples of $L$ and $s$ are the Laplace problem,

$$L = \Delta, \qquad s(x, y) = \begin{cases} -\dfrac{1}{2\pi} \log|x - y| & \text{for } d = 2 \text{ (i.e. } x, y \in R^2), \\[2mm] \dfrac{1}{4\pi|x - y|} & \text{for } d = 3 \text{ (i.e. } x, y \in R^3) \end{cases} \tag{2}$$

the Helmholtz problem $L = \Delta + a^2$, $s(x, y) = \exp(ia|x - y|)/[4\pi|x - y|]$,
and the Lamé equation ($d = 3$)

$$\mu \Delta u + (\lambda + \mu)\nabla\, \mathrm{div}\, u = 0,$$
$$S(x, y) = \frac{\lambda + 3\mu}{8\pi\mu(\lambda + 2\mu)} \left\{ \frac{1}{|x - y|}\, I + \frac{\lambda + \mu}{\lambda + 3\mu}\, \frac{(x - y)(x - y)^T}{|x - y|^3} \right\} \tag{3}$$

In the latter example, the fundamental solution $S(x, y)$
is matrix-valued. In all examples, $|x - y|$ is the standard
Euclidean norm of the vector $x - y \in R^d$.


1.2.2 Formulation by an integral equation 


The advantage of the following integral equation formulation
is the fact that the domain of integration is the boundary
$\Gamma = \partial\Omega$. Thus, the spatial dimension is reduced by one.
This advantage is even more essential if $\Omega$ is an unbounded
exterior domain.

There are several integral formulations based on integral
operators $K$ of the form

$$(Kf)(x) := \int_\Gamma \kappa(x, y)\, f(y)\, d\Gamma_y, \quad x \in \Gamma \tag{4}$$

where $\kappa(x, y)$ is $s(x, y)$ or some derivative with respect to
$x$ or $y$. We give two examples.

Single-layer potential for a Dirichlet problem

Let the integral operator be defined by $\kappa = s$ with $s$ from
(2), that is, $(Kf)(x) = (1/4\pi) \int_\Gamma (f(y)/|x - y|)\, d\Gamma_y$ in the
3D case. Then $\Phi(x) := (Kf)(x)$ is defined for all $x \in R^d$
and satisfies $\Delta\Phi = 0$ in $R^d \setminus \Gamma$. In order to enforce the
Dirichlet value,

$$\Phi = g \quad \text{on } \Gamma \tag{5}$$

the function $f$ has to satisfy the integral equation

$$Kf = g \text{ for all } x \in \Gamma, \quad \text{that is,} \quad \int_\Gamma \frac{f(y)}{|x - y|}\, d\Gamma_y = 4\pi g(x) \quad \text{for all } x \in \Gamma \tag{6}$$

Therefore, one has to solve (a discrete version of) $Kf = g$.
For the resulting solution $f$, the potential $\Phi = Kf$ fulfils
(1) as well as (5) and can be evaluated at any point of
interest.


Direct method 

In (6), one has to solve for the unknown function $f$, which
(indirectly) yields the solution of the Laplace problem after
evaluation of $\Phi = Kf$. A direct approach is

$$\frac{1}{2} u(x) = g(x) - \int_\Gamma \kappa(x, y)\, u(y)\, d\Gamma_y \quad \text{with } \kappa := \frac{\partial s}{\partial n_y}, \quad g(x) := \int_\Gamma s(x, y)\, \varphi(y)\, d\Gamma_y \tag{7}$$

which yields the Dirichlet boundary values $u(x)$, $x \in \Gamma$,
of the interior domain with Neumann data $\varphi$. $s(x, y)$ is
the fundamental solution from (2). $\kappa(x, y)$ is called the
double-layer kernel. The equation on the left in (7) holds
for almost all $x \in \Gamma$ but must be corrected by a factor
corresponding to the spherical angle of an edge or corner
of the surface (cf. Hackbusch, 1997). This is important for
the discretization by collocation but does not matter in the
case of the Galerkin discretization.


1.2.3 Discretization by BEM 


In the following, we assume the more interesting case of
$d = 3$, that is, $\Gamma$ is a two-dimensional surface.

Triangulation of the surface

To begin with, assume that the surface can be represented
by a union of planar triangles: $\Gamma = \bigcup_{t \in P} t$, where the
triangulation $P$ is the set of these (closed) triangles. Usually,
the triangulation is required to be conforming in the sense
that the intersection of two different triangles is allowed to
be either empty, a node, or an edge. Each $t \in P$ can be
produced by an affine map $\eta_t$ from the unit triangle $t_{unit}$
(vertices at (0, 0), (0, 1), (1, 0)) onto $t$, that is, $\eta_t(t_{unit}) = t$. In
the following, we shall assume this simple case (of course,
quadrilaterals instead of triangles are possible as well).
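The affine map from the unit triangle onto a planar panel is easy to write down explicitly. The following sketch assumes that a panel is given by its three vertices in R^3; the function name and the vertex ordering are hypothetical choices.

```python
import numpy as np

def affine_map(a, b, c):
    """Return eta with eta(0, 0) = a, eta(1, 0) = b, eta(0, 1) = c."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))

    def eta(xi1, xi2):
        # Points of the unit triangle satisfy xi1 >= 0, xi2 >= 0, xi1 + xi2 <= 1.
        return a + xi1 * (b - a) + xi2 * (c - a)

    return eta
```

For example, `eta(1/3, 1/3)` is the centroid of the panel, which is the usual choice of collocation point for piecewise constant elements.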

Alternatively, the true surface can be approximated by
curved triangles, that is, $\Gamma$ is replaced by $\bigcup_{t \in P} \eta_t(t_{unit})$,
where $\eta_t$ is a more involved map producing a curved
triangle.

In the BEM context, the triangles are often called panels.

Since the panels are assumed to be closed, two different
panels may overlap at their boundaries. We say that the two
subsets $s', s'' \subset \Gamma$ are weakly disjoint, if $\mathrm{area}(s' \cap s'') = 0$.
This covers the case of (completely) disjoint sets as well as
the case when the boundaries overlap (but not the interior
parts).


Boundary element space 

The simplest boundary element is the piecewise constant
one, that is, the boundary element space $V_h$ consists of
functions being piecewise constant on each triangle $t \in P$.
In the case of (continuous and) piecewise linear elements,
the functions from $V_h$ are (continuous and) piecewise affine
on each triangle $t \in P$. In the case of curved triangles,
the piecewise affine functions on $t_{unit}$ are mapped by $\eta_t$
onto $\eta_t(t_{unit})$. Furthermore, one can consider spaces $V_h$ of
continuous or discontinuous functions that coincide with
higher-order polynomials on $t \in P$.


Galerkin discretization 
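The Galerkin discretization of $\lambda u + Ku = \varphi$ with respect
to the boundary element space $V_h$ and $K$ from (4) reads


Find $u_h \in V_h$ such that

$$\lambda \int_\Gamma u_h(x)\, v(x)\, d\Gamma_x + \int_\Gamma \int_\Gamma \kappa(x, y)\, u_h(y)\, v(x)\, d\Gamma_y\, d\Gamma_x = \int_\Gamma \varphi(x)\, v(x)\, d\Gamma_x \quad \text{for all } v \in V_h \tag{8}$$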


Collocation discretization 

Since (8) involves a double integration, collocation is often
preferred, although the numerical statements about
collocation are weaker. For this purpose, one defines a
set $\Xi = \{\xi^i : i = 1, \ldots, n\}$ of collocation points $\xi^i$, where
$n = \dim V_h$. For instance, in the case of piecewise constant
elements, the points $\xi \in \Xi$ should be chosen as the centroids
of the triangles $t \in P$. Then, the collocation discretization of
$\lambda u + Ku = \varphi$ reads


Find $u_h \in V_h$ such that

$$\lambda u_h(\xi) + \int_\Gamma \kappa(\xi, y)\, u_h(y)\, d\Gamma_y = \varphi(\xi) \quad \text{for all } \xi \in \Xi \tag{9}$$


Matrix formulation 

Let $B = \{b_1, \ldots, b_n\}$ be a basis of $V_h$. For instance, for
piecewise constant elements, $b_i$ is 1 on the $i$th triangle and
0 on every other $t \in P$. In this case, we may use $t \in P$
as an index instead of $i = 1, \ldots, n$, that is, the basis is
$B = \{b_t : t \in P\}$.

In the case of discontinuous and piecewise linear elements,
we have three basis functions per triangle:
$B = \{b_{t,k} : t \in P,\ k = 1, 2, 3\}$, while for continuous and
piecewise linear elements, each basis function $b_i$ is associated
with a vertex $x_i$ of the triangulation.

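Each $u_h \in V_h$ is represented by

$$u_h = \sum_{j=1}^{n} u_j b_j \tag{10}$$

where $u = (u_j)_{j=1,\ldots,n}$ abbreviates the coefficient vector.
Then the solution of the collocation problem (9) is
characterized by

$$\lambda A u + B u = f \tag{11}$$

where the matrices $A$ and $B$ and the vector $f$ are given by

$$A = \big(b_j(\xi^i)\big)_{i,j=1,\ldots,n}, \qquad B = \Big(\int_\Gamma \kappa(\xi^i, y)\, b_j(y)\, d\Gamma_y\Big)_{i,j=1,\ldots,n}, \qquad f = \big(\varphi(\xi^i)\big)_{i=1,\ldots,n} \tag{12}$$

The Galerkin solution is given by (10) and (11) with

$$A = \Big(\int_\Gamma b_i(x)\, b_j(x)\, d\Gamma_x\Big)_{i,j=1,\ldots,n}, \qquad B = \Big(\int_\Gamma \int_\Gamma \kappa(x, y)\, b_i(x)\, b_j(y)\, d\Gamma_y\, d\Gamma_x\Big)_{i,j=1,\ldots,n}, \qquad f = \Big(\int_\Gamma \varphi(x)\, b_i(x)\, d\Gamma_x\Big)_{i=1,\ldots,n} \tag{13}$$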

In the case of (12), $A = I$ holds, provided that $b_i$ is
the Lagrange function. In any case, $A$ is a sparse matrix,
which causes no problems. In contrast, $B$ is usually a fully
populated matrix. The standard representation needs a storage
of $n^2$ for all entries. The panel clustering method will reduce
this size to $O(n \log^{d+1} n)$ (see Theorem 1), that is, the storage
will be almost linear in the dimension $n$. The same improvement
holds for the cost of the matrix-vector multiplication.

For further details about integral equations and bound- 
ary elements, see Hackbusch (1997) (see Chapter 12, this 
Volume). 


2 THE PANEL CLUSTERING METHOD 
(FIRST VERSION) 


The panel clustering method was introduced in the mid- 
eighties (cf. Hackbusch and Nowak, 1986). The multipole 
method, which started at the same time (cf. Greengard and 
Rokhlin, 1997), is similar, with the difference that it is 
designed more for point charges and requires an operator- 
dependent construction for the expansion functions. Quite 
another, but theoretically related, approach is the matrix 
compression, which can be applied in the case of a proper 
wavelet discretization (cf. Dahmen, Prössdorf and Schneider,
1993).

The first version, which we present now, corresponds
to the collocation equation (11) and more precisely to the
performance of the matrix-vector multiplication by $B$, that
is, $u \mapsto Bu$. We recall that the $i$th component of $Bu$ reads

$$(Bu)_i = \sum_{j=1}^{n} u_j \int_\Gamma \kappa(\xi^i, y)\, b_j(y)\, d\Gamma_y \tag{14}$$

(see (12)). For the fast evaluation of this term, we have
to introduce the far-field expansion (Section 2.1) and the
panel cluster tree (Section 2.2).


2.1 Far-field expansion 


Consider one of the collocation points $x = \xi^i$ and approximate
the kernel $\kappa(x, y)$ in a subset $\tau \subset \Gamma \setminus \{x\}$ by a finite
expansion of the form

$$\tilde\kappa(x, y) = \sum_{\iota \in I_m} \kappa_\iota^\tau(x)\, \Phi_\iota(y) \tag{15}$$

where $\Phi_\iota(y)$ for $\iota \in I_m$ are functions that are independent
of $x$. $I_m$ is an index set whose size is indicated by $m$ (see
(16) below). The upper index $\tau$ denotes a subset of $\Gamma$ in
which $\tilde\kappa(x, \cdot)$ should be a good approximation of $\kappa(x, \cdot)$.
In the simplest case, $\tilde\kappa(x, \cdot)$ is the Taylor expansion
around the center of $\tau$ up to degree $m - 1$. In this case, the
index set $I_m$ equals the set
$\{\iota \in \mathbb{N}_0^d : \iota_1 + \cdots + \iota_d \le m - 1\}$
of multi-indices, where $d$ is the spatial dimension. In
order to make the functions $\Phi_\iota(y)$ independent of $\tau$, we
may choose the monomials
$\Phi_\iota(y) = y^\iota = y_1^{\iota_1} \times \cdots \times y_d^{\iota_d}$
(expansion around zero). In the case discussed above, the
number of indices of $I_m$ is bounded by (16) with $C_I = 1$:

$$\# I_m \le C_I\, m^d \tag{16}$$

Further details about the approximation of $\kappa$ by (15) will
follow in Section 2.7.2.
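
The index set and the bound (16) can be checked on small cases. The following is a minimal sketch, not part of the original presentation; the function names `multi_indices` and `monomial` are chosen here for illustration only. It enumerates the multi-indices with $\iota_1 + \cdots + \iota_d \le m - 1$ and confirms that their number does not exceed $m^d$, which corresponds to $C_I = 1$.

```python
from itertools import product
from math import comb


def multi_indices(m, d):
    """All multi-indices iota in N_0^d with iota_1 + ... + iota_d <= m - 1."""
    return [iota for iota in product(range(m), repeat=d) if sum(iota) <= m - 1]


def monomial(y, iota):
    """Phi_iota(y) = y_1^iota_1 * ... * y_d^iota_d (expansion around zero)."""
    value = 1.0
    for y_k, i_k in zip(y, iota):
        value *= y_k ** i_k
    return value


for m in range(1, 7):
    for d in (1, 2, 3):
        count = len(multi_indices(m, d))
        # The count is the binomial coefficient C(m - 1 + d, d) ...
        assert count == comb(m - 1 + d, d)
        # ... and it never exceeds m^d, i.e. (16) holds with C_I = 1.
        assert count <= m ** d
```

The index set is a subset of $\{0, \ldots, m-1\}^d$, which is why the bound with constant 1 holds.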


2.2 Cluster tree 


Assembling several neighboring triangles (panels) $t \in P$,
we can form clusters. The union of neighboring clusters
yields even larger clusters. This process can be continued
until the complete surface $\Gamma$ is obtained as the largest
cluster. As in Figure 1, the clusters may have an irregular
geometric shape (they may even be unconnected). For later
purposes, it is favorable if clusters are rather compact, that
is, $\mathrm{area}(\tau)/\mathrm{diam}(\tau)^2$ should be large.

One may consider this process also from the opposite
point of view (domain decomposition). The surface $\Gamma$ is
divided into smaller parts (clusters) that are divided further
until only the panels remain as the trivial clusters. For a
construction based on this approach, see Section 2.7.1.

The process of clustering (or repeated domain decomposition)
is represented by the cluster tree $T$. In the following
definition, $P$ is the set of panels, and we denote the set
of unions of panels by $S = \{\bigcup_{t \in P'} t : P' \subset P\}$. All panels
$t \in P$ belong to $S$ but also $\Gamma = \bigcup_{t \in P} t \in S$.




Figure 1. Clustering of four triangles. 


Definition 1. (a) All vertices of $T$ belong to $S$. (b) $\Gamma \in T$ is
the root of $T$. (c) The leaves of $T$ are the panels from $P$. (d)
If $\tau \in T$ is no leaf, there is a set $S(\tau)$ with at least two sons,
which are weakly disjoint (cf. Section 1.2.3). Furthermore,
$S(\tau)$ satisfies

$$\tau = \bigcup_{\tau' \in S(\tau)} \tau' \tag{17}$$

Usually, $S(\tau)$ consists of exactly two sons so that $T$
becomes a binary tree.


Remark 1. The number of clusters $\tau \in T$ is at most
$\#T \le 2\,\#P - 1 = 2n - 1$. The upper bound $\#T = 2n - 1$
holds for a binary tree.


2.3 Admissible clusters and admissible coverings 


For the integration of $\kappa(x, y)\, u(y)$ over a cluster $\tau$ (cf. (14)),
we shall use the expansion $\tilde\kappa(x, y)$ from (15) instead of $\kappa$.
This requires the following condition on $x$ and $\tau$.

We call $\tau \in T$ an admissible cluster with respect to
some control point $x \in \mathbb{R}^d$ if

$$\mathrm{diam}(\tau) \le \eta\, \mathrm{dist}(x, \tau) \tag{18}$$

The parameter $\eta > 0$ will be chosen later ($\eta$ will turn out
to be constant, independent of the panel size $h$). From
inequality (18), we see that the larger the distance between
$x$ and the cluster, the larger the cluster may be.

A set of clusters $C = \{\tau_1, \ldots, \tau_k\} \subset T$ is called a covering
(of $\Gamma$) if the clusters are weakly disjoint and satisfy

$$\Gamma = \bigcup_{j=1}^{k} \tau_j \tag{19}$$

There are two trivial coverings; $C = \{\Gamma\}$ is the coarsest
one. The finest is $C = P$. In the first case, the cluster is as
large as possible (but the number of clusters is minimum);
in the second case, the clusters are as small as possible (but
their number is at a maximum).

In the following, we are looking for a covering that (from 
the computational point of view) should consist of a small 
number of clusters and that, on the other hand, should be 
admissible. This leads us to the following definition. 


Definition 2. We call $C = \{\tau_1, \ldots, \tau_k\} \subset T$ an admissible
covering (of $\Gamma$) with respect to $x$ if it is a covering satisfying
(19) and

$$\text{either } \tau_j \in P \text{ or } \tau_j \text{ is admissible with respect to } x \tag{20}$$


Remark 2. (a) If $x \in \Gamma$, there is no covering (19) of $\Gamma$
consisting of only admissible clusters. (b) Condition (20)
states that inadmissible clusters are panels.


The number of clusters in C should be as small as 
possible. The optimum is discussed in 


Proposition 1. For each $x \in \mathbb{R}^d$, there is a unique admissible
covering $C(x)$ with respect to $x$ with minimum number
$n_C(x) := \#C(x)$ of clusters. $C(x)$ is called the minimum
admissible covering with respect to $x$.


The minimum admissible covering C(x) with respect to 
x can be easily computed by 


C := ∅; Divide(Γ, C); comment the result is C = C(x)
(21a)


where Divide is the recursive procedure


procedure Divide(τ, C); comment τ ∈ T is a cluster,
                                C is a subset of T;
begin if τ is admissible with respect to x then
         C := C ∪ {τ}
      else if τ ∈ P then C := C ∪ {τ}
      else for all τ' ∈ S(τ) do Divide(τ', C)
end;
(21b)
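
The recursion (21a, b) can be tried out on a toy geometry. The sketch below is an illustration under simplifying assumptions that are not part of the text: the "surface" is the interval $[0, 1]$, split into $n = 2^p$ equal panels, the cluster tree is obtained by repeated halving, and a cluster is stored as a half-open range `(i0, i1)` of panel numbers. The names `divide` and `minimum_admissible_covering` are hypothetical.

```python
def divide(tau, x, eta, h, cover):
    """Recursive procedure (21b) for a cluster tau = (i0, i1) of panel numbers.

    The cluster occupies the interval [i0 * h, i1 * h] (toy 1D geometry).
    """
    i0, i1 = tau
    lo, hi = i0 * h, i1 * h
    diam = hi - lo
    dist = max(0.0, lo - x, x - hi)      # distance of the point x to the interval
    if diam <= eta * dist:               # admissibility condition (18)
        cover.append(tau)
    elif i1 - i0 == 1:                   # tau is a panel (leaf of the tree)
        cover.append(tau)
    else:                                # recurse into the two sons
        mid = (i0 + i1) // 2
        divide((i0, mid), x, eta, h, cover)
        divide((mid, i1), x, eta, h, cover)


def minimum_admissible_covering(x, p, eta):
    """Call (21a): start from the root cluster, here the whole interval [0, 1]."""
    n = 2 ** p
    cover = []
    divide((0, n), x, eta, 1.0 / n, cover)
    return cover


# For x = 0 and eta = 1, the clusters double in size with growing distance.
print(minimum_admissible_covering(0.0, 3, 1.0))
```

In this toy setting the covering for $x = 0$ contains $p + 1$ clusters, which illustrates how a logarithmic number of clusters can replace the $n$ panels.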


2.4 Algorithm for the matrix-vector 
multiplication 


2.4.1 Partition into near and far field 


Instead of computing the matrix entries, we compute the
far-field coefficients in Phase I. In Phase II, we evaluate
an approximation of $K u_h$ at $x \in \mathbb{R}^d$ (e.g. at $x = \xi^i \in \Xi$).
Repeating (14), we recall that the desired result of the
matrix-vector multiplication $v = Bu$ is

$$v_i = \sum_{j=1}^{n} u_j \int_\Gamma \kappa(\xi^i, y)\, b_j(y)\, d\Gamma_y \tag{22}$$

Let $C(\xi^i)$ be the minimum admissible covering determined
in (21b). We split $C(\xi^i)$ into a near-field and a
far-field part defined by

$$C_{\mathrm{near}}(\xi^i) := \{\tau \in C(\xi^i) : \tau \text{ is not admissible}\}, \qquad C_{\mathrm{far}}(\xi^i) := \{\tau \in C(\xi^i) : \tau \text{ is admissible}\}$$

All $\tau \in C_{\mathrm{near}}(\xi^i)$ are panels (cf. Remark 2b). The integral
in (22) can be written as

$$\int_\Gamma \ldots = \sum_{\tau \in C_{\mathrm{near}}(\xi^i)} \int_\tau \ldots + \sum_{\tau \in C_{\mathrm{far}}(\xi^i)} \int_\tau \ldots$$

This induces an analogous splitting of $v_i$ from (22) into

$$v_i = v_i^{\mathrm{near}} + v_i^{\mathrm{far}}$$

The part $v_i^{\mathrm{far}}$ will be approximated by $\tilde v_i^{\mathrm{far}}$ in Section 2.4.3.
Note that the splitting depends on $i$. Another $i'$ yields
another collocation point $\xi^{i'}$ and another splitting into
$C_{\mathrm{near}}(\xi^{i'})$ and $C_{\mathrm{far}}(\xi^{i'})$.


2.4.2 Near-field part 


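The near-field part of $v_i$ is computed exactly (or with
sufficiently accurate quadrature):

$$v_i^{\mathrm{near}} = \sum_{\tau \in C_{\mathrm{near}}(\xi^i)} \sum_{j} u_j \int_\tau \kappa(\xi^i, y)\, b_j(y)\, d\Gamma_y \tag{23}$$

Since the support of the basis functions $b_j$ is small, there
are only a constant number of indices $j$ such that a panel
$\tau \in C_{\mathrm{near}}(\xi^i)$ intersects with $\mathrm{supp}(b_j)$. Hence, the sum $\sum_j$
has only $O(1)$ terms. The number of panels in $C_{\mathrm{near}}(\xi^i)$
turns out to be bounded by $O(\log n)$.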


2.4.3 Far-field part 


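Replacing the exact kernel $\kappa$ in the definition of $v_i^{\mathrm{far}}$ by
$\tilde\kappa(x, y)$ from (15), we obtain

$$\tilde v_i^{\mathrm{far}} = \sum_{\tau \in C_{\mathrm{far}}(\xi^i)} \sum_{j} u_j \int_\tau \tilde\kappa(\xi^i, y)\, b_j(y)\, d\Gamma_y = \sum_{\tau \in C_{\mathrm{far}}(\xi^i)} \sum_{j} u_j \int_\tau \sum_{\iota \in I_m} \kappa_\iota^\tau(\xi^i)\, \Phi_\iota(y)\, b_j(y)\, d\Gamma_y$$

Summation and integration can be interchanged:

$$\tilde v_i^{\mathrm{far}} = \sum_{\tau \in C_{\mathrm{far}}(\xi^i)} \sum_{\iota \in I_m} \kappa_\iota^\tau(\xi^i) \sum_{j} u_j \int_\tau \Phi_\iota(y)\, b_j(y)\, d\Gamma_y \tag{24}$$

The following integrals are called far-field coefficients:

$$J_\tau^\iota(b_j) := \int_\tau \Phi_\iota(y)\, b_j(y)\, d\Gamma_y \qquad (\iota \in I_m,\ \tau \in T,\ 1 \le j \le n) \tag{25}$$

Of particular interest are the far-field coefficients corresponding
to panels. These are the only ones to be evaluated
in the first phase:

$$J_t^\iota(b_j) = \int_t \Phi_\iota(y)\, b_j(y)\, d\Gamma_y \qquad (\iota \in I_m,\ t \in P,\ 1 \le j \le n) \tag{26}$$

Remark 3. (a) The coefficients $J_t^\iota(b_j)$ are independent
of the special vector $u$ in the matrix-vector multiplication
$v = Bu$. (b) There are only a fixed number of panels $t$
intersecting with the support of $b_j$; otherwise, $J_t^\iota(b_j) = 0$.
The number of nonzero coefficients $J_t^\iota(b_j)$ is $O(n\, \# I_m)$.


As soon as the far-field coefficients (26) are computed,
the quantities

$$J_t^\iota := \sum_{j} u_j\, J_t^\iota(b_j) \qquad (\iota \in I_m,\ t \in P) \tag{27}$$

can be summed up by $O(n\, \# I_m)$ additions. For $\tau \in T \setminus P$,
we exploit the tree structure:

$$J_\tau^\iota = \sum_{\tau' \in S(\tau)} J_{\tau'}^\iota \qquad \text{for } \tau \in T \setminus P \tag{28}$$

The coefficients $J_\tau^\iota$ represent the sum

$$J_\tau^\iota = \sum_{j} u_j\, J_\tau^\iota(b_j) = \int_\tau \Phi_\iota(y)\, u_h(y)\, d\Gamma_y$$

Hence, the quantities $\tilde v_i^{\mathrm{far}}$ can be computed from the simple
sum

$$\tilde v_i^{\mathrm{far}} = \sum_{\tau \in C_{\mathrm{far}}(\xi^i)} \sum_{\iota \in I_m} \kappa_\iota^\tau(\xi^i)\, J_\tau^\iota \tag{29}$$

(see (24)). Since the number of clusters $\tau \in C_{\mathrm{far}}(\xi^i)$ is
expected to be much smaller than the number $n$ of all
panels, representation (29) should be advantageous.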
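
The two summation steps (27) and (28) can be illustrated in a toy setting. The following sketch makes assumptions that are not taken from the text: the surface is replaced by the interval $[0, 1]$ with $n$ equal panels (with $n$ a power of two), the basis is piecewise constant, the expansion functions are the monomials $\Phi_\iota(y) = y^\iota$ for $\iota = 0, \ldots, m-1$, and clusters are half-open ranges of panel numbers. The names `panel_moments` and `upward_pass` are hypothetical.

```python
def panel_moments(u, m):
    """Quantities (27) on the panels for piecewise constant basis functions.

    For panel t = [a, b] the only nonzero coefficient is the one of b_t, so
    J_t^iota = u_t * integral_a^b y^iota dy.
    """
    n = len(u)
    h = 1.0 / n
    J = {}
    for t in range(n):
        a, b = t * h, (t + 1) * h
        J[(t, t + 1)] = [
            u[t] * (b ** (i + 1) - a ** (i + 1)) / (i + 1) for i in range(m)
        ]
    return J


def upward_pass(J, tau):
    """Formula (28): the coefficients of a cluster are the sums over its sons."""
    if tau in J:                       # panel, or cluster already computed
        return J[tau]
    i0, i1 = tau
    mid = (i0 + i1) // 2               # binary tree obtained by halving
    left = upward_pass(J, (i0, mid))
    right = upward_pass(J, (mid, i1))
    J[tau] = [l + r for l, r in zip(left, right)]
    return J[tau]


J = panel_moments([1.0] * 8, m=4)
root = upward_pass(J, (0, 8))
# With u = 1 the root coefficients are the moments of [0, 1], that is 1/(iota + 1).
print(root)
```

After the upward pass, the dictionary holds one coefficient vector per cluster, that is, $2n - 1$ vectors for the binary tree, in line with Remark 1.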


2.5 The additional quadrature error 


The replacement of $v_i^{\mathrm{far}}$ by $\tilde v_i^{\mathrm{far}}$ can be regarded as an
additional quadrature error. The error of the expansion (15)
depends on the order $m$ and the cluster containing $y$. The
exact requirements on $\kappa$ are as follows.


Assumption 1. Let $\eta_0 \in (0, 1)$ and a ball $B \subset \mathbb{R}^d$ be
given. There are constants $C_1$ and $C_2$ such that for all
$0 < \eta < \eta_0 < 1$ and $m \in \mathbb{N}$, there are expansions $\tilde\kappa$ of the
form (15) satisfying

$$|\kappa(x, y) - \tilde\kappa(x, y)| \le C_1 (C_2 \eta)^m\, |\kappa(x, y)| \quad \text{for all } y \in B \text{ and } \mathrm{diam}(B) \le \eta\, \mathrm{dist}(x, B) \tag{30}$$


Inequality (30) provides an estimate of the relative
error of $\tilde\kappa$. The proof of (30) in Hackbusch and Nowak
(1989) uses a Taylor expansion $\tilde\kappa$ with respect to $y$ for
standard examples and determines the values of $C_1$, $C_2$,
and $\eta_0$. The estimation in (30) by $|\kappa(x, y)|$ on the right-hand
side makes sense, for instance, for positive kernels like
$\kappa(x, y) = 1/(4\pi |x - y|)$. In Section 2.7.3, it will become
obvious that (30) can also be obtained for the double-layer
kernel (see (7)), although its sign changes.

By (30), the error is of order $O(\eta^m)$. In order to make the
error equal to the consistency error $O(h^\varkappa)$ ($h$: panel size),
we choose $\eta$ in (18) by

$$\eta = \eta(n) := O\big(n^{-\varkappa/(m(d-1))}\big) < \eta_0, \qquad \varkappa < m \tag{31}$$

Then $O(\eta^m)$ equals $O(h^\varkappa)$. Since, in (33), $m$ will be chosen
as $O(\log n)$, the quantity $\eta$ becomes independent of $n$.


2.6 Complexity of the algorithm 


2.6.1 Choice of parameters 


The complexity of the algorithm depends mainly on the
number $n_C(x)$ of clusters in the minimum admissible covering
$C(x)$. Under natural conditions described in Hackbusch
and Nowak (1989), where details of the proofs can also be
found, there is a constant $C_C$ with


1 d-} 
noO Snem) = Ceh Z) 8e +n P 
for all x € RÊ (32) 


with $\eta$ from (18). The logarithmic factor can even be
omitted if $x \notin \Gamma$.

Inserting (31) into (32) and using $\#P = O(n)$, we obtain
the following estimate for the number $n_C(x)$ of clusters in
$C(x)$:


ne) < nen, n) = Con”! log(2 + Cyn'*/™) 
for all x e R? 




The optimal choice of the expansion order $m$ turns out to
be

$$m := \lfloor \log n \rfloor \qquad (\lfloor x \rfloor := \text{largest integer } i \text{ with } i \le x) \tag{33}$$

Then $n^{\varkappa/m}$ (and $\eta$) is a constant, and we obtain the estimate

$$n_C(x) \le C \log n \tag{34}$$

Therefore, we have to deal with only $O(\log n)$ clusters
instead of $n$ panels.


2.6.2 Operation count 


While Phase I has to be performed only once for initialization,
Phase II has to be repeated for every matrix-vector
multiplication.


Phase I (a) Algorithm (21a, b) (computing the minimum
admissible covering $C(x)$) requires $O(n_C(x))$ operations per
point $x$. Because of (34), and since there are $n$ different
collocation points $x = \xi^i$, the total amount of work in this
part is $O(n \log n)$.

(b) The computation of $v_i^{\mathrm{near}}$ in (23) needs $O(\log n)$
evaluations of integrals of the form $\int_t \kappa(\xi^i, y)\, b_j(y)\, d\Gamma_y$ per
index $i$, that is, in total $O(n \log n)$ evaluations.

(c) The far-field coefficients $J_t^\iota(b_j)$ ($t \in P$, $\iota \in I_m$) can
be computed by $O(n \log^d n)$ operations (cf. (16), (33))
and require $O(n \log^d n)$ evaluations (or approximations) of
integrals of the form $\int_t \Phi_\iota(y)\, b_j(y)\, d\Gamma_y$.

(d) The number of coefficients $\kappa_\iota^\tau(\xi^i)$ to be evaluated for
$\tau \in C_{\mathrm{far}}(\xi^i)$, $\iota \in I_m$, $1 \le i \le n$ equals $O(n \log^{d+1} n)$.


Phase II (a) The far-field coefficients $J_\tau^\iota$ for the nontrivial
clusters $\tau \in T \setminus P$ and all indices $\iota \in I_m$ can be summed up
in $O(n\, \# I_m) = O(n \log^d n)$ additions (see (28)).

(b) The final summation in (29) requires only
$O(n \log^{d+1} n)$ additions.


Theorem 1. (a) The data computed in Phase I and
the quantities from Phase II require a storage of size
$O(n \log^{d+1} n)$. (b) Each matrix-vector multiplication
$u \mapsto Bu$ can be approximated up to an error of size $O(h^\varkappa)$
by $O(n \log^{d+1} n)$ operations.


Concerning the storage in Phase I, we remark
that only the coverings $C(\xi^i)$, the nonzero integrals
$\int_t \kappa(\xi^i, y)\, b_j(y)\, d\Gamma_y$ from (23), the expansion coefficients
$\kappa_\iota^\tau(\xi^i)$, and the far-field coefficients $J_t^\iota(b_j)$ are to be stored.

The costs for Phase I can be further reduced if several
panels are geometrically similar.


2.7 Some implementational details 


2.7.1 Construction of the cluster tree 


In the following, we describe the construction of the cluster
tree $T$ by means of bounding boxes. This method is, in
particular, suited for elements in a domain $\Omega \subset \mathbb{R}^d$. The
application to surfaces will be discussed at the end of this
section.

Associate every element $t$ with its centroid denoted by
$z_t$ with the coordinates $z_{t,k}$ ($k = 1, \ldots, d$). Let $Z$ be the
set of these points. The smallest bounding box $Q \subset \mathbb{R}^d$
containing all $z_t$ is given by

$$Q = [a_1, b_1] \times \cdots \times [a_d, b_d], \quad \text{where } a_k := \min\{z_k : z \in Z\},\ b_k := \max\{z_k : z \in Z\} \tag{35}$$

Choose $k \in \{1, \ldots, d\}$ such that the side length $b_k - a_k$ is
maximum and divide $Q$ into

$$Q^{I} = [a_1, b_1] \times \cdots \times \big[a_k,\ a_k + \tfrac{b_k - a_k}{2}\big) \times \cdots \times [a_d, b_d]$$

$$Q^{II} = [a_1, b_1] \times \cdots \times \big[a_k + \tfrac{b_k - a_k}{2},\ b_k\big] \times \cdots \times [a_d, b_d]$$

This gives rise to a partition of $Z$ into $Z' := Z \cap Q^{I}$ and
$Z'' := Z \cap Q^{II}$. The procedure can be repeated recursively:
Determine the bounding box $Q'$ of $Z'$ (it may be smaller
than $Q^{I}$) and split $Q'$ into $Q'^{I}$ and $Q'^{II}$ and accordingly
$Z'$. The recursion stops when the resulting subset of $Z$
contains only one point. Obviously, the construction produces
a binary tree $T_Z$ starting with the root $Z$. Any vertex of
$T_Z$ is a subset $Z^*$ of $Z$. Since each $z \in Z^*$ corresponds to
exactly one panel $t = t_z \in P$, the union $\bigcup_{z \in Z^*} t_z$ describes
a cluster. In this way, the tree $T_Z$ can be transferred into
the desired cluster tree.
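
The bisection procedure can be sketched in a few lines. The code below is an illustration only; the representation of a tree vertex as a pair `(indices, sons)` and the names `build_cluster_tree`, `count_vertices`, and `leaves` are choices made here, and the centroids are assumed to be pairwise distinct. A guard turns a vertex into a leaf if a split would leave one side empty, which can only happen for coinciding (or numerically indistinguishable) points.

```python
def build_cluster_tree(points, indices=None):
    """Bounding-box bisection; returns a vertex (indices, sons)."""
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) == 1:
        return (indices, [])
    d = len(points[0])
    # Smallest bounding box (35) of the points in this vertex.
    a = [min(points[i][k] for i in indices) for k in range(d)]
    b = [max(points[i][k] for i in indices) for k in range(d)]
    # Split in the direction of maximal side length.
    k = max(range(d), key=lambda j: b[j] - a[j])
    mid = a[k] + 0.5 * (b[k] - a[k])
    left = [i for i in indices if points[i][k] < mid]     # half-open part Q^I
    right = [i for i in indices if points[i][k] >= mid]   # closed part Q^II
    if not left or not right:          # guard for coinciding points
        return (indices, [])
    return (indices, [build_cluster_tree(points, left),
                      build_cluster_tree(points, right)])


def count_vertices(tree):
    return 1 + sum(count_vertices(son) for son in tree[1])


def leaves(tree):
    if not tree[1]:
        return [tree[0]]
    return [leaf for son in tree[1] for leaf in leaves(son)]


centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
tree = build_cluster_tree(centroids)
print(count_vertices(tree), leaves(tree))
```

Each index list stands for the union of the corresponding panels, so the returned structure plays the role of the tree $T_Z$ described above.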

Figure 2 shows the bisection process in the two- 
dimensional case. 

For BEM, there is a modification that is of interest. As 
soon as the corresponding cluster is close to some (tangent) 
hyperplane, the coordinates of the bounding box can be 
rotated so that d — 1 coordinates are in the hyperplane, 
while the dth coordinate is in the normal direction. 


2.7.2 Far-field expansion by polynomial 
interpolation 


In (15), $\tilde\kappa(x, y) = \sum_{\iota \in I_m} \kappa_\iota^\tau(x)\, \Phi_\iota(y)$ describes the
approximation of $\kappa(x, y)$ in the cluster $\tau$ for a fixed collocation point
$x$. Let $d_m = \# I_m$ denote the dimension of polynomials of
total degree $m - 1$. Choose $d_m$ interpolation points $\zeta_i^\tau$ and
let $\tilde\kappa(x, y)$ be the polynomial interpolating $\kappa(x, y)$. It has the
representation $\sum_{i=1}^{d_m} \kappa(x, \zeta_i^\tau)\, L_i^\tau(y)$, where $L_i^\tau(y)$ denotes the
Lagrange polynomials with the property $L_i^\tau(\zeta_j^\tau) = \delta_{ij}$
(Kronecker symbol). Expanding $L_i^\tau$ into monomials, one obtains
the desired representation (15).


Figure 2. The bounding box to the left containing the points $z_t$ is divided into two parts in $z_1$-direction. In the second step, the new
bounding boxes are divided in $z_2$-direction.

Concerning the choice of interpolation points, one is not
restricted to the cluster $\tau \subset \Gamma \subset \mathbb{R}^d$. Instead, one can first
define suitable quadrature points $\zeta_i$ in the unit cube
$C = [-1/2, 1/2]^d$. Given a cluster $\tau$ with center $z_\tau$, consider
the cube $C_\tau := z_\tau + \mathrm{diam}(\tau)\, C$ and use the corresponding
quadrature points $\zeta_i^\tau = z_\tau + \mathrm{diam}(\tau)\, \zeta_i$. The interpolation
polynomial converges to the Taylor polynomial if all
interpolation points $\zeta_i^\tau$ tend to the center $z_\tau$.

The latter approach requires that $\kappa(x, \cdot)$ is defined on $C_\tau$.
This holds, for example, for kernels from Section 1.2.1 but
not for kernels involving normal derivatives such as the
double-layer kernel, since the normal derivative is defined on $\Gamma$
only. The remedy is given in the next subsection.
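
The interpolation approach can be tried in one dimension. The sketch below rests on assumptions that are not part of the text: the cluster is the interval $[0, 1]$ (center 0.5, diameter 1), the kernel is $1/|x - y|$, and the reference points in $[-1/2, 1/2]$ are Chebyshev points, which is one possible choice among many. The function names are hypothetical. The kernel is interpolated with respect to $y$ for a fixed, well-separated point $x$.

```python
import math


def reference_points(m):
    """m Chebyshev points in the reference interval [-1/2, 1/2]."""
    return [0.5 * math.cos((2 * i + 1) * math.pi / (2 * m)) for i in range(m)]


def lagrange(i, nodes, y):
    """Lagrange polynomial L_i for the given nodes, evaluated at y."""
    value = 1.0
    for j, z in enumerate(nodes):
        if j != i:
            value *= (y - z) / (nodes[i] - z)
    return value


def kernel(x, y):
    return 1.0 / abs(x - y)


def interpolated_kernel(x, y, center, diam, m):
    """Separable approximation: sum_i kernel(x, zeta_i) * L_i(y)."""
    nodes = [center + diam * z for z in reference_points(m)]   # zeta_i^tau
    return sum(kernel(x, nodes[i]) * lagrange(i, nodes, y) for i in range(m))


x = 3.0                      # control point with dist(x, tau) = 2, diam(tau) = 1
for y in (0.0, 0.25, 0.5, 0.9, 1.0):
    exact = kernel(x, y)
    approx = interpolated_kernel(x, y, center=0.5, diam=1.0, m=8)
    print(y, abs(approx - exact) / exact)
```

Each term of the sum is a product of a function of $x$ and a function of $y$, which is the separable structure required in (15); the relative error decreases quickly as the number of points grows, as long as $x$ stays well separated from the cluster.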


2.7.3 Far-field expansion for kernels with normal 
derivatives 


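The double-layer kernel for the Laplace problem
is $(\partial/\partial n_y)\big(1/(4\pi |x - y|)\big) = (1/4\pi)\, \langle x - y, n(y) \rangle / |x - y|^3$
(cf. (7)). One possibility is to approximate $1/|x - y|^3$
by an expression of the form $\sum_\iota \kappa_\iota^\tau(x)\, \Phi_\iota(y)$. Then
$(1/4\pi)\, \langle x - y, n(y) \rangle / |x - y|^3$ is approximated by

$$\sum_{k=1}^{3} \sum_{\iota} \frac{x_k\, \kappa_\iota^\tau(x)}{4\pi}\, \big(n_k(y)\, \Phi_\iota(y)\big) - \sum_{\iota} \frac{\kappa_\iota^\tau(x)}{4\pi}\, \big(\langle y, n(y) \rangle\, \Phi_\iota(y)\big)$$

and the latter expression is again of the form (15). Note
that nonsmooth surfaces yielding nonsmooth normal directions
$n(y)$ cause no difficulty. Furthermore, the relative
error estimate (30) can be shown (the error becomes zero
if $(1/4\pi)\, \langle x - y, n(y) \rangle / |x - y|^3 = 0$ due to $x - y \perp n(y)$).
The disadvantage of the described approach is the fact
that the number of terms is multiplied by the factor 4.
This can be avoided by approximating $1/(4\pi |x - y|)$ by
$\sum_\iota \kappa_\iota(x)\, \Phi_\iota^*(y)$ and forming its normal derivative
$\sum_\iota \kappa_\iota(x)\, (\partial/\partial n_y) \Phi_\iota^*(y)$, which gives (15) with
$\Phi_\iota(y) := (\partial/\partial n_y) \Phi_\iota^*(y)$,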


be differentiated. 


2.8 Modification: approximations with basis 
transforms 


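In $\tilde\kappa(x, y) = \sum_{\iota \in I_m} \kappa_\iota^\tau(x)\, \Phi_\iota(y)$ (cf. (15)), we required
$\Phi_\iota(y)$ to be independent of $\tau$. This fact was used in
(28): the quadrature results of $J_{\tau'}^\iota(b_j) = \int_{\tau'} \Phi_\iota(y)\, b_j(y)\, d\Gamma_y$
for the sons $\tau' \in S(\tau)$ could be used to get
$J_\tau^\iota(b_j) = \int_\tau \Phi_\iota(y)\, b_j(y)\, d\Gamma_y$ as their sum.

However, a global basis $\{\Phi_\iota(y) : \iota \in I_m\}$ has numerical
disadvantages. Considering, for example, polynomials, one
likes to have locally defined bases $\{\Phi_\iota^\tau(y) : \iota \in I_m\}$ for each
cluster $\tau \in T$. Since these different bases span the same
spaces, there are transformations of the form

$$\Phi_\iota^\tau(y) = \sum_{\lambda \in I_m} \omega_{\iota,\lambda}^{\tau,\tau'}\, \Phi_\lambda^{\tau'}(y) \qquad \text{for } \tau \in T \text{ and } \tau' \in S(\tau) \tag{36}$$

We redefine

$$J_\tau^\iota(b_j) := \int_\tau \Phi_\iota^\tau(y)\, b_j(y)\, d\Gamma_y \tag{37}$$

using the $\tau$-dependent basis $\{\Phi_\iota^\tau(y) : \iota \in I_m\}$. The
computation starts at the leaves (panels): $J_t^\iota(b_j)$ is
computed for all $t \in P$. Owing to (36), we have
$J_\tau^\iota(b_j) = \sum_{\tau' \in S(\tau)} \sum_{\lambda \in I_m} \omega_{\iota,\lambda}^{\tau,\tau'}\, J_{\tau'}^\lambda(b_j)$ instead of (25). We store only
$J_t^\iota(b_j)$ for $t \in P$ and compute for a given vector $u$ the quantities
$J_t^\iota = \sum_j u_j\, J_t^\iota(b_j)$ as in (27). However, the formula
for $J_\tau^\iota = \sum_j u_j\, J_\tau^\iota(b_j)$ has now to use (37) and reads

$$J_\tau^\iota = \sum_{\tau' \in S(\tau)} \sum_{\lambda \in I_m} \omega_{\iota,\lambda}^{\tau,\tau'}\, J_{\tau'}^\lambda$$

instead of (28). These $J_\tau^\iota$ can now be used in (29) to obtain
$\tilde v_i^{\mathrm{far}}$.

Concerning the coefficients $\omega_{\iota,\lambda}^{\tau,\tau'}$, we return to the basis
of Lagrange functions $L_\iota^\tau$ introduced in Section 2.7.2. In
that case, $\omega_{\iota,\lambda}^{\tau,\tau'} = L_\iota^\tau(\zeta_\lambda^{\tau'})$ involves nothing more than the
evaluation of the $\tau$-basis functions at the interpolation
points associated to $\tau'$.

Another obvious basis are the monomials $(y - z_\tau)^\iota$
centered around the midpoint $z_\tau$ of the cluster $\tau$. In this case,
(36) describes the re-expansion of polynomials centered at
$z_\tau$ around the new center $z_{\tau'}$.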
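
For the monomial basis, the transformation (36) can be written down with the binomial theorem. The following one-dimensional sketch is an illustration; the names `transfer_matrix`, `centered_moments`, and `parent_moments` are hypothetical, and the function $u_h$ is taken to be identically 1 so that the far-field quantities reduce to plain moments. It uses $(y - z_\tau)^\iota = \sum_{\lambda \le \iota} \binom{\iota}{\lambda} (z_{\tau'} - z_\tau)^{\iota - \lambda} (y - z_{\tau'})^\lambda$ and checks that the moments of a parent interval follow from the moments of its two halves.

```python
from math import comb


def transfer_matrix(z_tau, z_son, m):
    """Coefficients omega[iota][lam] of the re-expansion around the son center."""
    shift = z_son - z_tau
    return [
        [comb(i, l) * shift ** (i - l) if l <= i else 0.0 for l in range(m)]
        for i in range(m)
    ]


def centered_moments(a, b, m):
    """Integrals of (y - z)^iota over [a, b], with z the midpoint of [a, b]."""
    z = 0.5 * (a + b)
    return [((b - z) ** (i + 1) - (a - z) ** (i + 1)) / (i + 1) for i in range(m)]


def parent_moments(z_tau, sons, m):
    """Sum over sons and indices as in the modified upward formula.

    sons is a list of pairs (center of the son, moments of the son).
    """
    J = [0.0] * m
    for z_son, J_son in sons:
        omega = transfer_matrix(z_tau, z_son, m)
        for i in range(m):
            J[i] += sum(omega[i][l] * J_son[l] for l in range(m))
    return J


m = 5
sons = [(0.25, centered_moments(0.0, 0.5, m)), (0.75, centered_moments(0.5, 1.0, m))]
print(parent_moments(0.5, sons, m))
print(centered_moments(0.0, 1.0, m))   # direct computation for comparison
```

The transfer matrix is lower triangular here, because a monomial of degree $\iota$ only involves shifted monomials of degree at most $\iota$.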


3 THE PANEL CLUSTERING METHOD 
(SECOND VERSION) 


The previous version of the panel clustering method is
completely row-oriented. For each row index $i$, we compute
the component $v_i$ of $v = Bu$ by means of a covering $C(\xi^i)$
that may change with $i$. As a consequence, the kernel
$\kappa(x, y) = \kappa(\xi^i, y)$ is a function of $y$ only and (15) describes
an expansion with respect to $y$.

In the following, we try to determine a version in which
the $x$- and $y$-directions are treated equally. This is, in
particular, more appropriate for the Galerkin discretization (8).

The tree $T$ from Section 2.2 was introduced to describe
decompositions of $\Gamma$. Now we consider the product
$\Gamma \times \Gamma$ and determine a corresponding (second) tree $T_2$ in
Section 3.1.1. The vertices of $T_2$ are products
$\tau \times \sigma \subset \Gamma \times \Gamma$ of clusters $\tau, \sigma \in T$. The kernel $\kappa(x, y)$ will be
approximated by a special separable expansion (40) for
$(x, y) \in \tau \times \sigma$.


3.1 The tree T_2 of products of clusters


3.1.1 Definition 


Let the cluster tree $T$ be defined as in Section 2.2. The
second tree $T_2$ is constructed from $T$ as follows. We use
the symbol $b$ for the vertices of $T_2$. While $S(\tau)$ denotes the
set of sons of $\tau \in T$, the sons of $b \in T_2$ form the set $S_2(b)$.


Definition 3. (a) $T_2$ is a subset of $T \times T$, that is, each
vertex of $T_2$ is a product $\tau \times \sigma$ of two clusters $\tau, \sigma \in T$. (b)
$\Gamma \times \Gamma \in T_2$ is the root of the tree. (c) It remains to construct
the set $S_2(b)$ of sons of any $b = \tau \times \sigma \in T_2$:

$$S_2(b) := \begin{cases} \{\tau' \times \sigma' : \tau' \in S(\tau),\ \sigma' \in S(\sigma)\} & \text{if neither } \tau \text{ nor } \sigma \text{ are leaves of } T, \\ \{\tau \times \sigma' : \sigma' \in S(\sigma)\} & \text{if } \tau \text{ is a leaf of } T \text{, but not } \sigma, \\ \{\tau' \times \sigma : \tau' \in S(\tau)\} & \text{if } \sigma \text{ is a leaf of } T \text{, but not } \tau, \\ \emptyset & \text{if } \tau \text{ and } \sigma \text{ are leaves of } T \end{cases}$$

The last case, $S_2(b) = \emptyset$, is equivalent to saying that $b$ is
a leaf in $T_2$. Note that (b) defines the first vertex in $T_2$, while
by (c), one recursively gets new vertices belonging to $T_2$.
In this way, $T_2$ is completely defined by $T$. In particular,
only the tree structure of $T$ has to be stored.


Remark 4. The tree $T_2$ has the same properties as $T$ in
(17): For any $b \in T_2$ not being a leaf, the sons $b' \in S_2(b)$
are weakly disjoint and $b = \bigcup_{b' \in S_2(b)} b'$. The leaves of $T_2$
are of the form $\tau \times \sigma$ with panels $\tau, \sigma \in P$.


3.1.2 Admissibility, covering C2 


Let $\eta > 0$ be the same parameter as in (18). A product
$b = \tau \times \sigma \in T_2$ is called admissible if

$$\max\{\mathrm{diam}(\tau), \mathrm{diam}(\sigma)\} \le \eta\, \mathrm{dist}(\tau, \sigma) \tag{38}$$

As in Definition 2, we define a covering of $\Gamma \times \Gamma$.


Definition 4. (a) A covering $C_2 \subset T_2$ is a subset with
pairwise weakly disjoint $b \in C_2$ such that $\bigcup_{b \in C_2} b = \Gamma \times \Gamma$.
(b) An admissible covering $C_2$ is a covering such that all
$b \in C_2$ are either admissible or are leaves.


Again, we are looking for a minimum admissible covering
$C_2$, which is obtained by (39a) using Divide2 from
(39b),


C_2 := ∅; Divide2(Γ × Γ, C_2);                                    (39a)
procedure Divide2(b, C_2); comment b ∈ T_2, C_2 ⊂ T_2;
begin if b is admissible then C_2 := C_2 ∪ {b}
      else if b is a leaf of T_2 then C_2 := C_2 ∪ {b}
      else for all b' ∈ S_2(b) do Divide2(b', C_2)
end;                                                              (39b)


In the following, $C_2$ denotes the minimum admissible covering
obtained by (39a-b).


3.2 Kernel expansion 


We split $C_2$ into a far field $C_2^{\mathrm{far}} := \{b \in C_2 : b \text{ is admissible}\}$
and a near field $C_2^{\mathrm{near}} := \{b \in C_2 : b \text{ is not admissible}\}$. In
the latter case, $b$ is a leaf of $T_2$. Owing to the admissibility
condition (38), the kernel function $\kappa(x, y)$ allows an
expansion with respect to $x \in \tau$ and $y \in \sigma$ when $b \in C_2^{\mathrm{far}}$.
For this purpose, we introduce a basis $\{\Phi_\nu^\tau : \nu \in I_m\}$ for
each cluster $\tau \in T$, which is applied with respect to $x$
and $y$.
Given $b = \tau \times \sigma \in C_2^{\mathrm{far}}$, we approximate $\kappa(x, y)$ by an
expression $\tilde\kappa_b(x, y)$ of the form

$$\tilde\kappa_b(x, y) = \sum_{\nu \in I_m} \sum_{\mu \in I_m} \kappa_{\nu,\mu}^b\, \Phi_\nu^\tau(x)\, \Phi_\mu^\sigma(y) \qquad \text{for } (x, y) \in b = \tau \times \sigma \in C_2^{\mathrm{far}} \tag{40}$$

An example of such an expression is the Taylor expansion
with respect to $(x, y)$ around the centers $(z_\tau, z_\sigma)$ of $\tau$
and $\sigma$. Then $\Phi_\nu^\tau(x)$ is the monomial $(x - z_\tau)^\nu$, where $\nu \in I_m$
belongs to the same set of multi-indices as for the Taylor
expansion in Section 2.1.

The coefficients $\kappa_{\nu,\mu}^b$ in (40) form a $d_m \times d_m$ matrix
$K^b = (\kappa_{\nu,\mu}^b)_{\nu,\mu \in I_m}$, where $d_m := \# I_m$.

3.3 Matrix-vector multiplication for the 
Galerkin discretization 


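We recall the Galerkin discretization (8) and the matrix
formulation (13) involving the matrix $B$. The $i$th component
of $v = Bu$ is

$$v_i = \sum_{j=1}^{n} u_j \int_\Gamma \int_\Gamma \kappa(x, y)\, b_j(y)\, b_i(x)\, d\Gamma_y\, d\Gamma_x$$

The covering $C_2$ allows us to replace the integration over
$\Gamma \times \Gamma$ by the sum of integrals over $b \in C_2$:

$$v_i = \sum_{b = \tau \times \sigma \in C_2} \sum_{j=1}^{n} u_j \int_\tau \int_\sigma \kappa(x, y)\, b_j(y)\, b_i(x)\, d\Gamma_y\, d\Gamma_x$$

If $b = \tau \times \sigma \in C_2^{\mathrm{near}}$, the expression remains unchanged and
yields the near-field part $v_i^{\mathrm{near}}$. For $b = \tau \times \sigma \in C_2^{\mathrm{far}}$, we
replace $\kappa(x, y)$ by $\tilde\kappa_b(x, y)$ from (40):

$$\begin{aligned}
\tilde v_i^{\mathrm{far}} &= \sum_{b = \tau \times \sigma \in C_2^{\mathrm{far}}} \sum_{j=1}^{n} u_j \int_\tau \int_\sigma \sum_{\nu \in I_m} \sum_{\mu \in I_m} \kappa_{\nu,\mu}^b\, \Phi_\nu^\tau(x)\, \Phi_\mu^\sigma(y)\, b_j(y)\, b_i(x)\, d\Gamma_y\, d\Gamma_x \\
&= \sum_{b = \tau \times \sigma \in C_2^{\mathrm{far}}} \sum_{j=1}^{n} u_j \sum_{\nu \in I_m} \sum_{\mu \in I_m} \kappa_{\nu,\mu}^b \Big(\int_\tau \Phi_\nu^\tau(x)\, b_i(x)\, d\Gamma_x\Big) \Big(\int_\sigma \Phi_\mu^\sigma(y)\, b_j(y)\, d\Gamma_y\Big) \\
&= \sum_{b = \tau \times \sigma \in C_2^{\mathrm{far}}} \sum_{j=1}^{n} u_j \sum_{\nu \in I_m} \sum_{\mu \in I_m} \kappa_{\nu,\mu}^b\, J_\tau^\nu(b_i)\, J_\sigma^\mu(b_j) \\
&= \sum_{b = \tau \times \sigma \in C_2^{\mathrm{far}}} \sum_{\nu \in I_m} \sum_{\mu \in I_m} \kappa_{\nu,\mu}^b\, J_\tau^\nu(b_i)\, J_\sigma^\mu
\end{aligned}$$

with quantities $J_\tau^\nu(b_i)$ and $J_\sigma^\mu$ already defined in Section
2.8.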


4 HIERARCHICAL MATRICES 


The panel clustering method was element oriented and 
enabled a fast matrix-vector multiplication. The present 
method is index oriented and supports all matrix operations, 
that is, additionally an approximate addition, multiplication, 
and inversion of matrices are possible. 

The technique of hierarchical matrices (H-matrices) 
applies not only to full BEM matrices, but also to the fully 
populated inverse stiffness matrices of FEM problems. 

Again, the construction is based on trees that are similar 
to those from Section 3. However, the panels are replaced 
by the indices, that is, by the degrees of freedom. This will 
lead to a block-structured matrix, where all subblocks are 
filled with low-rank matrices. 

The use of low-rank matrices for subblocks was already 
proposed by Tyrtyshnikov (1996); however, the construc- 
tion of the efficient block-structure was missing. 


4.1 Index set I 


We consider square matrices $A = (a_{ij})_{i,j \in I}$, where the
indices $i, j$ run through the index set $I$ of size $n := \# I$.
We shall not use an explicit naming of the indices by
$I = \{1, \ldots, n\}$, since this might lead to the wrong impression
that the indices must have a special ordering. The
technique of H-matrices can easily be extended to rectangular
matrices $B = (b_{ij})_{i \in I,\, j \in J}$, where $I$ and $J$ are different
index sets.

For the following construction, we need to know some 
geometric information about the indices. The simplest case 
is given by point data: 


Assumption 2. Each index $i \in I$ is associated with a
'nodal point' $\xi^i \in \mathbb{R}^d$.


In this case, we use the following obvious definitions for
the diameter of a subset $I' \subset I$ and for the distance of two
subsets $I', I'' \subset I$:

$$\mathrm{diam}(I') := \max\{|\xi^i - \xi^j| : i, j \in I'\}, \qquad \mathrm{dist}(I', I'') := \min\{|\xi^i - \xi^j| : i \in I',\ j \in I''\} \tag{41a}$$

where $|\cdot|$ denotes the Euclidean norm in $\mathbb{R}^d$.
Although this information is sufficient for the practical
application, precise statements (and proofs) about the FEM
(or BEM) Galerkin method require the following support
information:


Assumption 3. Each index $i \in I$ is associated with the
support $X_i := \mathrm{supp}(b_i) \subset \mathbb{R}^d$ of the finite element basis
function $b_i$. For subsets $I', I'' \subset I$ we define

$$X(I') := \bigcup_{i \in I'} X_i, \qquad \mathrm{diam}(I') := \max\{|x - y| : x, y \in X(I')\}, \qquad \mathrm{dist}(I', I'') := \min\{|x - y| : x \in X(I'),\ y \in X(I'')\} \tag{41b}$$


4.2 Cluster tree T_I for H-matrices


The following tree $T_I$ is constructed as the panel cluster
tree $T$ from Definition 1, but the panels $t \in P$ are replaced
by indices $i \in I$.


Definition 5. The cluster tree $T_I$ consisting of subsets of
$I$ is structured as follows.


(a) $I \in T_I$ is the root of $T_I$.

(b) The leaves of $T_I$ are given by the one-element sets $\{i\}$
for all $i \in I$.

(c) If $\tau \in T_I$ is no leaf, there exist disjoint sons $\tau_1, \ldots, \tau_k
\in T_I$ ($k = k(\tau) > 1$) with $\tau = \tau_1 \cup \cdots \cup \tau_k$.


We denote the set of sons by $S_I(\tau)$. Usually, binary trees
($k = 2$) are appropriate.


Remark 5. (a) The fact that the leaves of $T_I$ contain
exactly one element is assumed in order to simplify the
considerations. In practice, one fixes a number $C_{\mathrm{leaf}}$ (e.g.
$C_{\mathrm{leaf}} = 32$) and deletes all $\tau' \in S_I(\tau)$ with $\# \tau \le C_{\mathrm{leaf}}$, that
is, in the reduced tree, the leaves are characterized by
$\# \tau \le C_{\mathrm{leaf}}$. Definition 5 corresponds to $C_{\mathrm{leaf}} = 1$.

(b) The construction from Section 2.7.1 can be used
as well to build $T_I$. The centers $z_t$ from Section 2.7.1
(cf. Figure 2) are to be replaced by the points $\xi^i$ from
Assumption 2.


4.3 Block cluster tree T_{I×I}


The entries $a_{ij}$ of a matrix $A \in \mathbb{R}^{I \times I}$ are indexed by
pairs $(i, j) \in I \times I$. Accordingly, the block cluster tree
$T_{I \times I}$ contains subsets of $I \times I$. Given the tree $T_I$, the
block cluster tree $T_{I \times I}$ is constructed similarly to $T_2$ in
Section 3.1.1. The vertices (blocks) of $T_{I \times I}$ are denoted
by $b$; the sons of $b$ form the set $S_{I \times I}(b)$. For a matrix
$A \in \mathbb{R}^{I \times I}$ and a block $b \in T_{I \times I}$, the corresponding submatrix is
denoted by

$$A|_b = (a_{ij})_{(i,j) \in b} \tag{42}$$


Definition 6. (a) The vertices of $T_{I \times I}$ are products
$b = \tau \times \sigma$ of two clusters $\tau, \sigma \in T_I$.

(b) $I \times I \in T_{I \times I}$ is the root of the tree.

(c) The set of sons of $b = \tau \times \sigma \in T_{I \times I}$ is defined by

$$S_{I \times I}(b) := \{\tau' \times \sigma' : \tau' \in S_I(\tau),\ \sigma' \in S_I(\sigma)\}$$


Note that $S_{I \times I}(b) = \emptyset$ if either $S_I(\tau)$ or $S_I(\sigma)$ is the empty
set. Hence, the leaves of $T_{I \times I}$ are those $b = \tau \times \sigma$ where
either $\tau$ or $\sigma$ is a leaf of $T_I$.


4.4 Admissibility condition 


Next, we need an admissibility condition that allows us to
check if a block $b$ is of appropriate size. We recall that
$\mathrm{diam}(\sigma)$ and $\mathrm{dist}(\tau, \sigma)$ ($\tau, \sigma \in T_I$) are defined by (41a or
b).


Definition 7. Let $\eta > 0$ be a fixed parameter. The block
$b = \tau \times \sigma \in T_{I \times I}$ is called admissible if either $b$ is a leaf
or

$$\min\{\mathrm{diam}(\tau), \mathrm{diam}(\sigma)\} \le \eta\, \mathrm{dist}(\tau, \sigma) \tag{43}$$


Note that in (43), the minimum of the diameters appears,
while the panel clustering in (38) needs the maximum.

The simplest way to check admissibility condition
(43) is to apply (43) to the bounding boxes $Q_\tau$ and
$Q_\sigma$ from (35). The condition $\min\{\mathrm{diam}(Q_\tau), \mathrm{diam}(Q_\sigma)\} \le
2\eta\, \mathrm{dist}(Q_\tau, Q_\sigma)$, which is easy to verify, implies (43).
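
For point data as in Assumption 2, condition (43) can also be checked directly from the definitions (41a). The following sketch is meant as an illustration for small index sets only, since it compares all pairs of points; the function names and the `is_leaf` argument are choices made here and not notation from the text.

```python
import math


def diam(points, cluster):
    """Diameter (41a) of an index set: largest distance of two nodal points."""
    return max(math.dist(points[i], points[j]) for i in cluster for j in cluster)


def dist(points, tau, sigma):
    """Distance (41a) of two index sets: smallest distance of nodal points."""
    return min(math.dist(points[i], points[j]) for i in tau for j in sigma)


def is_admissible(points, tau, sigma, eta, is_leaf=False):
    """Definition 7: a block is admissible if it is a leaf or (43) holds."""
    if is_leaf:
        return True
    smaller = min(diam(points, tau), diam(points, sigma))
    return smaller <= eta * dist(points, tau, sigma)


nodal_points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
print(is_admissible(nodal_points, [0, 1], [2, 3], eta=0.5))   # well separated
print(is_admissible(nodal_points, [0, 1], [1, 2], eta=0.5))   # clusters touch
```

In practice the pairwise loops would be replaced by the cheaper bounding-box test mentioned above.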


4.5 Admissible block partitioning 


The first step in the construction of H-matrices is the block
partitioning (see, e.g. Figure 3). The partitioning called $P$
is a covering in the sense that all blocks $b \in P$ are disjoint
and $\bigcup_{b \in P} b = I \times I$. The partitioning is admissible if all
$b \in P$ are admissible in the sense of Definition 7. Again,
we are looking for a minimum admissible partitioning for
which $\# P$ is as small as possible. It can be determined as in
(21a,b) or (39a,b). We apply (44a) with DivideP from (44b),


P := ∅; DivideP(I × I, P);                                        (44a)
procedure DivideP(b, P); comment b ∈ T_{I×I}, P ⊂ T_{I×I};
begin if b is admissible then P := P ∪ {b}
      else if b is a leaf of T_{I×I} then P := P ∪ {b}
      else for all b' ∈ S_{I×I}(b) do DivideP(b', P)
end;                                                              (44b)


Next, we give an example of such a minimum admissible
partitioning that corresponds to a discretization of the
integral operator $\int_0^1 \log|x - y|\, f(y)\, dy$, where $d = 1$ is the
spatial dimension. Consider the piecewise constant boundary
elements that give rise to the supports


$$X_i := ((i-1)h,\ ih] \quad \text{for } i \in I := \{1, \dots, n\} \quad \text{and } h := \frac{1}{n}, \text{ where } n = 2^p$$


(cf. (41)). The cluster tree $T_I$ is the binary tree obtained
by a uniform halving: the resulting clusters form the tree
$T_I = \{\tau_i^\ell : 0 \le \ell \le p,\ 1 \le i \le 2^\ell\}$, where


$$\tau_i^\ell = \{(i-1)\cdot 2^{p-\ell} + 1,\ (i-1)\cdot 2^{p-\ell} + 2,\ \dots,\ i \cdot 2^{p-\ell}\} \qquad (45)$$


Note that $\tau_1^0 = I$ is the root, while $\tau_i^p = \{i\}$ are the leaves.
Further, we choose $\eta = 1$ in (43). Then the resulting block
partitioning is shown in Figure 3. Under natural conditions,
the number of blocks is $\#P = O(n)$.
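
The procedure (44a,b) applied to this 1D example can be sketched as follows. The sketch is an illustration under stated assumptions, not the reference implementation: clusters are stored as half-open index ranges (lo, hi), lengths are measured in units of $h$, the admissibility test uses "$\le$" in (43), and all function names are hypothetical.

```python
def sons(cluster):
    """Halve an index range (lo, hi); single indices are leaves."""
    lo, hi = cluster
    if hi - lo <= 1:
        return []
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]

def block_sons(block):
    """Sons of b = tau x sigma as in Definition 6(c)."""
    tau, sigma = block
    return [(t, s) for t in sons(tau) for s in sons(sigma)]

def admissible(block, eta=1):
    """Definition 7: b is a leaf, or min(diam) <= eta * dist (in units of h)."""
    (a, b), (c, d) = block
    if not block_sons(block):
        return True
    diam = min(b - a, d - c)
    dist = max(0, c - b, a - d)
    return diam <= eta * dist

def divide_p(block, partition, eta=1):
    """Recursive subdivision (44b); leaves are covered by admissible()."""
    if admissible(block, eta):
        partition.append(block)
    else:
        for son in block_sons(block):
            divide_p(son, partition, eta)

def partition_1d(p, eta=1):
    """Call (44a) for n = 2**p indices."""
    n = 2 ** p
    partition = []
    divide_p(((0, n), (0, n)), partition, eta)
    return partition

for p in (3, 6, 10):
    n = 2 ** p
    print(n, len(partition_1d(p)))
```

With these conventions the only non-admissible blocks on each level are the diagonal blocks and their direct neighbours, so the count works out to $9n - 6\log_2 n - 8$ blocks, in line with $\#P = O(n)$.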


Figure 3. Partitioning for a 1D example. 


4.6 H-matrices and Rk-matrices 


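Definition 8 (H-matrix) Given a cluster tree $T_I$ for an
index set $I$, let the related minimum admissible partitioning
be denoted by $P$. Further, let $k \in \mathbb{N}$ be a given integer. Then
the set $H(k, P)$ consists of all matrices $M \in \mathbb{R}^{I \times I}$ with


$$\operatorname{rank}(M|_b) \le k \quad \text{for all } b \in P$$


Any rectangular matrix $C \in \mathbb{R}^{b}$ ($b = \tau \times \sigma$) with
$\operatorname{rank}(C) \le k$ gives rise to the following equivalent
representations


$$C = \sum_{i=1}^{k} a_i b_i^T \in \mathbb{R}^{b} \quad \text{with vectors } a_i \in \mathbb{R}^{\tau},\ b_i \in \mathbb{R}^{\sigma},$$

$$C = A B^T \quad \text{with } A \in \mathbb{R}^{\tau \times k},\ B \in \mathbb{R}^{\sigma \times k} \qquad (46)$$


where the matrices $A = [a_1, \dots, a_k]$, $B = [b_1, \dots, b_k]$ are
composed of the vectors $a_i$, $b_i$. The vectors in (46) may
be linearly dependent, since $\operatorname{rank}(C) < k$ is not excluded.
Throughout this section, the bound $k$ on the rank is assumed
to be much smaller than the dimension $n$.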


Definition 9 (Rk-matrix) Matrices represented in the 
form (46) are called Rk-matrices. 


Remark 6. (a) Rk-matrices require a storage of size
$2k(\#\sigma + \#\tau)$.

(b) Multiplication of an Rk-matrix $C = AB^T$ with
a vector $v$ requires $k$ scalar products and vector additions:
$Cv = \sum_{i=1}^{k} \alpha_i a_i$ with $\alpha_i := \langle b_i, v \rangle$. The cost is
$2k(\#\tau + \#\sigma)$.

(c) Multiplication of two Rk-matrices $R = AB^T \in \mathbb{R}^{b}$,
$S = CD^T \in \mathbb{R}^{b'}$ with $b = \tau \times \sigma$, $b' = \sigma \times \sigma'$ leads to the
Rk-matrix $T = ED^T \in \mathbb{R}^{\tau \times \sigma'}$ with $E = A \cdot Z$, where
$Z = B^T C$ is of size $k \times k$. The operation cost is $2k^2(\#\tau + \#\sigma)$.

(d) The product $MR$ for an arbitrary matrix $M$ of matching size and
an Rk-matrix $R \in \mathbb{R}^{b}$ is again an Rk-matrix of the form
(46) with $a'_i := M a_i$.
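
Remark 6(b) can be illustrated by a short sketch that multiplies an Rk-matrix by a vector without ever forming $C$. The data layout (lists of vectors $a_i$ and $b_i$) and the function names are assumptions made for this illustration only.

```python
def rk_matvec(a_vecs, b_vecs, v):
    """Return C v for C = sum_i a_i b_i^T, using only the factors."""
    result = [0.0] * len(a_vecs[0])
    for a_i, b_i in zip(a_vecs, b_vecs):
        alpha = sum(bj * vj for bj, vj in zip(b_i, v))   # scalar product <b_i, v>
        for r, a_ir in enumerate(a_i):
            result[r] += alpha * a_ir                    # vector update with alpha * a_i
    return result

def dense_from_rk(a_vecs, b_vecs):
    """Assemble the full matrix C (only for checking small examples)."""
    rows, cols = len(a_vecs[0]), len(b_vecs[0])
    return [[sum(a[r] * b[c] for a, b in zip(a_vecs, b_vecs)) for c in range(cols)]
            for r in range(rows)]

# A 3 x 2 matrix of rank 2, given by its factors.
a_vecs = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
b_vecs = [[1.0, 0.0], [2.0, 1.0]]
v = [3.0, 4.0]

fast = rk_matvec(a_vecs, b_vecs, v)
full = [sum(c * x for c, x in zip(row, v)) for row in dense_from_rk(a_vecs, b_vecs)]
print(fast, full)
```

Each of the $k$ terms needs one scalar product of length $\#\sigma$ and one vector update of length $\#\tau$, which is where the operation count in (b) comes from.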


According to Remark 5, the leaves of $T_I$ may be characterized
by $\#\tau \le C_{\mathrm{leaf}}$. Then all submatrices $M|_b$ of an
H-matrix $M$ are represented as Rk-matrices except for the
case when $b = \tau \times \sigma$ with $\#\sigma, \#\tau \le C_{\mathrm{leaf}}$, where a (usual)
full matrix is preferable.


Remark 7. Under rather general assumptions on the tree
$T_I$ and on the geometric data $\xi_i$ or $X_i$ (cf. Section 4.1), the
storage requirements for any $M \in H(k, P)$ are $O(nk \log(n))$
(cf. Hackbusch, 1999; Hackbusch and Khoromskij, 2000).

The constant in the estimate $O(nk \log(n))$ from Remark 7
is determined by a sparsity constant $C_{\mathrm{sp}}$ of the partitioning
$P$ (see Grasedyck and Hackbusch, 2003).


4.7 Hierarchical format for BEM and FEM 
matrices 


So far, the set $H(k, P)$ of matrices has been defined. It remains
to be shown that matrices of this format can approximate well
the matrices that we want to represent.


4.7.1 BEM matrices 


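The second version of the panel clustering method is
already quite close to the present form. In the former
case, the data associated with a block $b = \tau \times \sigma \in T_2$
(notation in the sense of Section 3.1.1) describe the part
$\int_\tau \int_\sigma \kappa(x, y)\, b_i(x)\, u(y)\, dx\, dy$ with $\kappa$ approximated by $\tilde\kappa$. In
the special case $u = b_j$ (i.e. $u$ is the $j$th unit vector),
this integral becomes $\int_\tau \int_\sigma \tilde\kappa(x, y)\, b_i(x)\, b_j(y)\, dx\, dy$. Now, we
want to represent $a_{ij} = \int_\Gamma \int_\Gamma \kappa(x, y)\, b_i(x)\, b_j(y)\, dx\, dy$, which,
in general, is different from the previous result, since the
supports $X_i$, $X_j$ of $b_i$, $b_j$ are not necessarily contained in $\tau$
and $\sigma$. Owing to the admissibility of the block $b = \tau \times \sigma \in P$
(notation in the sense of Section 4.2), $\kappa(x, y)$ is approximated by


$$\tilde\kappa(x, y) = \sum_{\ell=1}^{k} \Phi_\ell(x)\, \Psi_\ell(y) \quad \text{for all } x \in X(\tau),\ y \in X(\sigma) \qquad (47)$$


(cf. (15)). Inserting one term of (47) into
$\int_\Gamma \int_\Gamma \tilde\kappa(x, y)\, b_i(x)\, b_j(y)\, dx\, dy$ for $(i, j) \in b$, we obtain $\alpha_i \cdot \beta_j$, where
$\alpha_i = \int_\Gamma \Phi_\ell(x)\, b_i(x)\, dx$ and $\beta_j = \int_\Gamma \Psi_\ell(y)\, b_j(y)\, dy$. These
components form the vectors $a_\ell = (\alpha_i)_{i \in \tau}$ and $b_\ell = (\beta_j)_{j \in \sigma}$.
Hence,


$$\tilde a_{ij} = \int_\Gamma \int_\Gamma \tilde\kappa(x, y)\, b_i(x)\, b_j(y)\, dx\, dy = \sum_{\ell=1}^{k} (a_\ell)_i\, (b_\ell)_j \quad \text{for } (i, j) \in b$$


shows that the approximation of $\kappa$ in $X(\tau) \times X(\sigma)$ by $\tilde\kappa$
containing $k$ terms is equivalent to having an Rk-submatrix
$\tilde A|_b$ with $\operatorname{rank}(\tilde A|_b) \le k$. In the case of (admissible) blocks
$b \in P$ which do not satisfy (43), $b = \{(i, j)\}$ is of size
$1 \times 1$ (cf. Definition 7), so that $\tilde a_{ij}$ can be defined by the
exact entry.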


Remark 8. (a) Let $A \in \mathbb{R}^{I \times I}$ be the exact BEM matrix.
The existence of a (well approximating) H-matrix $\tilde A \in H(k, P)$
follows from a sufficiently accurate expansion (47)
with $k$ terms for all 'far-field blocks' $b \in P$ satisfying (43).


(b) The BEM kernels (mathematically more precisely:
asymptotically smooth kernels; cf. Hackbusch and Khoromskij,
2000) allow an approximation up to an error of
$O(\eta^{\sqrt[d-1]{k}})$ by $k$ terms ($\eta$ is the factor in (43)).


Concerning the construction of $\tilde A \in H(k, P)$, one can
follow the pattern of panel clustering (see, e.g. Börm and
Hackbusch, 2002 and Börm, Grasedyck and Hackbusch,
2003). Interestingly, there is another approach called
adaptive cross approximation (ACA) by Bebendorf (2000);
Bebendorf and Rjasanov (2003), which only makes use of
the procedure $(i, j) \mapsto \int_\Gamma \int_\Gamma \kappa(x, y)\, b_i(x)\, b_j(y)\, dx\, dy$ (this
mapping is evaluated only for a few index pairs $(i, j) \in I \times I$).


4.7.2 FEM matrices 


Since FEM matrices are sparse, we have the following 
trivial statement. 


Remark 9. Let (41) be used to define the admissible 
partitioning P. Then, for any k > 1, a FEM stiffness matrix 
belongs to H(k, P). 


The reason is that $A|_b = O$ for all blocks $b$ satisfying
(43), since (43) implies that the supports of the basis
functions $b_i$, $b_j$ ($(i, j) \in b$) are disjoint. Remark 9 expresses
the fact that $A$ can be considered as an H-matrix. Therefore,
we can immediately apply the matrix operations described
below. In particular, the inverse matrix can be determined
approximately. The latter task requires that $A^{-1}$ has a good
approximation $B \in H(k, P)$. This property is the subject of
the next theorem.


Theorem 2. Let $Lu = -\sum_{i,j=1}^{d} \partial_j (c_{ij}\, \partial_i u)$ be a uniformly
elliptic differential operator whose coefficients are
allowed to be extremely nonsmooth: $c_{ij} \in L^\infty(\Omega)$. Let $A$
be a FEM stiffness matrix for this boundary value problem.
Then there are approximants $B_k \in H(k, P)$ so that $B_k$
converges exponentially to $A^{-1}$ (details in Bebendorf and
Hackbusch, 2003) (see Chapter 4, this Volume).


4.8 Matrix operations 


In the following, we describe the matrix operations 
that can be performed using H-matrices. Except for 
the matrix-vector multiplication, the operations are 
approximate ones, but the accuracy can be controlled 
by means of the rank parameter k. For further details 
and cost estimates, we refer to Grasedyck and Hackbusch 
(2003). 
4.8.1 Matrix-vector multiplication


The matrix-vector product $y \mapsto y + Mx$ is performed by
the call MVM(M, I × I, x, y) of


procedure MVM(M, b, x, y); comment b = τ × σ ∈ T_{I×I}, M ∈ ℝ^{I×I}, x, y ∈ ℝ^I;
begin if S_{I×I}(b) ≠ ∅ then for b' ∈ S_{I×I}(b) do MVM(M, b', x, y)
else y|_τ := y|_τ + M|_b x|_σ
end;                                                            (48)


The third line of (48) uses the matrix-vector multiplication
of an Rk-matrix with a vector (see Remark 6b). The overall
arithmetical cost is $O(nk \log n)$.
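
The recursion (48) can be sketched on a small hand-built block tree. The dictionary layout below ("rows", "cols", "sons", "rk", "dense") is a hypothetical storage format chosen for this illustration; leaves hold either Rk factors or a small full matrix, as described before Remark 7.

```python
def mvm(block, x, y):
    """y := y + M x, recursing over the block tree as in (48)."""
    if block.get("sons"):
        for son in block["sons"]:
            mvm(son, x, y)
        return
    (r0, r1), (c0, c1) = block["rows"], block["cols"]
    x_sigma = x[c0:c1]
    if "rk" in block:                       # low-rank leaf: Remark 6(b)
        a_vecs, b_vecs = block["rk"]
        for a_i, b_i in zip(a_vecs, b_vecs):
            alpha = sum(b * v for b, v in zip(b_i, x_sigma))
            for r in range(r1 - r0):
                y[r0 + r] += alpha * a_i[r]
    else:                                   # small full-matrix leaf
        for r, row in enumerate(block["dense"]):
            y[r0 + r] += sum(m * v for m, v in zip(row, x_sigma))

# 4 x 4 example: full diagonal blocks, rank-1 off-diagonal blocks.
root = {"rows": (0, 4), "cols": (0, 4), "sons": [
    {"rows": (0, 2), "cols": (0, 2), "dense": [[2.0, 1.0], [1.0, 2.0]]},
    {"rows": (0, 2), "cols": (2, 4), "rk": ([[1.0, 1.0]], [[1.0, 2.0]])},
    {"rows": (2, 4), "cols": (0, 2), "rk": ([[1.0, 0.0]], [[0.0, 1.0]])},
    {"rows": (2, 4), "cols": (2, 4), "dense": [[3.0, 0.0], [0.0, 3.0]]},
]}

y = [0.0] * 4
mvm(root, [1.0, 1.0, 1.0, 1.0], y)
print(y)
```

Every leaf is visited exactly once and touches only its own rows of $y$ and columns of $x$, so the total work is the sum of the leaf costs.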


4.8.2 Matrix—matrix addition, truncation 


For $M', M'' \in H(k, P)$, the exact sum $M := M' + M''$ is
obtained by summing $M'|_b + M''|_b$ over all blocks $b \in P$.
The problem is, however, that usually $M'|_b + M''|_b$ has
rank $2k$, so that a fill-in occurs and $M$ is no longer in the
set $H(k, P)$. Therefore, a truncation of $M|_b = M'|_b + M''|_b$
back to an Rk-matrix $\tilde M|_b$ is applied.

Concerning the truncation, we recall the optimal approximation
of a general (rectangular) matrix $M \in \mathbb{R}^{\tau \times \sigma}$ by
an Rk-matrix $\tilde M$. Optimality holds with respect to the
spectral norm ($\|A\|_2 = \max\{|Ax| / |x| : x \ne 0\} = \sqrt{\lambda_{\max}}$,
where $\lambda_{\max}$ is the maximum eigenvalue of $AA^T$) and the
Frobenius norm ($\|A\|_F = (\sum_{i,j} a_{ij}^2)^{1/2}$).


Algorithm 1. (a) Calculate the singular-value decomposition
$M = U \Sigma V^T$ of $M$, that is, $U$, $V$ are unitary matrices,
while $\Sigma = \operatorname{diag}(\sigma_1, \dots)$ is a diagonal rectangular matrix
containing the singular values $\sigma_1 \ge \sigma_2 \ge \cdots$.

(b) Set $\tilde U := [U_1, \dots, U_k]$ (first $k$ columns of $U$),
$\tilde\Sigma := \operatorname{diag}(\sigma_1, \dots, \sigma_k)$ (first (largest) $k$ singular values),
$\tilde V := [V_1, \dots, V_k]$ (first $k$ columns of $V$).

(c) Set $\tilde A := \tilde U \tilde\Sigma \in \mathbb{R}^{\tau \times k}$ and $\tilde B := \tilde V \in \mathbb{R}^{\sigma \times k}$ in (46).
Then $\tilde M = \tilde A \tilde B^T$ is the best Rk-matrix approximation of $M$.


We call $\tilde M$ a truncation of $M$ to the set of Rk-matrices.
The costs are in general $O((\#\tau + \#\sigma)^3)$ operations. In our
application, the sum $M := M' + M''$ has rank $K \le 2k$.
Here, we can apply a cheaper singular-value decomposition.
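
For orientation, the error of the truncation in Algorithm 1 can be written down explicitly. The following identities are the classical best-approximation result for the singular-value decomposition; they are added here as a reminder and are not taken from the text above.

```latex
\tilde M = \sum_{i=1}^{k} \sigma_i\, U_i V_i^{T}, \qquad
\| M - \tilde M \|_2 = \sigma_{k+1}, \qquad
\| M - \tilde M \|_F = \Bigl( \sum_{i > k} \sigma_i^{2} \Bigr)^{1/2}.
```

The discarded singular values therefore measure directly how much accuracy is lost when a block is cut back to rank $k$.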


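Algorithm 2. Let $M = AB^T$ be an RK-matrix with
$A \in \mathbb{R}^{\tau \times K}$, $B \in \mathbb{R}^{\sigma \times K}$ and $K > k$.

(a) Calculate a truncated QR-decomposition $A = Q_A R_A$
of $A$, that is, $Q_A \in \mathbb{R}^{\tau \times K}$, $Q_A^T Q_A = I$ and $R_A \in \mathbb{R}^{K \times K}$
upper triangular matrix.

(b) Calculate a truncated QR-decomposition $B = Q_B R_B$
of $B$, $Q_B \in \mathbb{R}^{\sigma \times K}$, $R_B \in \mathbb{R}^{K \times K}$.

(c) Calculate a singular-value decomposition $U \Sigma V^T$ of
the $K \times K$ matrix $R_A R_B^T$.

(d) Set $\tilde U$, $\tilde\Sigma$, $\tilde V$ as in Algorithm 1b.

(e) Set $\tilde A := Q_A \tilde U \tilde\Sigma \in \mathbb{R}^{\tau \times k}$ and $\tilde B := Q_B \tilde V \in \mathbb{R}^{\sigma \times k}$.
Then, $\tilde M = \tilde A \tilde B^T$ is the best Rk-matrix approximation of $M$.


The truncation from above costs $O(K^2(\#\tau + \#\sigma) + K^3)$
arithmetical operations.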
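
The reason why Algorithm 2 yields the same result as Algorithm 1 can be seen from a short calculation with the factors defined in steps (a) to (c); the derivation below is a sketch added for clarity.

```latex
M = A B^{T}
  = Q_A R_A R_B^{T} Q_B^{T}
  = Q_A \,\bigl( U \Sigma V^{T} \bigr)\, Q_B^{T}
  = (Q_A U)\, \Sigma\, (Q_B V)^{T}.
```

Since $Q_A U$ and $Q_B V$ have orthonormal columns, the last expression is a (reduced) singular-value decomposition of $M$, so keeping the first $k$ columns and singular values gives the best Rk-approximation. The two QR-decompositions account for the $K^2(\#\tau + \#\sigma)$ term in the cost and the small $K \times K$ decomposition for the $K^3$ term.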

The exact addition $M', M'' \in H(k, P) \Rightarrow M := M' + M'' \in H(2k, P)$
together with the truncation $M \in H(2k, P) \Rightarrow \tilde M \in H(k, P)$
is denoted by the formatted addition


$$M' \oplus M'' \qquad (49)$$


Similarly, the formatted subtraction $\ominus$ is defined. The
complexity of $\oplus$ and $\ominus$ is $O(nk^2 \log n)$.


4.8.3 Matrix—matrix multiplication 


Let $X, Y \in H(k, P)$. Under the assumption that $T_I$ is
a binary tree, both matrices are substructured by


$$X = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix}, \qquad
Y = \begin{pmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{pmatrix},$$


and the product is


$$XY = \begin{pmatrix} X_{11} Y_{11} + X_{12} Y_{21} & X_{11} Y_{12} + X_{12} Y_{22} \\
X_{21} Y_{11} + X_{22} Y_{21} & X_{21} Y_{12} + X_{22} Y_{22} \end{pmatrix}$$


The most costly subproducts are $X_{11} Y_{11}$ and $X_{22} Y_{22}$, since
these submatrices have the finest partitioning, whereas
$X_{12}$, $Y_{12}$, $X_{21}$, $Y_{21}$ have a coarser format. Performing the
products recursively and adding according to Section 4.8.2,
we obtain an approximate multiplication $X \odot Y$. Its costs
are $O(nk^2 \log^2 n)$ (cf. Hackbusch, 1999). A detailed algorithm
can be found in Grasedyck and Hackbusch (2003)
and Börm, Grasedyck and Hackbusch (2003).


4.8.4 Inversion 


Let $A \in H(k, P)$. Under the assumption that $T_I$ is a binary
tree, we have as above that
$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$. The inverse
of a $2 \times 2$ block matrix can be computed by the block-Gauss
elimination (see Hackbusch, 1999), if the principal
submatrices are invertible:


$$A^{-1} = \begin{pmatrix}
A_{11}^{-1} + A_{11}^{-1} A_{12} S^{-1} A_{21} A_{11}^{-1} & -A_{11}^{-1} A_{12} S^{-1} \\
-S^{-1} A_{21} A_{11}^{-1} & S^{-1}
\end{pmatrix}
\quad \text{with } S = A_{22} - A_{21} A_{11}^{-1} A_{12} \qquad (50)$$


Applying a recursive procedure Inv, compute Inv($A_{11}$)
as an approximation of $A_{11}^{-1}$, invert
$\tilde S := A_{22} \ominus A_{21} \odot \mathrm{Inv}(A_{11}) \odot A_{12}$,
and perform the remaining operations in
(50) by means of $\oplus$ and $\odot$. Again, the precise algorithm is
in Börm, Grasedyck and Hackbusch (2003).

The complexity of the computation of the formatted
inverse is $O(nk^2 \log^2 n)$ (cf. Hackbusch, 1999; Grasedyck,
2001; Grasedyck and Hackbusch, 2003).
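
The recursive structure of Inv can be sketched with ordinary dense matrices. This is a simplified stand-in and not the formatted algorithm: exact rational arithmetic on nested lists replaces the operations $\oplus$, $\ominus$ and $\odot$, and all helper names are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matadd(X, Y, sign=1):
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def neg(X):
    return [[-x for x in row] for row in X]

def split(A):
    """Split A into the 2 x 2 block form used in (50)."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])

def inv(A):
    """Recursive block inversion following (50)."""
    if len(A) == 1:
        return [[1 / A[0][0]]]
    A11, A12, A21, A22 = split(A)
    X = inv(A11)                                       # Inv(A11)
    S = matadd(A22, matmul(matmul(A21, X), A12), -1)   # Schur complement
    S_inv = inv(S)
    XA12, A21X = matmul(X, A12), matmul(A21, X)
    B11 = matadd(X, matmul(matmul(XA12, S_inv), A21X))
    B12 = neg(matmul(XA12, S_inv))
    B21 = neg(matmul(S_inv, A21X))
    return ([r1 + r2 for r1, r2 in zip(B11, B12)] +
            [r1 + r2 for r1, r2 in zip(B21, S_inv)])

A = [[Fraction(v) for v in row]
     for row in [[4, 1, 0, 0], [1, 4, 1, 0], [0, 1, 4, 1], [0, 0, 1, 4]]]
B = inv(A)
print(matmul(A, B))
```

In the H-matrix setting every product and sum above is replaced by its formatted counterpart, so the result is only an approximate inverse whose accuracy is governed by the local rank $k$.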


4.9 Examples 


4.9.1 BEM case 


To demonstrate the advantage of the H-matrix approach, 
we consider the simple example of the discretization of the 
single-layer potential on the unit circle using a Galerkin 
method with piecewise constant basis functions. The loga- 
rithmic kernel function is approximated by the interpolatory 
approach from Section 2.7.2 (interpolation at Chebyshev 
points). 

The first column of Table 1 contains the number of
degrees of freedom ($n = \#I$), the following columns
give the relative error $\|A - \tilde A\| / \|A\|$ (spectral norm). We
observe that the error is bounded independently of the
discretization level and that it decreases very quickly when
the interpolation order is increased.

The time (SUN Enterprise 6000 using 248 MHz Ultra- 
SPARC II) required for matrix—vector multiplications is 
given in Table 2. We can see that the complexity grows 
almost linearly in the number of degrees of freedom and 
rather slowly with respect to the interpolation order. 

Finally, we consider the time required for building the 
H-matrix representation of the discretized integral opera- 
tor (see Table 3). The integral of the Lagrange polynomials 
is computed by using an exact Gauss quadrature formula, 
while the integral of the kernel function is computed ana- 
lytically. Once more we observe an almost linear growth 
of the complexity with respect to the number of degrees 
of freedom and a slow growth with respect to the inter- 
polation order. Note that even on the specified rather slow 
processor, the boundary element matrix for more than half 
a million degrees of freedom can be approximated with an 
error < 0.03% in less than half an hour. 


Table 1. Approximation error for the single-layer potential.


n        1         2         3         4         5
1024     3.57e-2   2.16e-3   2.50e-4   7.88e-6   2.67e-6
2048     3.58e-2   2.19e-3   2.51e-4   7.86e-6   2.69e-6
4096     3.59e-2   2.20e-3   2.51e-4   7.87e-6   2.68e-6
8192     3.59e-2   2.20e-3   2.52e-4   7.76e-6   2.67e-6
16384    3.59e-2   2.21e-3   2.53e-4   7.87e-6   2.68e-6


Table 2. Time [s] required for the matrix-vector multiplication
(single-layer potential).


n        1       2       3       4       5
1024     0.01    0.02    0.01    0.01    0.03
2048     0.02    0.04    0.03    0.05    0.07
4096     0.05    0.11    0.09    0.12    0.17
8192     0.12    0.24    0.19    0.26    0.39
16384    0.27    0.53    0.41    0.56    0.83
32768    0.57    1.15    0.90    1.23    1.90
65536    1.18    2.44    1.96    2.73    4.14
131072   2.45    5.18    4.30    5.89    8.98
262144   5.15    11.32   9.14    12.95   19.78
524288   10.68   23.81   19.62   28.02   43.57


Table 3. Time [s] required for building the H-matrix (single-layer
potential).


n        1        2        3         4         5
1024     0.61     0.93     1.76      3.11      5.60
2048     1.25     2.03     3.85      7.04      12.94
4096     2.56     4.29     8.41      15.82     29.65
8192     5.25     9.16     18.10     35.31     66.27
16384    10.75    19.30    39.32     77.47     146.65
32768    22.15    40.83    85.16     169.16    324.36
65536    45.79    87.32    185.85    368.46    702.63
131072   92.64    180.73   387.63    788.06    1511.66
262144   189.15   378.20   854.75    1775.85   3413.45
524288   388.96   795.84   1743.66   3596.77   6950.55


4.9.2 FEM case, inverse stiffness matrix


We give a short summary of numerical tests from
Grasedyck and Hackbusch (2003) and consider first the
Poisson equation $-\Delta u = f$ on the unit square $\Omega = [0, 1]^2$
with zero boundary condition $u = 0$ on $\Gamma = \partial\Omega$. The
approximate inverse $\widetilde{A^{-1}}$ is computed for different local
ranks $k$. The left part of Table 4 shows the relative
error $\|I - A\,\widetilde{A^{-1}}\|$ in the spectral norm for the (formatted)
inverse on a uniform grid.

Next we show that the uniformity of the grid and the sim- 
ple shape of the square do not play any role. The grid from 
Figure 4 is strongly graded toward the boundary (‘bound- 
ary concentrated mesh’). For details of the geometrically 
balanced cluster tree, we refer the reader to Grasedyck 
and Hackbusch (2003). The complexity of the inversion 
is reduced compared to that in the uniform case, while the 
accuracy is enhanced (see right part of Table 4). This reflects
the fact that the grid mainly degenerates to a lower-dimensional
structure (the boundary).

Finally, we give examples showing that the performance 
is not deteriorated by rough coefficients. Consider the 
Table 4. Relative error $\|I - A\,\widetilde{A^{-1}}\|$ in the spectral norm for the (formatted)
inverse on a uniform grid (left) and on the boundary concentrated mesh (right).


Uniform grid
k     n=4096    16384     65536     262144
1     2.4       8.9       2.6e+1    4.7e+1
2     ?         3.2       1.2e+1    2.7e+1
3     9.2e-2    5.2e-1    2.4       1.0e+1
4     ?         9.9e-2    4.4e-1    1.9
5     2.3e-3    9.2e-3    4.0e-2    1.7e-1
6     6.4e-4    3.7e-3    1.8e-2    8.4e-2
7     1.4e-4    6.9e-4    2.9e-3    ?
8     7.8e-5    3.9e-4    1.8e-3    7.1e-3
9     8.5e-6    4.6e-5    2.1e-4    9.4e-4
15    ?         3.3e-8    1.3e-7    5.2e-7
20    ?         1.3e-10   5.3e-10   2.5e-9

Boundary concentrated mesh
k     n=6664    13568     27384     55024     110312
1     9.6e-2    9.9e-2    ?         1.1e-1    9.4e-2
2     1.3e-2    1.1e-2    1.7e-2    1.9e-2    1.6e-2
3     3.9e-3    4.4e-3    1.7e-3    4.5e-3    4.7e-3
4     8.6e-5    4.7e-4    1.7e-4    ?         ?
5     8.9e-6    3.6e-5    7.6e-6    4.9e-5    5.0e-5
6     2.1e-8    9.8e-7    1.2e-6    1.3e-6    1.4e-6
7     ?         5.0e-7    1.9e-10   5.8e-7    5.9e-7
8     1.4e-12   4.2e-10   2.1e-11   2.5e-10   2.8e-10
9     1.0e-14   2.4e-13   2.1e-14   2.7e-13   2.8e-13


Figure 4. The boundary concentrated mesh.


differential equation


$$-\operatorname{div}(\sigma(x, y)\, \nabla u(x, y)) = f(x, y) \ \text{ in } \Omega = [0, 1]^2, \qquad
u = 0 \ \text{ on } \Gamma = \partial\Omega \qquad (51)$$


Let $\Omega_1 \subset \Omega$ be the wall-like domain from Figure 5.
Let $L_a$ be the differential operator in (51) with


$$\sigma(x, y) = \begin{cases} a, & (x, y) \in \Omega_1 \\ 1, & (x, y) \in \Omega \setminus \Omega_1 \end{cases}$$


Note that $L_1 = -\Delta$. Table 5 shows
the relative accuracy measured for different problem sizes
$n$ in the Frobenius norm when approximating the inverse
of the respective FEM matrix by an H-matrix with local
rank $k$. The results demonstrate that the error $\|A^{-1} - \widetilde{A^{-1}}\|$
depends on the jump $a$ very weakly.


Figure 5. Subdomain $\Omega_1$ of $\Omega = [0, 1]^2$. A color version of this
image is available at http://www.mrw.interscience.wiley.com/ecm


4.10 H²-matrices and other variants of
H-matrices


4.10.1 Variable rank, recompression 


We may replace the integer $k$ in Definition 8 by a function
$k(b)$. Then, the matrix $M$ has to fulfil $\operatorname{rank}(M|_b) \le k(b)$
for all $b \in P$. A possible nonconstant choice is
$k(b) := \alpha \cdot l(b)$, where $l(\cdot)$ is defined by induction: $l(b) = 1$ for
leaves $b \in T_{I\times I}$ and $l(b) = 1 + \min\{l(b') : b' \in S_{I\times I}(b)\}$.
In this case, $k(b)$ varies between 1 and $\log_2 n$. The low ranks
correspond to the (many) small blocks, whereas large ranks
occur for the few large blocks. As a result, cost estimates
by $O(nk^{s} \log^{q} n)$ for fixed $k$ may turn into the optimal order
$O(n)$ for appropriate variable rank.

Given an H-matrix $M$ with a certain (variable or constant)
local rank, it might happen that the block matrices
$M|_b$ can be reduced to lower rank with almost the same
accuracy. The standard tool is a singular-value decomposition
of $M|_b$. If some of the $k(b)$ singular values $\sigma_1 \ge$
Table 5. Frobenius norm $\|A^{-1} - \widetilde{A^{-1}}\|_F$ of the best approximation to $A^{-1}$ using the local rank $k$.


n = 2304        k=1       k=2       k=3       k=4       k=5       k=6
Storage (MB)    10.2      18.9      27.6      36.2
$\Delta$        4.1e-03   5.9e-04   1.1e-05   1.2e-06
$L_{10^3}$      6.9e-03   9.8e-04   1.6e-05   2.1e-06
$L_{10^6}$      6.9e-03   9.8e-04   1.6e-05   1.7e-06

n = 6400
Storage (MB)    40.0      75.9      111.6     147.5     183.1     218.8
$\Delta$        3.5e-03   6.5e-04   8.8e-06   2.1e-06   4.2e-07   8.3e-09
$L_{10^3}$      5.5e-03   1.0e-03   1.2e-05   3.2e-06   5.5e-08   1.3e-08
$L_{10^6}$      5.6e-03   1.0e-03   1.2e-05   3.1e-07   4.7e-08   9.1e-09

n = 14400
Storage (MB)    123.4     235.7     349.6     462.0     575.9     688.2
$\Delta$        3.2e-03   5.9e-04   8.9e-06   2.3e-06   5.5e-08   1.5e-08
$L_{10^3}$      4.9e-03   8.8e-04   1.2e-05   3.3e-06   7.3e-08   1.9e-08
$L_{10^6}$      5.0e-03   8.8e-04   1.0e-05   3.2e-06   6.7e-08   9.1e-09

$\cdots \ge \sigma_{k(b)}$ are sufficiently small, these contributions can
be omitted, resulting in a smaller local rank $k(b)$.


4.10.2 Uniform H-matrices 


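Consider $b = \tau \times \sigma \in T_{I\times I}$. The submatrix $M|_b$ belongs
to $\mathbb{R}^{\tau \times \sigma}$. A special subspace of $\mathbb{R}^{\tau \times \sigma}$ is the tensor
product space $V_b \otimes W_b = \{v w^T : v \in V_b,\ w \in W_b\}$ of
$V_b \subset \mathbb{R}^{\tau}$ and $W_b \subset \mathbb{R}^{\sigma}$. Note that $T \in V_b \otimes W_b$ implies
$\operatorname{rank} T \le \min\{\dim V_b, \dim W_b\}$. Hence, we may replace the condition
$\operatorname{rank}(M|_b) \le k$ by $M|_b \in V_b \otimes W_b$ with spaces $V_b$, $W_b$ of
dimension $\le k$. The resulting subset of H-matrices is called
the set of uniform H-matrices.

For the representation of submatrices $M|_b$, one uses
corresponding bases $\{v_1, \dots, v_{\dim V_b}\}$ and $\{w_1, \dots, w_{\dim W_b}\}$
of $V_b$, $W_b$ and defines $\mathbf{V}_b := [v_1, \dots, v_{\dim V_b}]$,
$\mathbf{W}_b := [w_1, \dots, w_{\dim W_b}]$. Then, $M|_b = \mathbf{V}_b S_b \mathbf{W}_b^T$, where the matrix
$S_b$ of size $\dim V_b \times \dim W_b$ contains the specific data of
$M|_b$ that are to be stored.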


4.10.3 H²-matrices


The previous class of uniform H-matrices uses different
spaces $V_b$, $W_b$ for every $b = \tau \times \sigma \in P$. Now, we require
that $V_b$ depends only on $\tau$, while $W_b$ depends only on $\sigma$.
Hence, we may start from a family $V = (V_\tau)_{\tau \in T_I}$ of spaces
$V_\tau \subset \mathbb{R}^{\tau}$ and require $M|_b \in V_\tau \otimes V_\sigma$ for all $b = \tau \times \sigma \in P$.
The second, characteristic requirement is the consistency
condition

$$V_\tau|_{\tau'} \subset V_{\tau'} \quad \text{for all } \tau \in T_I \text{ and } \tau' \in S(\tau) \qquad (52)$$

that is, $v|_{\tau'} \in V_{\tau'}$ for all $v \in V_\tau$. Let $\mathbf{V}_\tau$ and $\mathbf{V}_{\tau'}$ be the
corresponding bases. Owing to (52), there is a matrix $B_{\tau',\tau}$
such that $\mathbf{V}_\tau|_{\tau'} = \mathbf{V}_{\tau'} B_{\tau',\tau}$. Thanks to Definition 5c, $\mathbf{V}_\tau$ can
be obtained from $\{\mathbf{V}_{\tau'}, B_{\tau',\tau} : \tau' \in S(\tau)\}$. Hence, the bases
$\mathbf{V}_\tau$ need not be stored; instead, the transformation matrices
$B_{\tau',\tau}$ are stored. This is an advantage, since their size is
$k_{\tau'} \times k_\tau$ with $k_\tau := \dim V_\tau \le k$ independent of the size of
the blocks $b \in P$.

For details on H²-matrices, we refer to Börm and
Hackbusch (2002) and Hackbusch, Khoromskij and Sauter
(2000). The latter paper considers, for example, the
combination of the H²-matrix structure with variable
dimensions $k_\tau$. In Börm, Grasedyck and Hackbusch (2003),
the example from Section 4.9.1 is computed also by means
of the H²-matrix technique and numbers corresponding to
Tables 1 to 3 are given. They show that a slightly reduced
accuracy is obtained with considerably less work.


4.11 Applications 


We mention three different fields for the application of H- 
matrices. 


4.11.1 Direct use 


In the case of a BEM matrix $A$, the storage of the $n^2$ matrix
entries must be avoided. Then the approximation of $A$ by
an H-matrix $\tilde A \in H(k, P)$ reduces the storage requirements
to almost $O(n)$. Since in this case, $\tilde A$ must carry all
information about the BEM problem, the rank $k$ must be
chosen high enough (e.g. $k = O(\log n)$) in order to maintain
the accuracy. Second, the matrix-vector multiplication can
be performed with almost linear cost.

For the solution of the system of linear equations $Ax = b$,
one has two options: (a) Use an iterative scheme that is
based on the matrix-vector multiplication (cg-type methods,
multigrid), (b) Compute the approximate inverse $\widetilde{A^{-1}}$
(see Section 4.8.4).

The computation of operators like the Steklov operators
(Neumann-to-Dirichlet or Dirichlet-to-Neumann map)
requires performing the matrix-matrix multiplication $\odot$
from Section 4.8.3.


4.11.2 Rough inverse 


In the FEM case, the problem data are given by the
sparse stiffness matrix $A$. The approximate inverse $\widetilde{A^{-1}}$
must be accurate enough, if $\tilde x := \widetilde{A^{-1}} b$ is to be a good
approximation of the solution $x$. However, there is no need
to have $\widetilde{A^{-1}}$ very accurate. Instead, one can use $\widetilde{A^{-1}}$ as
'preconditioner': The iteration


$$x^{m+1} := x^{m} - \widetilde{A^{-1}}\,(A x^{m} - b)$$


can be applied to improve $x^{0} := \widetilde{A^{-1}} b$. The convergence
rate is given by the spectral radius of $I - \widetilde{A^{-1}} A$ (cf. Hackbusch,
1994). An upper bound of the spectral radius is the
norm $\|I - \widetilde{A^{-1}} A\|$, which should be $< 1$. For the elliptic
example, this norm is given in Table 4.
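
The effect of the iteration can be seen on a toy example. In the sketch below a crude diagonal matrix plays the role of the rough approximate inverse (an assumption made purely for illustration; it is not an H-matrix inverse), and the function names are hypothetical.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def refine(A, B, b, steps):
    """Iterate x <- x - B (A x - b), starting from x0 = B b."""
    x = matvec(B, b)
    for _ in range(steps):
        residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        correction = matvec(B, residual)
        x = [xi - ci for xi, ci in zip(x, correction)]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
B = [[0.25, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.25]]  # rough stand-in for the inverse
b = [1.0, 2.0, 3.0]

for steps in (0, 5, 20):
    x = refine(A, B, b, steps)
    residual_norm = max(abs(ax - bi) for ax, bi in zip(matvec(A, x), b))
    print(steps, residual_norm)
```

For this choice the spectral radius of $I - BA$ is $\sqrt{2}/4 \approx 0.35$, so each step reduces the error by roughly that factor even though $B$ itself is a poor inverse.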


4.11.3 Matrix-valued problems 


There are further problems, where the usual matrix—vector 
approach is insufficient, since one is interested in matrices 
instead of vectors. We give some examples that can be 
solved by means of the H-matrix technique. 


Matrix exponential function 

Matrix functions like the matrix exponential can be computed
effectively by use of the Dunford-Cauchy representation


$$\exp(A) = \frac{1}{2\pi i} \oint_{\Gamma} \exp(z)\,(zI - A)^{-1}\, dz \qquad (53)$$


where $\Gamma$ is a curve in the complex plane containing the
spectrum of $A$ in its interior. Approximation of the integral
by a quadrature rule ($z_\nu$: quadrature points) leads to


$$\exp_N(A) = \sum_{\nu=-N}^{N} e^{z_\nu} a_\nu\, (z_\nu I - A)^{-1} \qquad (54)$$


Since the integration error decreases exponentially with
respect to $N$, one may choose $N = O(\log^2 1/\varepsilon)$ to obtain
an integration error $\varepsilon$. The resolvents $(z_\nu I - A)^{-1}$ are
computed according to Section 4.8.4. For further details, we refer
to Gavrilyuk, Hackbusch and Khoromskij (2002).
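
The idea behind (53) and (54) can be tried out on a $2 \times 2$ matrix. The sketch below makes its own simplifying choices, which are assumptions and not the quadrature of the cited reference: the contour is a circle around the origin, the quadrature is the trapezoidal rule, and the resolvent is evaluated by the explicit $2 \times 2$ inverse formula instead of an H-matrix inversion.

```python
import cmath
import math

def resolvent_2x2(A, z):
    """(zI - A)^{-1} for a 2 x 2 matrix via the explicit inverse formula."""
    (a, b), (c, d) = A
    det = (z - a) * (z - d) - b * c
    return [[(z - d) / det, b / det], [c / det, (z - a) / det]]

def exp_contour(A, radius, n_points):
    """Trapezoidal rule for (53) on the circle |z| = radius."""
    total = [[0j, 0j], [0j, 0j]]
    for nu in range(n_points):
        z = radius * cmath.exp(2j * math.pi * nu / n_points)
        weight = cmath.exp(z) * z / n_points   # exp(z) dz / (2 pi i), discretized
        R = resolvent_2x2(A, z)
        for i in range(2):
            for j in range(2):
                total[i][j] += weight * R[i][j]
    return total

# Rotation generator: exp(A) = [[cos 1, sin 1], [-sin 1, cos 1]].
A = [[0.0, 1.0], [-1.0, 0.0]]
E = exp_contour(A, radius=2.0, n_points=40)
print(E[0][0].real, math.cos(1.0))
print(E[0][1].real, math.sin(1.0))
```

The eigenvalues $\pm i$ lie inside the circle of radius 2, and the quadrature error decays geometrically with the number of points, which mirrors the exponential convergence mentioned above.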


Lyapunov equation 

There are linear equations for matrices. An example is the
Lyapunov equation $AX + XB + C = O$ for the unknown
matrix $X$, while $A$, $B$, $C$ are given. One possible solution
uses the representation $X = \int_0^\infty e^{tA}\, C\, e^{tB}\, dt$, provided that
the eigenvalues of $A$, $B$ have negative real parts. Since the
dependence of $\exp_N(tA)$ on $t$ in (54) is expressed by the
scalar factor $e^{t z_\nu}$, one can replace $e^{tA}$, $e^{tB}$ by $\exp_N(tA)$ and
$\exp_N(tB)$ and perform the integration exactly (cf. Gavrilyuk,
Hackbusch and Khoromskij, 2002, Section 4.2).


Riccati equation 
For optimal control problems, the (nonlinear) Riccati equation


$$A^T X + XA - XFX + G = O$$
($A$, $F$, $G$ given matrices, $X$ to be determined)

is of interest. In Grasedyck, Hackbusch and Khoromskij 
(2003), the direct representation of X by means of the 
matrix-valued sign function is applied. Its iterative com- 
putation requires again the inversion, which is provided by 
the H-matrix technique. 
1 INTRODUCTION


Domain decomposition (DD) methods have been developed 
for a long time, but most extensively since the first interna- 
tional DD conference held at Paris in 1987. This concerns 
both the theory and the practical use of DD techniques for 
creating efficient application software for massively parallel 
computers. The advances in DD theories and applications 
are well documented in the proceedings of the annual 
international DD conferences since 1987 and in numerous 
papers (see also the DD home page http://www.ddm.org
for up-to-date information about the annual DD confer- 
ences, recent publications, and other DD activities). Two 
pioneering monographs give an excellent introduction to 
DD methods from two different points of view: the more 
algebraic and algorithmic one (Smith, Bjørstad and Gropp,


Encyclopedia of Computational Mechanics, Edited by Erwin 
Stein, René de Borst and Thomas J.R. Hughes. Volume 1: Funda- 
mentals. © 2004 John Wiley & Sons, Ltd. ISBN: 0-470-84699-2. 


1996) and the more analytic one (Quarteroni and Valli,
1999). We refer the interested reader also to the survey 
articles Xu (1992), Le Tallec (1994), Chan and Mathew 
(1994), and Xu and Zou (1998). We start our chapter 
with a brief look at the DD history. In this introductory 
section (Section 2), we provide an exciting journey through 
the DD history starting in the year 1869 with the classi- 
cal paper by H.A. Schwarz on the existence of harmonic 
functions in domains with complicated boundaries, contin- 
uing with the variational setting of the alternating Schwarz 
method by S.L. Sobolev in 1934, looking at the classical 
finite element (FE) substructuring, or superelement tech- 
nique intensively used by the engineers in the sixties, and 
arriving at advanced domain decomposition methods devel- 
oped mainly during the last 15 years. It is worth mentioning 
that the classical FE substructuring technique has its roots 
in calculation methods used in structural mechanics for a 
long time. 

Section 3 gives an introduction to the Schwarz theory 
(now called Schwarz machinery) that provides a unique 
framework for constructing and analyzing additive and 
multiplicative Schwarz methods (preconditioners). Many 
domain decomposition and multilevel methods (precondi- 
tioners) can be put into this framework. Throughout this 
chapter our model objects are symmetric and positive defi- 
nite (SPD) systems of algebraic equations typically result- 
ing from finite element discretizations of elliptic problems, 
such as the heat conduction equation, the potential equation, 
and the linear elasticity equations. However, the algorithms 
and some of the results can be extended to more general sys- 
tems, including systems with indefinite and nonsymmetric 
system matrices. 

In Section 4, overlapping DD methods, which first appeared
in Schwarz’s original paper in their multiplicative
version, are considered. We focus mainly on additive
versions, since these are more suitable for massively parallel
computers. Typically, the main components of advanced
(two-level) overlapping DD methods are independent local 
solvers for Dirichlet problems on overlapping subdomains 
and a coarse space (grid) solver. The aim of the latter 
is to avoid strong dependence of the convergence rate 
(relative condition number) on the number of overlap- 
ping subdomains. A proper choice of the overlap is very 
important. The understanding of these influential factors 
is crucial for constructing efficient overlapping DD methods
(preconditioners). We present the basic algorithms, look
at the condition number estimates, and give an overview
of some versions, including multilevel overlapping DD
methods.

Section 5 is devoted to nonoverlapping DD methods, 
certainly most interesting for applications. This type of 
DD methods reflects the classical substructuring finite ele- 
ment technique, where the global sparse finite element 
system is reduced to a much smaller but denser system 
(the so-called Schur-complement or interface problem) by 
condensation of unknowns that are internal for each sub- 
structure. Iterative solvers for the interface problems, for 
example, the conjugate gradient method, are usually most 
efficient. Together with a good preconditioner they reduce 
the cost and provide parallelization, in part, by avoiding 
assembling the Schur complement. Section 5.2 concentrates 
on various Schur-complement preconditioners. However, 
the computation of contributions to the Schur complement 
from the substructures, even without assembling, may, in 
practice, be more time and memory consuming than direct 


solving procedures for the original system. For this rea- 
son, modern DD algorithms completely avoid the use of 
Schur complements and use only their preconditioners. 
Apart from Schur-complement preconditioning, iterative 
DD methods require efficient solvers for the substructure 
finite element problems, Dirichlet or others, at each iter- 
ation step. They are termed here as local problems. Fast 
direct solvers are hardly available for interesting applica- 
tions, whereas there exists a variety of well-developed fast 
iterative solvers, which are easily adapted to local prob- 
lems in h-versions of the finite element method (FEM). 
They cover many specific situations, for example, sub- 
domains of complicated and specific shapes, orthotropies, 
and so on. The implementation of local iterative solvers 
as inexact solvers may be most efficient, but their use for 
solving the local Dirichlet FE subproblems arising in the 
so-called extension (prolongation) and restriction operations 
(from and to the interface, resp.) is very delicate. How- 
ever, if these procedures are based on bounded discrete 
extensions (prolongations) and their transposed operations 
(restrictions), then stability can be proven. This and other 
topics related to the inexact iterative substructuring are 
discussed in Section 5.3. We present the (balanced) Neu- 
mann—Neumann method as a special Schur-complement 
preconditioning technique in Section 5.4, and proceed with 
the finite element tearing and interconnecting (FETI) and
Mortar methods in the next two sections. The FETI method
requires a conforming triangulation, whereas the FE subspaces
are separately given on each substructure including
its boundary. The global continuity is then enforced by 
Lagrange multipliers, resulting in a saddle-point problem 


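Algorithm 1. Alternating Schwarz Method for solving the boundary value problem (3)-(4).


$u^0 \in C^2(\Omega) \cap C(\bar\Omega)$ given initial guess: $u^0 = 0$ on $\partial\Omega$        {initialization}
for $n = 0$ step 1 until Convergence do                              {begin iteration loop}
   First step:                                                       {update in $\Omega_1$}
      Define $\tilde u^{n+1/2} \in C^2(\Omega_1) \cap C(\bar\Omega_1)$: $-\Delta \tilde u^{n+1/2} = f$ in $\Omega_1$, $\tilde u^{n+1/2} = u^{n}$ on $\partial\Omega_1$
      $u^{n+1/2}(x) = \tilde u^{n+1/2}(x)$   $\forall x \in \bar\Omega_1$,
      $u^{n+1/2}(x) = u^{n}(x)$   $\forall x \in \bar\Omega_2 \setminus \bar\Omega_1$.
   Second step:                                                      {update in $\Omega_2$}
      Define $\tilde u^{n+1} \in C^2(\Omega_2) \cap C(\bar\Omega_2)$: $-\Delta \tilde u^{n+1} = f$ in $\Omega_2$, $\tilde u^{n+1} = u^{n+1/2}$ on $\partial\Omega_2$
      $u^{n+1}(x) = \tilde u^{n+1}(x)$   $\forall x \in \bar\Omega_2$,
      $u^{n+1}(x) = u^{n+1/2}(x)$   $\forall x \in \bar\Omega_1 \setminus \bar\Omega_2$.
end for                                                              {end iteration loop}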


that can be solved via its dual problem. In Mortar methods,
the conformity of the triangulation across the
subdomain boundaries is additionally dropped, which makes the method very
flexible.

Let us mention that we are mainly looking for asymp- 
totically (almost) optimal DD preconditioners (solvers) in 
the sense that the memory requirement and the arithmeti- 
cal costs should be (almost) proportional to the number of 
unknowns. In this connection, we sometimes also speak 
about linear complexity preconditioners (solvers). 

Finally, let us mention that this contribution cannot 
cover all aspects of domain decomposition techniques 
and their applications. For instance, the p- and the hp-versions
of the FEM have some specific features that are
not discussed in this contribution in detail; see Chap- 
ter 5, Chapter 6 of this Volume and Chapter 3, Vol- 
ume 3 for more information on these topics. Other dis- 
cretization techniques like the boundary element methods 
(BEM) are also not discussed in this paper (see Chap- 
ter 12 and Chapter 21 of this Volume). The coupling 
of FEM and BEM is naturally based on DD techniques 
(see Chapter 13, this Volume). We also refer to the 
corresponding publications, which have mostly appeared 
quite recently. The field of the application of DD meth- 
ods is now very wide. Here, we especially refer to the 
proceedings of the annual DD conferences mentioned 
above. 
2 DOMAIN DECOMPOSITION HISTORY


Schwarz (1869) investigated the existence of harmonic
functions in domains $\Omega$ with complicated boundaries $\partial\Omega$:
Given some boundary function $g$, find a function $u$ such
that

\[ -\Delta u(x) = 0 \quad \forall x \in \Omega \tag{1} \]
\[ u(x) = g(x) \quad \forall x \in \partial\Omega \tag{2} \]

The 'complicated' domain $\Omega = \Omega_1 \cup \Omega_2$ is supposed to be
the union of two simpler overlapping domains $\Omega_1$ and $\Omega_2$,
where the existence of harmonic functions is known (see the
left part of Figure 1 for a sketch of Schwarz's original drawing,
where the simpler domains are a rectangle and a circle).
H.A. Schwarz proposed an iteration process for solving
(1)-(2) where one has to solve alternately similar problems
in $\Omega_1$ and $\Omega_2$. Without loss of generality (homogenization
of Dirichlet's boundary conditions), we will explain this
Alternating Schwarz Method for the Poisson equation under
homogeneous Dirichlet boundary conditions: Given some
continuous function $f \in C(\overline{\Omega})$, find a twice continuously
differentiable function $u \in C^2(\Omega) \cap C(\overline{\Omega})$ such that

\[ -\Delta u(x) = f(x) \quad \forall x \in \Omega \tag{3} \]
\[ u(x) = 0 \quad \forall x \in \partial\Omega \tag{4} \]


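Algorithm 2. Variational Alternating Schwarz Method for solving (3)-(4).

$u^0 \in V_0 = H_0^1(\Omega) = V_1 + V_2$ given initial guess    {initialization}
for n = 0 step 1 until Convergence do    {begin iteration loop}
    First step:    {update in $V_1$}
        $u^{n+1/2} = u^n + w^{n+1/2}$, with $w^{n+1/2} \in V_1 = H_0^1(\Omega_1) \subset V_0$:
        \[ \int_{\Omega_1} \nabla w^{n+1/2} \cdot \nabla v \, dx = \int_{\Omega_1} f(x) v(x) \, dx - \int_{\Omega_1} \nabla u^n \cdot \nabla v \, dx \quad \forall v \in V_1 \tag{5} \]
    Second step:    {update in $V_2$}
        $u^{n+1} = u^{n+1/2} + w^{n+1}$, with $w^{n+1} \in V_2 = H_0^1(\Omega_2) \subset V_0$:
        \[ \int_{\Omega_2} \nabla w^{n+1} \cdot \nabla v \, dx = \int_{\Omega_2} f(x) v(x) \, dx - \int_{\Omega_2} \nabla u^{n+1/2} \cdot \nabla v \, dx \quad \forall v \in V_2 \tag{6} \]
end for    {end iteration loop}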
Figure 1. Overlapping and nonoverlapping DD samples.


Algorithm 1 now describes the Alternating Schwarz
Method for solving the boundary value problem (3)-(4)
as a model problem. The convergence analysis was done
by Schwarz (1869) using the maximum principle (see also
Nevanlinna, 1939).
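
As an illustration only, the two half-steps of Algorithm 1 can be sketched for a one-dimensional finite difference analogue of (3)-(4), that is, for $-u'' = f$ on $(0,1)$ with $u(0) = u(1) = 0$. The grid size, the overlap indices `i_a` and `i_b`, the number of sweeps, and all function names are assumptions made for this sketch; they do not come from the chapter.

```python
import numpy as np

def laplacian_1d(m, h):
    """Tridiagonal finite-difference matrix for -u'' on m interior points."""
    return (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2

def alternating_schwarz_1d(f, n=99, i_a=40, i_b=60, sweeps=30):
    """Alternating Schwarz sweeps for -u'' = f on (0,1), u(0) = u(1) = 0.

    Grid points x_0, ..., x_{n+1}; Omega_1 = (0, x_{i_b}), Omega_2 = (x_{i_a}, 1),
    so the two subdomains overlap on (x_{i_a}, x_{i_b}).
    """
    h = 1.0 / (n + 1)
    x = np.linspace(0.0, 1.0, n + 2)
    u = np.zeros(n + 2)                   # u^0 = 0 satisfies the boundary condition
    fx = f(x)
    K1 = laplacian_1d(i_b - 1, h)         # interior points 1, ..., i_b - 1
    K2 = laplacian_1d(n - i_a, h)         # interior points i_a + 1, ..., n
    for _ in range(sweeps):
        # First step: Dirichlet solve in Omega_1 with boundary data u^n at x_{i_b}
        rhs = fx[1:i_b].copy()
        rhs[-1] += u[i_b] / h**2
        u[1:i_b] = np.linalg.solve(K1, rhs)
        # Second step: Dirichlet solve in Omega_2 with data u^{n+1/2} at x_{i_a}
        rhs = fx[i_a + 1:n + 1].copy()
        rhs[0] += u[i_a] / h**2
        u[i_a + 1:n + 1] = np.linalg.solve(K2, rhs)
    return x, u
```

For $f = 1$ the iterates approach the exact solution $u(x) = x(1-x)/2$; in this one-dimensional setting a larger overlap gives a smaller error reduction factor per sweep.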

Sobolev (1936) gave the variational setting of the Alternating
Schwarz Method for solving linear elasticity problems
in the form of an alternating minimization procedure
in $\Omega_1$ and $\Omega_2$. Algorithm 2 provides the corresponding
variational formulation, which is nothing else but the weak
formulation of the Alternating Schwarz Algorithm 1. Sobolev
(1936) proved convergence of the Variational Alternating
Schwarz Algorithm 2 in $L_2$ to the weak solution of the
boundary value problem (3)-(4) provided by its variational
formulation: Find $u \in V_0 = H_0^1(\Omega)$ such that

\[ a(u, v) = \langle f, v \rangle \quad \forall v \in V_0 \tag{7} \]

where the bilinear form $a(\cdot,\cdot): V_0 \times V_0 \to \mathbb{R}$ and the linear
form $\langle f, \cdot \rangle: V_0 \to \mathbb{R}$ are defined as follows:

\[ a(u, v) = \int_{\Omega} \nabla u \cdot \nabla v \, dx, \qquad \langle f, v \rangle = \int_{\Omega} f(x) v(x) \, dx \tag{8} \]


Mikhlin (1951) showed uniform convergence in every
closed subdomain of the domain $\Omega$. Since then, the Schwarz
Alternating Method has been studied by many authors (see, for
example, Morgenstern, 1956 and Babuska, 1958).

As just mentioned, the solution of the variational equations
(5) and (6) in Algorithm 2 is equivalent to the
alternating minimization of the energy functional $E(\cdot) =
(1/2) a(\cdot,\cdot) - \langle f, \cdot \rangle$ in $V_1$ and $V_2$, respectively, that is,

\[ E(u^{n+1/2}) = \min_{w \in V_1} E(u^n + w) \quad \text{and} \quad E(u^{n+1}) = \min_{w \in V_2} E(u^{n+1/2} + w) \tag{9} \]


Moreover, if we introduce the orthoprojections $P_i: V \to V_i$
by the identities

\[ a(P_i u, v) = a(u, v) \quad \forall v \in V_i, \ \forall u \in V \tag{10} \]

then we immediately observe from (5) and (6) that $w^{n+1/2} =
P_1(u - u^n)$ and $w^{n+1} = P_2(u - u^{n+1/2})$, respectively. Thus,
the iteration error $z^n = u - u^n$ satisfies the recurrence
relation

\[ z^{n+1} = (I - P_2)(I - P_1) z^n = (I - P_1 - P_2 + P_2 P_1) z^n \tag{11} \]

that is nothing but an alternating orthoprojection of the
iteration error to $V_1^{\perp}$ and $V_2^{\perp}$. This alternating projection
procedure can obviously be generalized to a decomposition
of $V$ into many subspaces, to finite element subspaces,
to other problems, and so on. Owing to the multiplicative
nature of the error transition, alternating projection
procedures of this kind are nowadays called multiplicative
Schwarz methods (MSM). The main drawback of the MSM
is the sequential character of this procedure,
which makes the parallelization difficult. To overcome this
drawback, additive versions of Schwarz algorithms were
proposed. These observations led to the modern theory of
Schwarz methods that has been developed during the last
15 years (see Section 3).

The substructuring technique developed by mechanical
engineers for the finite element analysis of complex
structures in the sixties (see e.g. Przemieniecki, 1963) is
usually recognized as the other main root of the modern
numerical DD algorithms. In the classical finite element
substructuring technique, the computational domain $\Omega$
is decomposed into $J$ nonoverlapping subdomains
(substructures) $\Omega_j$ ($j = 1, 2, \ldots, J$) such that $\overline{\Omega} = \bigcup_{j=1}^{J} \overline{\Omega}_j$
and $\Omega_i \cap \Omega_j = \emptyset$ for $i \neq j$, and each subdomain $\Omega_j$ is
divided into finite elements $\delta_r$ such that this discretization
process results in a conforming triangulation of $\Omega$. In the
following, the indices 'C' and 'I' correspond to the nodes
belonging to the coupling boundaries (interfaces, skeleton)
$\Gamma_C = \bigcup_{j=1}^{J} \partial\Omega_j \setminus \Gamma_D$ and to the interior $\Omega_I = \bigcup_{j=1}^{J} \Omega_j$ of
the subdomains, respectively, where $\Gamma_D$ is that part of $\partial\Omega$
where Dirichlet-type boundary conditions are given (see
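Algorithm 3. Classical substructuring algorithm for solving (13).

$\tilde{\mathbf{u}}_I = \mathbf{K}_I^{-1} \mathbf{f}_I$, that is, solve $\mathbf{K}_I \tilde{\mathbf{u}}_I = \mathbf{f}_I$    {elimination of the internal unknowns in parallel}
$\mathbf{g}_C = \mathbf{f}_C - \mathbf{K}_{CI} \tilde{\mathbf{u}}_I$    {forming of the right-hand side}
$\mathbf{S}_C = \mathbf{K}_C - \mathbf{K}_{CI} \mathbf{K}_I^{-1} \mathbf{K}_{IC}$    {forming of the Schur complement}
$\mathbf{u}_C = \mathbf{S}_C^{-1} \mathbf{g}_C$    {solving the Schur-complement problem}
$\mathbf{u}_I = \tilde{\mathbf{u}}_I - \mathbf{K}_I^{-1} \mathbf{K}_{IC} \mathbf{u}_C$    {determination of the internal unknowns in parallel}
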


also the right part of Figure 1). Boundaries with natural
(Neumann, Robin) boundary conditions will be handled as
coupling boundaries.

Let us define the usual FE nodal basis

\[ \Phi = [\Phi_C, \Phi_I] = [\phi_1, \ldots, \phi_{N_C}, \phi_{N_C+1}, \ldots, \phi_{N_C+N_{I,1}}, \ldots, \phi_{N_C+N_I}] \tag{12} \]

where the first $N_C$ basis functions belong to $\Gamma_C$, the
next $N_{I,1}$ to $\Omega_1$, the next $N_{I,2}$ to $\Omega_2$, and so on, with
$N_I = N_{I,1} + N_{I,2} + \cdots + N_{I,J}$. The FE space $V = V_h$ is
obviously a finite-dimensional subspace of the variational
function space $V_0$. Once the FE basis $\Phi$ is chosen, the
FE scheme leads to a large-scale sparse system $\mathbf{K}\mathbf{u} = \mathbf{f}$
of finite element equations with the symmetric positive definite (SPD)
stiffness matrix $\mathbf{K}$, provided that the bilinear form has the corresponding
properties. Owing to the arrangement of the basis functions
made above, the FE system can be rewritten in the block
form

\[ \begin{pmatrix} \mathbf{K}_C & \mathbf{K}_{CI} \\ \mathbf{K}_{IC} & \mathbf{K}_I \end{pmatrix} \begin{pmatrix} \mathbf{u}_C \\ \mathbf{u}_I \end{pmatrix} = \begin{pmatrix} \mathbf{f}_C \\ \mathbf{f}_I \end{pmatrix} \tag{13} \]

where $\mathbf{K}_I = \mathrm{diag}(\mathbf{K}_{I,j})_{j=1,2,\ldots,J}$ is block diagonal. The
block diagonal entries $\mathbf{K}_{I,j}$ are of the dimension $N_{I,j} \times
N_{I,j}$ and arise from the FE approximation to the PDE
considered in $\Omega_j$ under homogeneous Dirichlet boundary
conditions on $\partial\Omega_j$. Owing to the block diagonal structure
of $\mathbf{K}_I$, one can eliminate the internal subdomain unknowns
$\mathbf{u}_I$ in parallel by block Gaussian elimination. Solving
the resulting Schur-complement problem, we obtain the
coupling node (interface) unknowns, which finally allow us to
determine the internal subdomain unknowns. This
classical FE substructuring algorithm is described in
detail by Algorithm 3. The classical FE substructuring
Algorithm 3 is well suited for parallel implementation,
but very expensive with respect to the arithmetical
operations; in particular, the forming of the Schur-complement
matrix $\mathbf{S}_C$ is very time consuming. If an iterative
solver is used for solving the Schur-complement problem,
the forming of the Schur-complement matrix can be
avoided because only the matrix-by-vector operation
$\mathbf{S}_C \mathbf{u}_C^n$ is required. This leads us to the iterative
substructuring methods, which are the starting point
for the modern nonoverlapping domain decomposition
methods discussed in Section 5. Iterative nonoverlapping
DD algorithms and their mathematical studies appeared
in the seventies. The first consistent analysis of the role
of Poincaré-Steklov operators (the operator analogue of
the Schur-complement matrix $\mathbf{S}_C$) in such algorithms
was presented in the book by Lebedev and Agoshkov
(1983).
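
The five steps of Algorithm 3 can be sketched with dense NumPy matrices as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function name `schur_substructuring`, the index-set arguments, and the use of dense solves are hypothetical choices, and a production code would use sparse factorizations per subdomain.

```python
import numpy as np

def schur_substructuring(K, f, idx_C, idx_I_list):
    """Classical substructuring on a dense SPD matrix K (illustrative sketch).

    idx_C: indices of the coupling (interface) unknowns.
    idx_I_list: one index array per subdomain interior; the interiors are
    assumed to be mutually decoupled, so K_I is block diagonal.
    """
    S_C = K[np.ix_(idx_C, idx_C)].copy()
    g_C = f[idx_C].copy()
    local = []
    for idx in idx_I_list:                          # independent per subdomain
        K_I = K[np.ix_(idx, idx)]
        K_IC = K[np.ix_(idx, idx_C)]
        K_CI = K[np.ix_(idx_C, idx)]
        u_tilde = np.linalg.solve(K_I, f[idx])      # solve K_I u~_I = f_I
        g_C -= K_CI @ u_tilde                       # g_C = f_C - K_CI u~_I
        S_C -= K_CI @ np.linalg.solve(K_I, K_IC)    # S_C = K_C - K_CI K_I^{-1} K_IC
        local.append((idx, K_I, K_IC, u_tilde))
    u = np.zeros_like(f)
    u[idx_C] = np.linalg.solve(S_C, g_C)            # Schur-complement problem
    for idx, K_I, K_IC, u_tilde in local:           # back substitution, again independent
        u[idx] = u_tilde - np.linalg.solve(K_I, K_IC @ u[idx_C])
    return u
```

Replacing the dense solve with `S_C` by an iterative method that only applies `S_C` to vectors avoids forming the Schur complement explicitly, which is the idea behind iterative substructuring.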


3 FUNDAMENTALS OF SCHWARZ’S 
METHODS 


3.1 Preliminaries 


Let us consider a symmetric, elliptic (coercive), and
bounded (continuous) abstract variational problem of the
following form: Given $f \in V_0^*$, find $u \in V_0$ such that the
variational equation

\[ a(u, v) = \langle f, v \rangle \quad \forall v \in V_0 \tag{14} \]

holds for all test functions from some Hilbert space
$V_0$ equipped with the scalar product $(\cdot,\cdot)_{V_0}$ and the
corresponding norm $\|\cdot\|_{V_0}$. The bilinear form $a(\cdot,\cdot): V_0 \times
V_0 \to \mathbb{R}$ is supposed to be symmetric, that is,

\[ a(u, v) = a(v, u) \quad \forall u, v \in V_0 \tag{15} \]

$V_0$-elliptic ($V_0$-coercive), that is, there exists some positive
constant $\mu_1$ such that

\[ \mu_1 \|v\|_{V_0}^2 \le a(v, v) \quad \forall v \in V_0 \tag{16} \]

and $V_0$-bounded ($V_0$-continuous), that is, there exists some
positive constant $\mu_2$ such that

\[ a(u, v) \le \mu_2 \|u\|_{V_0} \|v\|_{V_0} \quad \forall u, v \in V_0 \tag{17} \]
The value of the bounded (continuous) linear functional
$f$ from the dual space $V_0^*$ at some $v \in V_0$ is denoted
by $\langle f, v \rangle$. Sometimes $\langle \cdot, \cdot \rangle: V_0^* \times V_0 \to \mathbb{R}$ is called
duality product. Owing to Lax-Milgram's lemma, the
$V_0$-ellipticity (16) and the $V_0$-boundedness (17) ensure
the existence and uniqueness of the solution of the
abstract variational problem (14) (see e.g. Ciarlet,
1978).

Abstract variational formulations of the form (14) cover 
a lot of practically very important formally self-adjoint, 
elliptic boundary value problems. The Dirichlet bound- 
ary value problem for the Poisson equation introduced 
in Section 2 is certainly the most prominent representa- 
tive of this class. Other representatives are the stationary 
heat conduction equation (Example 1), the linear elas- 
ticity problem (Example 2), linearized mechanical prob- 
lems, and linear magneto- and electrostatic boundary value 
problems. 


Example 1 Stationary heat conduction problem. Given
some heat source intensity function $f \in L_2(\Omega)$, find the
temperature field $u \in V_0 = H_0^1(\Omega)$ such that the variational
equation (14) holds with the bilinear form $a(\cdot,\cdot): V_0 \times
V_0 \to \mathbb{R}$ and the linear form $\langle f, \cdot \rangle: V_0 \to \mathbb{R}$ defined by the
identities

\[ a(u, v) = \int_{\Omega} \alpha(x) \nabla u(x) \cdot \nabla v(x) \, dx \quad \text{and} \quad \langle f, v \rangle = \int_{\Omega} f(x) v(x) \, dx \tag{18} \]

respectively. As everywhere else in this chapter, if it is
not defined otherwise, the computational domain $\Omega \subset \mathbb{R}^d$
($d = 1, 2, 3$) is assumed to be bounded and sufficiently
smooth (e.g. with a Lipschitz boundary). The given heat
conduction coefficient $\alpha(\cdot)$ is supposed to be uniformly positive
and bounded. The symmetry of the bilinear form is
obvious. The $V_0$-ellipticity (16) and the $V_0$-boundedness (17)
directly follow from the Friedrichs and Cauchy inequalities,
respectively (see e.g. Ciarlet, 1978). Here, we consider
only homogeneous Dirichlet boundary conditions (vanishing
temperature $u$ on the boundary $\partial\Omega$ of $\Omega$). Other boundary
conditions (Neumann, Robin, mixed) can be treated
in the same way. In many practical cases, the coefficient
of conductivity has significant jumps, so that the ratio
$\mu_2/\mu_1$ is very large. This requires DD algorithms which
are robust with respect to $\mu_2/\mu_1$ (see Section 5 for robust
DD methods). We mention that the heat conduction equation
formally describes many other stationary processes,
like diffusion or filtration in porous media with variable
permeability.


Example 2 The static linear elasticity problem. Given
volume forces $f = (f_1, \ldots, f_d)^T$ in $\Omega$ and surface tractions
$t = (t_1, \ldots, t_d)^T$ on some part $\Gamma_N = \partial\Omega \setminus \Gamma_D$ of the
boundary $\partial\Omega$, find the displacement $u = (u_1, \ldots, u_d)^T \in
V_0 = \{ v = (v_1, \ldots, v_d)^T : v_i \in H^1(\Omega), \ v_i = 0 \text{ on } \Gamma_D, \ i =
1, \ldots, d \}$ of the elastic body $\Omega \subset \mathbb{R}^d$ ($d = 2, 3$) clamped
at $\Gamma_D$ such that the variational equation (14) holds with
the bilinear form $a(\cdot,\cdot): V_0 \times V_0 \to \mathbb{R}$ and the linear form
$\langle f, \cdot \rangle: V_0 \to \mathbb{R}$ defined by the identities

\[ a(u, v) = \int_{\Omega} \sum_{i,j,k,l=1}^{d} \varepsilon_{ij}(u) D_{ijkl}(x) \varepsilon_{kl}(v) \, dx \tag{19} \]

and

\[ \langle f, v \rangle = \int_{\Omega} \sum_{i=1}^{d} f_i v_i \, dx + \int_{\Gamma_N} \sum_{i=1}^{d} t_i v_i \, ds \tag{20} \]

respectively, where the $\varepsilon_{ij}(u) = (1/2)(\partial u_i/\partial x_j + \partial u_j/\partial x_i)$
denote the linearized strains. The given matrix $D(x) =
(D_{ijkl}(x))$ of the elastic coefficients is supposed to be
symmetric, uniformly positive definite, and uniformly
bounded. These assumptions together with Korn's,
Friedrichs', and Cauchy's inequalities ensure the symmetry
(15), $V_0$-ellipticity (16) and $V_0$-boundedness (17) of the
bilinear form. If the volume forces and surface tractions
are chosen in such a way that the corresponding linear
functional (20) is continuous on $V_0$, then the Lax-Milgram
lemma again provides existence and uniqueness of the
solution of the static linear elasticity problem (see e.g.
Ciarlet, 1978 and Korneev and Langer, 1984).


We now approximate the abstract variational equation
(14) by some FE Galerkin scheme. Let $V_h$ be some
finite-dimensional FE subspace of the space $V_0$ spanned by some
basis $\Phi_h := [\phi_1, \phi_2, \ldots, \phi_{N_h}]$, that is, $V_h = \mathrm{span}\,\Phi_h \subset V_0$,
where $h$ denotes some usual discretization parameter such
that the number of unknowns $N_h$ behaves like $O(h^{-d})$ as $h$
tends to 0. Here and in the following, we assume that the FE
discretization is based on some quasiuniform triangulation.
Note that we use $\Phi_h$ as symbol for the set of basis functions
$\{\phi_i\}_{i=1,\ldots,N_h}$ as well as for the FE-Galerkin isomorphism
($u_h \leftrightarrow \mathbf{u}_h$)

\[ u_h = \Phi_h \mathbf{u}_h = \sum_{i=1}^{N_h} u_i \phi_i \tag{21} \]

mapping some vector of nodal parameters $\mathbf{u}_h = (u_i)_{i=1,\ldots,N_h}
\in \mathbb{R}^{N_h}$ to the corresponding FE function $u_h \in V_h$. Now the
FE-Galerkin solution of the variational equation (14) is
nothing but the solution of (14) on the FE subspace $V_h$:
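Given $f \in V_0^*$, find $u_h \in V_h$ such that

\[ a(u_h, v_h) = \langle f, v_h \rangle \quad \forall v_h \in V_h \tag{22} \]

Once the basis $\Phi_h$ is chosen, the FE scheme (22) is
equivalent to the following system of FE equations: Find
the nodal parameter vector $\mathbf{u}_h \in \mathbb{R}^{N_h}$ corresponding to the
FE solution $u_h$ by the Galerkin isomorphism (21) as the
solution of the system

\[ \mathbf{K}_h \mathbf{u}_h = \mathbf{f}_h \tag{23} \]

where the stiffness matrix $\mathbf{K}_h$ and the load vector $\mathbf{f}_h$ are
generated from the identities

\[ (\mathbf{K}_h \mathbf{u}_h, \mathbf{v}_h) = a(\Phi_h \mathbf{u}_h, \Phi_h \mathbf{v}_h) = a(u_h, v_h) \quad \forall u_h, v_h \leftrightarrow \mathbf{u}_h, \mathbf{v}_h \in \mathbb{R}^{N_h} \tag{24} \]

and

\[ (\mathbf{f}_h, \mathbf{v}_h) = \langle f, v_h \rangle \quad \forall v_h \leftrightarrow \mathbf{v}_h \in \mathbb{R}^{N_h} \tag{25} \]

respectively. Here $(\mathbf{f}_h, \mathbf{v}_h) = (\mathbf{f}_h, \mathbf{v}_h)_{\mathbb{R}^{N_h}} = \mathbf{f}_h^T \mathbf{v}_h$ denotes the
Euclidean scalar product in $\mathbb{R}^{N_h}$.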

In order to simplify the notation, we skip the subscript
$h$ in the following. Since we are primarily interested in
the solution of the FE equations, there will be no confusion
of some FE function $u = u_h \in V = V_h$ and functions
$u$ from the space $V_0$. Let us now assume that the FE
space

\[ V = V_h = \sum_{j=1}^{J} V_j \tag{26} \]

can be split into a (not necessarily direct) sum of the $J$
subspaces

\[ V_j = \mathrm{span}\,\Psi_j = \mathrm{span}\,\Phi \mathbf{V}_j, \quad j = 1, 2, \ldots, J \tag{27} \]

where the basis $\Psi_j = [\psi_{j1}, \psi_{j2}, \ldots, \psi_{jN_j}] = \Phi \mathbf{V}_j$ of the
subspace $V_j$ is obtained from the original basis $\Phi$ by the
$N \times N_j$ basis transformation matrix $\mathbf{V}_j$. Therefore, $N_j =
\dim V_j = \mathrm{rank}\,\mathbf{V}_j$ and $\sum_{j=1}^{J} N_j \ge N$.

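The orthoprojection $P_j: V \to V_j$ of the space $V$ onto its
subspace $V_j$ with respect to the energy inner product $a(\cdot,\cdot)$
is uniquely defined by the identity

\[ a(P_j u, v_j) = a(u, v_j) \quad \forall v_j \in V_j, \ \forall u \in V \tag{28} \]

Sometimes the orthoprojection $P_j$ is called the Ritz, or
energy projection. As orthoprojection, $P_j = P_j^*$ is
self-adjoint with respect to the energy inner product, that is,

\[ a(P_j u, v) = a(u, P_j v) \quad \forall u, v \in V \tag{29} \]

and satisfies the projection relation $P_j^2 = P_j$. It is easy to
see from (24) and (27) that the orthoprojection $P_j u$ of some
$u \leftrightarrow \mathbf{u}$ can be computed by the formula

\[ P_j u = \Phi \mathbf{V}_j \mathbf{u}_j = \Phi \mathbf{V}_j (\mathbf{V}_j^T \mathbf{K} \mathbf{V}_j)^{-1} \mathbf{V}_j^T \mathbf{K} \mathbf{u} \tag{30} \]

that is, $\mathbf{u}_j$ is obtained from the solution of a smaller
system with the $N_j \times N_j$ system matrix $\mathbf{V}_j^T \mathbf{K} \mathbf{V}_j$ and the
right-hand side $\mathbf{V}_j^T \mathbf{K} \mathbf{u}$. Similarly, replacing the original
energy inner product $a(\cdot,\cdot)$ on the left-hand side of the
identity (28) by some individual symmetric, $V_j$-elliptic
and $V_j$-bounded bilinear form $a_j(\cdot,\cdot): V_j \times V_j \to \mathbb{R}$, we
define some projection-like operator $\widehat{P}_j: V \to V_j$ that is
self-adjoint but in general $\widehat{P}_j^2 \neq \widehat{P}_j$. Thus, $\widehat{P}_j$ is not an
orthoprojection. Again, $\widehat{P}_j u$ can be easily calculated by the
formula

\[ \widehat{P}_j u = \Phi \mathbf{V}_j \mathbf{u}_j = \Phi \mathbf{V}_j \mathbf{C}_j^{-1} \mathbf{V}_j^T \mathbf{K} \mathbf{u} \tag{31} \]

where the $N_j \times N_j$ matrix $\mathbf{C}_j$ is generated from the identity

\[ (\mathbf{C}_j \mathbf{u}_j, \mathbf{v}_j) = a_j(u_j, v_j) \quad \forall u_j, v_j \leftrightarrow \mathbf{u}_j, \mathbf{v}_j \in \mathbb{R}^{N_j} \tag{32} \]

in the same way as $\mathbf{K}$ was generated above from the original
bilinear form $a(\cdot,\cdot)$.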
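
The matrix formulas (30) and (31) can be checked numerically. The sketch below is an illustration under assumptions: the function names are hypothetical, the matrices act on nodal coefficient vectors (the basis $\Phi$ is dropped), and dense solves stand in for whatever subspace solver is actually used.

```python
import numpy as np

def energy_projection(K, V_j):
    """Matrix of the orthoprojection P_j in nodal coordinates, cf. (30):
    V_j (V_j^T K V_j)^{-1} V_j^T K."""
    return V_j @ np.linalg.solve(V_j.T @ K @ V_j, V_j.T @ K)

def inexact_projection(K, V_j, C_j):
    """Matrix of the projection-like operator, cf. (31):
    V_j C_j^{-1} V_j^T K, with an SPD subspace matrix C_j."""
    return V_j @ np.linalg.solve(C_j, V_j.T @ K)
```

With `P = energy_projection(K, V_j)`, the relations `P @ P == P` and `K @ P == (K @ P).T` (self-adjointness in the energy inner product) hold up to rounding. For the inexact variant only the symmetry of `K @ P` survives; the projection property is lost in general.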


3.2 Schwarz methods and preconditioners for 
elliptic variational problems 


In the next three sections, we describe various Schwarz
algorithms and the corresponding preconditioners. We mention
that the Schwarz algorithms are completely defined
by the space splitting (26), the subspace bilinear forms
$a_j(\cdot,\cdot)$, and the arrangement of the projection-like
operations. Finally, we present some interesting examples.


3.2.1 Additive algorithms 


The (inexact) Additive Schwarz Method (ASM) corresponding
to the space splitting (26) and to the subspace
bilinear forms $a_j(\cdot,\cdot)$ can be written in the form of an
iteration process in the FE space $V$ (function version)
and in $\mathbb{R}^N$ (vector/matrix version) as shown in Algorithm 4.
Replacing $a_j(\cdot,\cdot)$ by $a(\cdot,\cdot)$, $\mathbf{C}_j$ by $\mathbf{V}_j^T \mathbf{K} \mathbf{V}_j$, and
$\widehat{P}_j$ by $P_j$ in Algorithm 4, we arrive at the so-called
exact ASM. In this sense, we consider the exact ASM
as a special case of the inexact ASM presented in Algorithm 4.
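Algorithm 4. Inexact ASM for solving $\mathbf{K}\mathbf{u} = \mathbf{f}$ (function $\leftrightarrow$ vector/matrix versions).

$u^0 = \Phi \mathbf{u}^0 \in V$ given initial guess and $\tau$ given iteration parameter    {initialization}
for n = 0 step 1 until Convergence do    {begin iteration loop}
    for all $j \in \{1, \ldots, J\}$ in parallel do    {parallel computation of the subspace corrections}
        $w_j^n = \Phi \mathbf{V}_j \mathbf{w}_j^n \in V_j$: $a_j(w_j^n, v_j) = \langle f, v_j \rangle - a(u^n, v_j) \equiv a(u - u^n, v_j)$  $\forall v_j \in V_j$
        $\Updownarrow$
        $\mathbf{w}_j^n \in \mathbb{R}^{N_j}$: $\mathbf{C}_j \mathbf{w}_j^n = \mathbf{V}_j^T (\mathbf{f} - \mathbf{K}\mathbf{u}^n) \equiv \mathbf{V}_j^T \mathbf{d}^n$
    end for
    {updating the old iterate}
    $u^{n+1} = u^n + \tau \sum_{j=1}^{J} w_j^n = u^n + \tau \sum_{j=1}^{J} \widehat{P}_j (u - u^n)$
    $\Updownarrow$
    $\mathbf{u}^{n+1} = \mathbf{u}^n + \tau \sum_{j=1}^{J} \mathbf{V}_j \mathbf{w}_j^n$
end for    {end iteration loop}
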


The iteration error $z^n = u - u^n \in V$ obviously satisfies
the error iteration scheme

\[ z^{n+1} = (I - \tau \widehat{P}) z^n = \Big( I - \tau \sum_{j=1}^{J} \widehat{P}_j \Big) z^n \tag{33} \]

with the (inexact) ASM operator $\widehat{P} = \sum_{j=1}^{J} \widehat{P}_j$. Note
that the error iteration scheme (33) completely defines the
ASM iteration scheme given by Algorithm 4. The ASM
operator $\widehat{P} = \widehat{P}^* > 0$ is self-adjoint and positive definite
with respect to the energy inner product $a(\cdot,\cdot)$. Taking the
energy norm $\|\cdot\|_a = \sqrt{a(\cdot,\cdot)}$ of the right-hand side of
(33), we arrive at the iteration error estimate

\[ \|z^{n+1}\|_a \le \|I - \tau \widehat{P}\|_a \, \|z^n\|_a \tag{34} \]

Therefore, the convergence rate of the ASM iteration is
completely defined by the (operator) energy norm $\|I -
\tau \widehat{P}\|_a$ of the ASM iteration operator $I - \tau \widehat{P}$ (see (38) below
for rate estimates).

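From the matrix representation of Algorithm 4, we
immediately see that the ASM is nothing more than the
Richardson iteration

\[ \tau^{-1} \mathbf{C} (\mathbf{u}^{n+1} - \mathbf{u}^n) + \mathbf{K}\mathbf{u}^n = \mathbf{f} \tag{35} \]

preconditioned by the inexact ASM preconditioner

\[ \mathbf{C}^{-1} = \sum_{j=1}^{J} \mathbf{V}_j \mathbf{C}_j^{-1} \mathbf{V}_j^T \tag{36} \]

where $\mathbf{C}_j^{-1}$ is replaced by $(\mathbf{V}_j^T \mathbf{K} \mathbf{V}_j)^{-1}$ in the exact version.
The error propagation scheme of the Richardson iteration
immediately gives the iteration error estimate

\[ \|\mathbf{z}^{n+1}\|_{\mathbf{K}} := (\mathbf{K}\mathbf{z}^{n+1}, \mathbf{z}^{n+1})^{1/2} \le \|\mathbf{I} - \tau \mathbf{C}^{-1}\mathbf{K}\|_{\mathbf{K}} \, \|\mathbf{z}^n\|_{\mathbf{K}} \tag{37} \]

where $\mathbf{z}^n = \mathbf{u} - \mathbf{u}^n$ again denotes the iteration error as
vector in $\mathbb{R}^N$. The $\mathbf{K}$-energy norm error estimate (37) is the
vector counterpart of the iteration error estimate (34). For
$\tau \in (0, 2/\lambda_{\max})$, or $\tau \in (0, 2/\overline{\gamma})$, we get the rate estimates

\[ \|\mathbf{I} - \tau \mathbf{C}^{-1}\mathbf{K}\|_{\mathbf{K}} = \|I - \tau \widehat{P}\|_a = \max\{ |1 - \tau\lambda_{\min}|, |1 - \tau\lambda_{\max}| \} \le q(\tau) := \max\{ |1 - \tau\underline{\gamma}|, |1 - \tau\overline{\gamma}| \} < 1 \tag{38} \]

provided that the minimal eigenvalue $\lambda_{\min} = \lambda_{\min}(\mathbf{C}^{-1}\mathbf{K}) =
\lambda_{\min}(\widehat{P}) \ge \underline{\gamma}$ and the maximal eigenvalue $\lambda_{\max} =
\lambda_{\max}(\mathbf{C}^{-1}\mathbf{K}) = \lambda_{\max}(\widehat{P}) \le \overline{\gamma}$ of $\mathbf{C}^{-1}\mathbf{K}$ resp. $\widehat{P}$, or at least
good lower and upper bounds $\underline{\gamma}$ and $\overline{\gamma}$, are known. The
lower and upper bounds of the eigenvalues of $\mathbf{C}^{-1}\mathbf{K}$ resp.
$\widehat{P}$ satisfy the so-called spectral equivalence inequalities

\[ \underline{\gamma} (\mathbf{C}\mathbf{v}, \mathbf{v}) \le (\mathbf{K}\mathbf{v}, \mathbf{v}) \le \overline{\gamma} (\mathbf{C}\mathbf{v}, \mathbf{v}) \quad \forall \mathbf{v} \in \mathbb{R}^N \tag{39} \]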


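which are in turn used to determine the bounds $\underline{\gamma}$ and
$\overline{\gamma}$. In the following, we frequently use the short notation
$\underline{\gamma}\mathbf{C} \le \mathbf{K} \le \overline{\gamma}\mathbf{C}$ for the spectral equivalence inequalities
(39). Note that the spectral equivalence inequalities (39)
are equivalent to the inequalities

\[ \underline{\gamma} \, a(v, v) \le a(\widehat{P} v, v) \le \overline{\gamma} \, a(v, v) \quad \forall v \in V \tag{40} \]

and

\[ \underline{\gamma} \, a(\widehat{P}^{-1} v, v) \le a(v, v) \le \overline{\gamma} \, a(\widehat{P}^{-1} v, v) \quad \forall v \in V \tag{41} \]

in the FE space $V$. The optimal convergence rate

\[ q_{\mathrm{opt}} = q(\tau_{\mathrm{opt}}) = \frac{\overline{\gamma} - \underline{\gamma}}{\overline{\gamma} + \underline{\gamma}} < 1 \tag{42} \]

is attained at the optimal iteration parameter $\tau = \tau_{\mathrm{opt}} =
2/(\underline{\gamma} + \overline{\gamma})$. Sometimes it is useful to know that there also
holds the iteration error estimate

\[ \|\mathbf{z}^{n+1}\|_{\mathbf{K}\mathbf{C}^{-1}\mathbf{K}} := (\mathbf{w}^{n+1}, \mathbf{d}^{n+1})^{1/2} \le q(\tau) \, \|\mathbf{z}^n\|_{\mathbf{K}\mathbf{C}^{-1}\mathbf{K}} \tag{43} \]

in the $\mathbf{K}\mathbf{C}^{-1}\mathbf{K}$-energy norm with the same $q(\tau)$ as above,
where $\mathbf{d}^n = \mathbf{K}\mathbf{z}^n = \mathbf{f} - \mathbf{K}\mathbf{u}^n$ and $\mathbf{w}^n = \mathbf{C}^{-1}\mathbf{d}^n$ denote the
defect and the preconditioned defect (correction), respectively.
In contrast to the $\mathbf{K}$-energy norm iteration error
estimate (37)-(38), the $\mathbf{K}\mathbf{C}^{-1}\mathbf{K}$-energy norm estimate (43)
is computable and can be used as convergence test in Algorithm 4.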

However, in practice, we use the conjugate gradient (CG)
acceleration instead of the Richardson iteration, that is, we
look at the ASM as a technique for constructing additive,
and therefore highly parallelizable preconditioners of the
form (36) by space splitting and subspace preconditioning.
The preconditioning step $\mathbf{w} = \mathbf{C}^{-1}\mathbf{d}$ in the preconditioned
conjugate gradient (PCG) method with the ASM preconditioner
(36) is nothing but one iteration step ($n = 1$) of
Algorithm 4 applied to $\mathbf{K}\mathbf{u} = \mathbf{d}$ (i.e. $\mathbf{f} = \mathbf{d}$) with $\tau = 1$ and
the zero initial guess $\mathbf{u}^0 = \mathbf{0}$ giving $\mathbf{w} = \mathbf{u}^1$.
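
The vector/matrix version of Algorithm 4 and the preconditioner application (36) can be sketched as follows. This is a minimal sketch, assuming dense matrices and direct subspace solves; the names `asm_apply` and `asm_richardson` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def asm_apply(V_list, C_list, d):
    """Preconditioning step w = C^{-1} d = sum_j V_j C_j^{-1} V_j^T d, cf. (36).

    The J subspace solves are independent of each other, so they could be
    carried out in parallel; here they are simply summed in a loop.
    """
    return sum(V @ np.linalg.solve(C, V.T @ d) for V, C in zip(V_list, C_list))

def asm_richardson(K, f, V_list, C_list, tau, iters):
    """Richardson iteration (35) with the ASM preconditioner, zero initial guess."""
    u = np.zeros_like(f)
    for _ in range(iters):
        d = f - K @ u                             # defect d^n = f - K u^n
        u = u + tau * asm_apply(V_list, C_list, d)
    return u
```

Calling `asm_apply` once with the current defect is exactly the preconditioning step needed inside a PCG loop. The Richardson wrapper is kept only because it mirrors Algorithm 4; it converges for `tau` below `2 / lambda_max` of the preconditioned matrix.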


3.2.2 Multiplicative algorithms 


The Multiplicative Schwarz Method (MSM) has its historical
roots in the Alternating Schwarz Method, as discussed
in Section 2. The inexact version of the MSM corresponding
to the space splitting (26) and to the subspace bilinear
forms $a_j(\cdot,\cdot)$ can be written in the form of an iteration
process in the FE space $V$ (function version) and in
$\mathbb{R}^N$ (vector/matrix version), as shown in Algorithm 5. We
again consider the exact MSM ($a_j(\cdot,\cdot) := a(\cdot,\cdot)$, $\mathbf{C}_j :=
\mathbf{V}_j^T \mathbf{K} \mathbf{V}_j$, $\widehat{P}_j := P_j$) as a special case of the inexact version
presented in Algorithm 5.


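Algorithm 5. Inexact MSM for solving $\mathbf{K}\mathbf{u} = \mathbf{f}$ (function $\leftrightarrow$ vector/matrix versions).

$u^0 = \Phi \mathbf{u}^0 \in V$ given initial guess    {initialization}
for n = 0 step 1 until Convergence do    {begin iteration loop}
    for j = 1 step 1 until J do    {successive computation of the subspace corrections}
        $w_j^{n+j/J} = \Phi \mathbf{V}_j \mathbf{w}_j^{n+j/J} \in V_j$: $a_j(w_j^{n+j/J}, v_j) = \langle f, v_j \rangle - a(u^{n+(j-1)/J}, v_j)$  $\forall v_j \in V_j$
        $\Updownarrow$
        $\mathbf{w}_j^{n+j/J} \in \mathbb{R}^{N_j}$: $\mathbf{C}_j \mathbf{w}_j^{n+j/J} = \mathbf{V}_j^T (\mathbf{f} - \mathbf{K}\mathbf{u}^{n+(j-1)/J}) \equiv \mathbf{V}_j^T \mathbf{d}^{n+(j-1)/J}$
        {immediate updating of the iterate}
        $u^{n+j/J} = u^{n+(j-1)/J} + w_j^{n+j/J} = u^{n+(j-1)/J} + \widehat{P}_j (u - u^{n+(j-1)/J})$
        $\Updownarrow$
        $\mathbf{u}^{n+j/J} = \mathbf{u}^{n+(j-1)/J} + \mathbf{V}_j \mathbf{w}_j^{n+j/J}$
    end for
end for    {end iteration loop}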
The iteration error $z^n = u - u^n \in V$ now satisfies the
error iteration scheme

\[ z^{n+1} = E z^n \tag{44} \]

resulting in the energy norm iteration error estimate

\[ \|z^{n+1}\|_a \le \|E\|_a \, \|z^n\|_a \tag{45} \]

with the MSM error propagation operator (iteration operator)

\[ E = (I - \widehat{P}_J)(I - \widehat{P}_{J-1}) \cdots (I - \widehat{P}_1) \tag{46} \]

Therefore, the convergence rate of the MSM iteration is
completely defined by the (operator) energy norm $\|E\|_a$ of
the MSM error propagation operator $E$. The same is true
for the vector version with respect to the error propagation
matrix (iteration matrix) $\mathbf{E} = (\mathbf{I} - \mathbf{V}_J \mathbf{C}_J^{-1} \mathbf{V}_J^T \mathbf{K}) \cdots (\mathbf{I} -
\mathbf{V}_1 \mathbf{C}_1^{-1} \mathbf{V}_1^T \mathbf{K})$. Convergence rate estimates of the form

\[ \|E\|_a = \|\mathbf{E}\|_{\mathbf{K}} \le q_{\mathrm{MSM}} < 1 \tag{47} \]

will be presented in Section 3.3. Unfortunately, the MSM
preconditioner

\[ \mathbf{C} = \mathbf{K} (\mathbf{I} - \mathbf{E})^{-1} \tag{48} \]

is not symmetric and, therefore, cannot be used in the PCG
as a preconditioner.

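However, repeating the subspace corrections in Algorithm 5
in the reverse direction, we arrive at the so-called
symmetric Multiplicative Schwarz Method (sMSM) that is
characterized by the error propagation operator

\[ E = (I - \widehat{P}_1) \cdots (I - \widehat{P}_{J-1})(I - \widehat{P}_J)(I - \widehat{P}_J)(I - \widehat{P}_{J-1}) \cdots (I - \widehat{P}_1) \tag{49} \]

resp. the error propagation matrix

\[ \mathbf{E} = (\mathbf{I} - \mathbf{V}_1 \mathbf{C}_1^{-1} \mathbf{V}_1^T \mathbf{K}) \cdots (\mathbf{I} - \mathbf{V}_J \mathbf{C}_J^{-1} \mathbf{V}_J^T \mathbf{K}) \times (\mathbf{I} - \mathbf{V}_J \mathbf{C}_J^{-1} \mathbf{V}_J^T \mathbf{K}) \cdots (\mathbf{I} - \mathbf{V}_1 \mathbf{C}_1^{-1} \mathbf{V}_1^T \mathbf{K}) \tag{50} \]

resulting in a symmetric preconditioner (48). Note that,
in the exact version of the sMSM, the subspace correction
in $V_J$ has to be carried out only once because $(I - P_J)(I -
P_J) = (I - P_J)$.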
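
The difference between (46) and (49) can be made concrete by assembling the error propagation matrices for a small example. The sketch below is illustrative only: the function name, the dense matrix products, and the `symmetric` flag are assumptions of this example, and explicit assembly of $\mathbf{E}$ is something one would not do for large problems.

```python
import numpy as np

def msm_error_matrix(K, V_list, C_list, symmetric=False):
    """Error propagation matrix E of the MSM (46) or, if symmetric=True, the sMSM (50)."""
    N = K.shape[0]
    # T_j = I - V_j C_j^{-1} V_j^T K is the error operator of one subspace correction.
    factors = [np.eye(N) - V @ np.linalg.solve(C, V.T @ K)
               for V, C in zip(V_list, C_list)]
    E = np.eye(N)
    for T in factors:                 # forward sweep: E = T_J ... T_1 (T_1 acts first)
        E = T @ E
    if symmetric:
        for T in reversed(factors):   # reverse sweep: T_1 ... T_J applied afterwards
            E = T @ E
    return E
```

Forming `(I - E) @ inv(K)`, which is the inverse of the preconditioner in (48), gives a nonsymmetric matrix for the plain MSM with overlapping blocks and a symmetric one for the sMSM; this is why only the latter can be used inside PCG.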


3.2.3 Hybrid algorithms 


There are a lot of useful hybrid Schwarz algorithms corresponding
to various possibilities of arranging
the subspace corrections in an additive and multiplicative
manner. The algorithm can be completely defined by its
iteration operator $E$ resp. the iteration matrix $\mathbf{E}$.
For instance, the iteration operator

\[ E = (I - \widehat{P}_1)\big(I - \tau(\widehat{P}_2 + \cdots + \widehat{P}_J)\big)(I - \widehat{P}_1) \tag{51} \]

corresponds to the Hybrid Schwarz Method described by
Algorithm 6 (function version only). The error propagation
operator (51) resp. Algorithm 6 corresponds to a symmetric
preconditioner of the form (48). Note that the error propagation
operator (51) as well as the corresponding iteration
matrix are self-adjoint with respect to the corresponding
energy inner products.


3.2.4 Examples 


We recall that the Schwarz algorithm (preconditioner) is
uniquely defined by the space splitting $V = \sum_{j=1}^{J} V_j$, the
subspace bilinear forms $a_j(\cdot,\cdot)$ and the arrangement of the
projection-like operations.


Example 3 Nodal basis splitting. For $j = 1, \ldots, J$ and $J =
N$, we define the one-dimensional ($N_j = \dim V_j = 1$) subspaces

\[ V_j = \mathrm{span}\{\phi_j\} = \mathrm{span}\,\Phi \mathbf{V}_j \tag{52} \]

of $V$ giving the so-called nodal basis splitting $V = \sum_{j=1}^{J} V_j$,
where $\mathbf{V}_j = \mathbf{e}_j$ is the $j$th unit vector $\mathbf{e}_j = (0, \ldots, 0, 1, 0,
\ldots, 0)^T$. Therefore,

\[ \mathbf{V}_j^T \mathbf{K} \mathbf{V}_j = \mathbf{e}_j^T \mathbf{K} \mathbf{e}_j = K_{jj} \tag{53} \]

Taking this relation into account, we observe that the ASM
preconditioner (36) in its exact version ($\mathbf{C}_j = \mathbf{V}_j^T \mathbf{K} \mathbf{V}_j$) is
nothing but the well-known Jacobi preconditioner (diagonal
scaling)

\[ \mathbf{C}^{-1} = \sum_{j=1}^{J} \mathbf{e}_j K_{jj}^{-1} \mathbf{e}_j^T = \mathbf{D}^{-1} = (\mathrm{diag}(\mathbf{K}))^{-1} \tag{54} \]

and the exact ASM coincides with the classical (damped)
Jacobi method. Furthermore, the exact MSM corresponding
to our nodal basis splitting gives us the classical
Gauss-Seidel method. Indeed, from the exact version of
the MSM Algorithm 5 and from (53), we see that the $j$th
subspace correction step in the $n$th iteration step

\[ \mathbf{u}^{n+j/J} = \mathbf{u}^{n+(j-1)/J} + \mathbf{V}_j \mathbf{w}_j^{n+j/J} = \mathbf{u}^{n+(j-1)/J} + \mathbf{e}_j (\mathbf{e}_j^T \mathbf{K} \mathbf{e}_j)^{-1} \mathbf{e}_j^T (\mathbf{f} - \mathbf{K}\mathbf{u}^{n+(j-1)/J}) \tag{55} \]
Algorithm 6. Hybrid Schwarz method corresponding to the error propagation operator (51).

$u^0 = \Phi \mathbf{u}^0 \in V$ given initial guess and $\tau$ given iteration parameter    {initialization}
for n = 0 step 1 until Convergence do    {begin iteration loop}
    {first multiplicative subspace correction in $V_1$}
    $w_1^{n+1/3} \in V_1$: $a_1(w_1^{n+1/3}, v_1) = \langle f, v_1 \rangle - a(u^n, v_1)$  $\forall v_1 \in V_1$
    $u^{n+1/3} = u^n + w_1^{n+1/3} = u^n + \widehat{P}_1 (u - u^n)$
    {additive subspace corrections in remaining subspaces}
    for all $j \in \{2, \ldots, J\}$ in parallel do
        $w_j^{n+2/3} \in V_j$: $a_j(w_j^{n+2/3}, v_j) = \langle f, v_j \rangle - a(u^{n+1/3}, v_j)$  $\forall v_j \in V_j$
    end for
    $u^{n+2/3} = u^{n+1/3} + \tau \sum_{j=2}^{J} w_j^{n+2/3} = u^{n+1/3} + \tau \sum_{j=2}^{J} \widehat{P}_j (u - u^{n+1/3})$
    {second multiplicative subspace correction in $V_1$}
    $w_1^{n+1} \in V_1$: $a_1(w_1^{n+1}, v_1) = \langle f, v_1 \rangle - a(u^{n+2/3}, v_1)$  $\forall v_1 \in V_1$
    $u^{n+1} = u^{n+2/3} + w_1^{n+1} = u^{n+2/3} + \widehat{P}_1 (u - u^{n+2/3})$
end for    {end iteration loop}



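updates only the $j$th component of the iterate

\[ u_j^{n+j/J} = u_j^{n+1} = \frac{1}{K_{jj}} \Big( f_j - \sum_{i=1}^{j-1} K_{ji} u_i^{n+1} - \sum_{i=j+1}^{J} K_{ji} u_i^{n} \Big), \quad j = 1, 2, \ldots, J \ (J = N) \tag{56} \]
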


Formulas (56) exactly describe the Gauss—Seidel iteration 
procedure. In this sense, we look at the ASM and the MSM 
as the natural generalizations of the Jacobi method and 
the Gauss—Seidel method, respectively. In an analogous 
way, the symmetric Gauss—Seidel method corresponds to 
the exact sMSM. Similar to the SOR and the SSOR 
methods, we can also introduce overrelaxation parame- 
ters into MSM and sMSM aiming at the improvement of 
convergence (see Griebel and Oswald (1995) for related 
results). 


Therefore, the Jacobi iteration and the Gauss-Seidel method
are the classical prototypes of the ASM and the MSM,
respectively. Further examples are given in Section 4.3 and
correspond to multilevel splittings of the finite element
space $V$. The most prominent one is the so-called BPX
preconditioner.
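
The correspondence stated in Example 3 can be verified directly: one exact MSM iteration over the nodal basis splitting reproduces one Gauss-Seidel sweep. The sketch below is an illustration with hypothetical function names; the subspace matrices are the unit vectors $\mathbf{e}_j$ stored as $N \times 1$ columns.

```python
import numpy as np

def msm_sweep(K, f, u, V_list):
    """One exact MSM iteration: successive corrections in V_1, ..., V_J, cf. (55)."""
    u = u.copy()
    for V in V_list:
        # Subspace solve with V_j^T K V_j and immediate update of the iterate.
        u += V @ np.linalg.solve(V.T @ K @ V, V.T @ (f - K @ u))
    return u

def gauss_seidel_sweep(K, f, u):
    """One classical Gauss-Seidel sweep, cf. (56)."""
    u = u.copy()
    for j in range(len(f)):
        u[j] = (f[j] - K[j, :j] @ u[:j] - K[j, j + 1:] @ u[j + 1:]) / K[j, j]
    return u
```

With `V_list = [np.eye(N)[:, [j]] for j in range(N)]` both functions return the same vector up to rounding. Using larger, possibly overlapping index blocks in `V_list` turns the same loop into a block Gauss-Seidel or overlapping multiplicative Schwarz sweep.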


3.3 Spectral equivalence estimates and 
convergence analysis 


In this section, we present some convergence results for
the Schwarz methods and some spectral equivalence results
for the corresponding Schwarz preconditioners. The main
condition for creating good Schwarz methods (preconditioners)
consists in a stable splitting of the (FE) space $V$
into subspaces $\{V_j\}$. We first consider the simple case of
splitting $V$ into a direct sum of two subspaces $V_1$ and $V_2$.
This case is very important for the nonoverlapping domain
decomposition methods studied in Section 5. Finally, we
give some results for the general case of splitting $V$ into $J$
subspaces.


3.3.1 The simple case: splitting into a direct sum of 
two subspaces 


Let us consider the simple case of splitting

\[ V = V_1 + V_2 \quad \text{and} \quad V_1 \cap V_2 = \{0\} \tag{57} \]

into the two (nontrivial) subspaces $V_1 = \mathrm{span}\,\Phi \mathbf{V}_1$ and
$V_2 = \mathrm{span}\,\Phi \mathbf{V}_2$, and let us define the cosine of the angle
between $V_1$ and $V_2$:

\[ \gamma = \cos \angle(V_1, V_2) := \sup_{v_1 \in V_1 \setminus \{0\}, \ v_2 \in V_2 \setminus \{0\}} \frac{a(v_1, v_2)}{\|v_1\|_a \|v_2\|_a} \tag{58} \]

corresponding to the sharp constant $\gamma$ in the so-called
strengthened Cauchy inequality:

\[ |a(v_1, v_2)| \le \gamma \|v_1\|_a \|v_2\|_a \quad \forall v_1 \in V_1, \ \forall v_2 \in V_2 \tag{59} \]

The splitting (57) is called stable if and only if the constant
$\gamma$ stays less than 1 for growing dimensions $N = N_h \to \infty$
($h \to 0$).

The following lemma gives some useful relations for
computing or estimating $\gamma$.


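Lemma 1. The following relations are valid:

\[ \cos \angle(V_1, V_2) = \cos \angle(V_1^{\perp}, V_2^{\perp}) \tag{60} \]

\[ \sup_{v_1 \in V_1 \setminus \{0\}, \ v_2 \in V_2 \setminus \{0\}} \frac{a(v_1, v_2)}{\|v_1\|_a \|v_2\|_a} = \sup_{v_1 \in V_1 \setminus \{0\}, \ v_2 \in V_2 \setminus \{0\}} \frac{2 a(v_1, v_2)}{\|v_1\|_a^2 + \|v_2\|_a^2} \tag{61} \]

\[ \sup_{v_1 \in V_1 \setminus \{0\}, \ v_2 \in V_2 \setminus \{0\}} \left( \frac{a(v_1, v_2)}{\|v_1\|_a \|v_2\|_a} \right)^2 = \sup_{\mathbf{v}_1 \in \mathbb{R}^{N_1} \setminus \{0\}} \frac{\big( (\mathbf{V}_1^T \mathbf{K} \mathbf{V}_2)(\mathbf{V}_2^T \mathbf{K} \mathbf{V}_2)^{-1} (\mathbf{V}_2^T \mathbf{K} \mathbf{V}_1) \mathbf{v}_1, \mathbf{v}_1 \big)}{(\mathbf{V}_1^T \mathbf{K} \mathbf{V}_1 \mathbf{v}_1, \mathbf{v}_1)} \tag{62} \]

The proofs of the relations (60), (61), and (62) are elementary
and can be found in Bjørstad and Mandel (1991),
Axelsson and Vassilevskii (1989), and Haase, Langer and
Meyer (1991) respectively. Relation (62) means that $\gamma^2$
coincides with the maximal eigenvalue of the generalized
eigenvalue problem

\[ (\mathbf{V}_1^T \mathbf{K} \mathbf{V}_2)(\mathbf{V}_2^T \mathbf{K} \mathbf{V}_2)^{-1} (\mathbf{V}_2^T \mathbf{K} \mathbf{V}_1) \mathbf{v}_1 = \lambda (\mathbf{V}_1^T \mathbf{K} \mathbf{V}_1) \mathbf{v}_1 \tag{63} \]

The error iteration schemes $z^{n+1} = E z^n$ corresponding to
the exact ASM ($\tau = 1$) and the exact MSM are illustrated
at the left and right sides of Figure 2, respectively. This
figure shows that the exact MSM converges twice as fast
as the corresponding ASM. More precisely, the following
theorem holds.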


Theorem 1. Assume the splitting (57) with y € {0,1) 
defined by (58). Then the exact MSM converges in the energy 
norm with the rate y’, that is, 


Jew, <P fe— ull n=12... (64) 


provided that u’ e Y is chosen such that z! = d- P,)z° E€ 
yi (initial orthoprojection step onto V,). The convergence 
rate of the corresponding exact ASM with t = 1, = 1 is 
only y, that is, Z 


a — u"! j, < yhu — u” la 2=0,1,2,... (65) 


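In this case, the ASM preconditioner $C$ has the form

\[
C = V^{-T}\begin{pmatrix} V_1^TKV_1 & 0 \\ 0 & V_2^TKV_2 \end{pmatrix} V^{-1} \tag{66}
\]

with the regular $N \times N$ basis transformation matrix
$V = (V_1\ V_2)$, and satisfies the spectral equivalence inequalities
(39) with the sharp spectral equivalence constants
$\underline{\gamma} = 1 - \gamma$ and $\overline{\gamma} = 1 + \gamma$.

[Figure 2. ASM and MSM corresponding to the splitting (57). Left
panel: ASM with $\|u - u^{n+1}\|_a \le \gamma\|u - u^n\|_a$; right panel: MSM
with $\|u - u^{n+1}\|_a \le \gamma^2\|u - u^n\|_a$; $\gamma = \cos\angle(\mathbb{V}_1,\mathbb{V}_2)$,
$z = u - u^n$.]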


The following straightforward spectral equivalence esti- 
mate for the inexact ASM preconditioner can be very useful 
in practice. 


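Corollary 1. Let us assume that there are SPD subspace
preconditioners $C_1$ and $C_2$ such that the spectral
equivalence inequalities

\[
\underline{\gamma}_j\,C_j \le V_j^TKV_j \le \overline{\gamma}_j\,C_j \tag{67}
\]

hold with positive $\underline{\gamma}_j$ and $\overline{\gamma}_j$ for $j = 1,2$.
Then the inexact ASM preconditioner

\[
C = V^{-T}\begin{pmatrix} C_1 & 0 \\ 0 & C_2 \end{pmatrix} V^{-1} \tag{68}
\]

is spectrally equivalent to $K$ in the sense of the spectral
equivalence inequalities (39) with the spectral equivalence
constants $\underline{\gamma} = \min\{\underline{\gamma}_1,\underline{\gamma}_2\}(1 - \gamma)$ and
$\overline{\gamma} = \max\{\overline{\gamma}_1,\overline{\gamma}_2\}(1 + \gamma)$.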
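The spectral bounds for the exact and the inexact two-subspace ASM preconditioner can be checked on a small model problem. The sketch below is illustrative only: it assumes K = tridiag(-1, 2, -1), a splitting of the nodal basis into a left and a right half, and Jacobi (diagonal) subspace preconditioners as the inexact C_1 and C_2; the helper names are hypothetical. The upper constant is taken as the larger of the two subspace upper constants times (1 + gamma).

```python
import numpy as np

def gen_eigs(A, C):
    """Ascending eigenvalues of C^{-1} A for symmetric A and SPD C."""
    L = np.linalg.cholesky(C)                        # C = L L^T
    M = np.linalg.solve(L, np.linalg.solve(L, A).T)  # L^{-1} A L^{-T}
    return np.linalg.eigvalsh((M + M.T) / 2)

n = 12
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
I = np.eye(n)
V1, V2 = I[:, : n // 2], I[:, n // 2 :]
V = np.hstack([V1, V2])                              # basis transformation matrix
A11, A12, A22 = V1.T @ K @ V1, V1.T @ K @ V2, V2.T @ K @ V2

# gamma^2 is the largest eigenvalue of A11^{-1} A12 A22^{-1} A21
gamma = np.sqrt(gen_eigs(A12 @ np.linalg.solve(A22, A12.T), A11)[-1])

def asm_preconditioner(C1, C2):
    """C = V^{-T} blockdiag(C1, C2) V^{-1}."""
    n1, n2 = C1.shape[0], C2.shape[0]
    D = np.block([[C1, np.zeros((n1, n2))], [np.zeros((n2, n1)), C2]])
    Vinv = np.linalg.inv(V)
    return Vinv.T @ D @ Vinv

# exact subspace solvers: spectrum of C^{-1}K lies in [1 - gamma, 1 + gamma]
lam_exact = gen_eigs(K, asm_preconditioner(A11, A22))

# inexact subspace solvers: Jacobi preconditioners (an assumption of this sketch)
C1, C2 = np.diag(np.diag(A11)), np.diag(np.diag(A22))
g1, g2 = gen_eigs(A11, C1), gen_eigs(A22, C2)
lower = min(g1[0], g2[0]) * (1.0 - gamma)
upper = max(g1[-1], g2[-1]) * (1.0 + gamma)
lam_inexact = gen_eigs(K, asm_preconditioner(C1, C2))
```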


Corollary 1 implies that the inexact ASM converges for
$\tau \in (0, 2/\overline{\gamma})$ with the rate given by (38). The inexact sMSM
version is discussed in Haase and Langer (1992). Further
results for the inexact ASM, MSM, and sMSM versions
follow from the general case discussed in the next section.
We additionally refer to Aronszajn (1950) and Bjørstad and
Mandel (1991) for the special case of the splitting into two
subspaces, including the case where $\mathbb{V}_1 \cap \mathbb{V}_2$ is nontrivial,
as in the classical alternating Schwarz method introduced
in Section 3.


3.3.2 The general case: splitting into J subspaces 


Let us first consider the inexact ASM and the inexact ASM
preconditioner introduced in Section 3.2.1. It was pointed
out in this section that the convergence analysis (cf. rate
estimate (38)) aims at the spectral estimates (39)-(41) of
the ASM preconditioned matrix $C^{-1}K$, or equivalently, of
the ASM operator $P$. Defining the so-called splitting norm

\[
|||v|||^2 := \inf_{v=\sum_{j=1}^{J} v_j}\ \sum_{j=1}^{J} a_j(v_j,v_j) \tag{69}
\]

where the infimum is taken over all possible splittings
$v = \sum_{j=1}^{J} v_j$ of $v$ with $v_j \in \mathbb{V}_j$, we obtain the following exact
representation of the minimal and maximal eigenvalues of
$C^{-1}K$ and $P$, respectively.

Theorem 2.

\[
\lambda_{\min}(C^{-1}K) = \lambda_{\min}(P) = \min_{v\in\mathbb{V}} \frac{a(v,v)}{|||v|||^2}
\quad\text{and}\quad
\lambda_{\max}(C^{-1}K) = \lambda_{\max}(P) = \max_{v\in\mathbb{V}} \frac{a(v,v)}{|||v|||^2} \tag{70}
\]


Proof. The statement follows from (39)–(41) and the observation that
$|||v|||^2 = a(P^{-1}v, v)$ (see e.g. Bjørstad and Mandel, 1991;
Xu, 1992; and Oswald, 1994). Theorem 2 is closely related
to the so-called fictitious space lemma that has been used
in the Russian literature since the eighties (Matsokin and
Nepomnyaschikh, 1985; Nepomnyaschikh, 1990) (see also
Oswald (1994) and Griebel and Oswald (1995) for this
relation). □


The following two corollaries immediately follow from
Theorem 2 and provide powerful tools for estimating
(calculating) $\lambda_{\min}(P)$ and $\lambda_{\max}(P)$.


Corollary 2 (Lions’ lemma, 1988). Assume that there
exists a positive constant $c_1$ such that for all $v \in \mathbb{V}$ there
exists at least one splitting $v = \sum_{j=1}^{J} v_j$ such that the
inequality

\[
\sum_{j=1}^{J} a_j(v_j,v_j) \le c_1^2\,a(v,v) \tag{71}
\]

holds. Then

\[
\lambda_{\min}(C^{-1}K) = \lambda_{\min}(P) \ge 1/c_1^2 \tag{72}
\]


Proof. See also the original paper by Lions (1988). □


Corollary 3 (subspace interaction lemma). Let us define
the $J \times J$ subspace interaction matrix $\Gamma = (\gamma_{ij})_{i,j=1,\ldots,J}$
with the coefficients $\gamma_{ij}$ stemming from the supremum (cf.
also (62))

\[
\gamma_{ij} = \sup_{v_i\in\mathbb{V}_i,\ v_j\in\mathbb{V}_j}
\frac{a(v_i,v_j)}{\sqrt{a_i(v_i,v_i)}\,\sqrt{a_j(v_j,v_j)}} \tag{73}
\]

or the generalized strengthened Cauchy inequalities

\[
|a(v_i,v_j)| \le \gamma_{ij}\,\sqrt{a_i(v_i,v_i)}\,\sqrt{a_j(v_j,v_j)}
\quad \forall v_i\in\mathbb{V}_i,\ \forall v_j\in\mathbb{V}_j \tag{74}
\]

Then

\[
\lambda_{\max}(C^{-1}K) = \lambda_{\max}(P) \le \rho(\Gamma) \tag{75}
\]

where $\rho(\Gamma)$ denotes the spectral radius of the subspace
interaction matrix $\Gamma$.


Proof. See Dryja and Widlund (1995). □
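The bound of the largest eigenvalue of the ASM operator by the spectral radius of the subspace interaction matrix can be observed on a small example. The sketch below makes several assumptions that are not part of the chapter: a 1D model matrix K = tridiag(-1, 2, -1), four overlapping index blocks as subspaces, and exact subspace solvers (so that the subspace bilinear forms coincide with the energy inner product). All function names are illustrative.

```python
import numpy as np

def overlapping_blocks(n, J, overlap):
    """Selection matrices V_j for J overlapping index blocks covering 0..n-1."""
    I = np.eye(n)
    size = n // J
    Vs = []
    for j in range(J):
        lo = max(0, j * size - overlap)
        hi = n if j == J - 1 else min(n, (j + 1) * size + overlap)
        Vs.append(I[:, lo:hi])
    return Vs

def interaction_matrix(K, Vs):
    """Matrix of cosines gamma_ij between the subspaces span(V_i) and span(V_j)."""
    chol = [np.linalg.cholesky(V.T @ K @ V) for V in Vs]
    J = len(Vs)
    G = np.eye(J)                                     # gamma_ii = 1
    for i in range(J):
        for j in range(i + 1, J):
            Aij = Vs[i].T @ K @ Vs[j]
            # B = L_i^{-1} A_ij L_j^{-T}; its largest singular value is gamma_ij
            B = np.linalg.solve(chol[i], np.linalg.solve(chol[j], Aij.T).T)
            G[i, j] = G[j, i] = np.linalg.norm(B, 2)
    return G

def asm_lambda_max(K, Vs):
    """Largest eigenvalue of P = sum_j V_j (V_j^T K V_j)^{-1} V_j^T K."""
    Cinv = sum(V @ np.linalg.solve(V.T @ K @ V, V.T) for V in Vs)
    L = np.linalg.cholesky(K)
    return np.linalg.eigvalsh(L.T @ Cinv @ L)[-1]     # L^T C^{-1} L is similar to C^{-1} K

n = 24
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Vs = overlapping_blocks(n, J=4, overlap=2)
Gamma = interaction_matrix(K, Vs)
rho = np.max(np.abs(np.linalg.eigvalsh(Gamma)))
lam_max = asm_lambda_max(K, Vs)
```

In this example only neighbouring blocks interact, so the interaction matrix is tridiagonal and its spectral radius stays bounded independently of the number of blocks.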


The convergence analysis of inexact multiplicative or
hybrid versions of the Schwarz methods is more complicated.
First of all, we need the so-called subspace contraction
condition stating that the inequalities

\[
a(P_jv_j, v_j) \le \omega\,a(v_j,v_j) \quad \forall v_j\in\mathbb{V}_j,\ \forall j = 1,2,\ldots,J \tag{76}
\]

hold for some constant $\omega \in (0, 2)$. Indeed, condition (76)
ensures the contraction of the operator $I - P_j$ on the
subspace $\mathbb{V}_j$. We present here only two results concerning
the inexact MSM and the inexact sMSM introduced in
Section 3.2.2. For more results, we refer the reader to
special papers on this topic, for example, Xu (1992), Dryja
and Widlund (1995), and Griebel and Oswald (1995).

The sMSM produces an SPD preconditioner $C = K(I - E)^{-1}$
that can be used in the PCG method for solving
our FE system (23). The following theorem again provides
bounds for the minimal and maximal eigenvalues of the
preconditioned matrix $C^{-1}K$.


Theorem 3. Let us assume that the space splitting is
stable in the sense of Corollaries 2-3 and that the subspace
contraction condition (76) holds for some constant
$\omega \in (0, 2)$. Then the spectral estimates


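\[
\lambda_{\min}(C^{-1}K) = \lambda_{\min}(I - E) \ge \frac{2-\omega}{c_1^2\,\big(1 + 2\omega^2\rho^2(\Gamma)\big)}
\quad\text{and}\quad
\lambda_{\max}(C^{-1}K) = \lambda_{\max}(I - E) \le 1 \tag{77}
\]


hold with $c_1$ and $\rho(\Gamma)$ defined in Corollaries 2 and 3
respectively.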


Proof. See Smith, Bjørstad and Gropp (1996) for a slightly 
more general case, or the original paper by Dryja and 
Widlund (1995). 


The following theorem gives an exact representation of
the convergence rate $\|E\|_a$ of the MSM in the energy
norm, where $E$ again denotes the MSM error propagation
(iteration) operator defined by (46):


Theorem 4. Let us again assume that the subspace
contraction condition (76) holds for some constant $\omega \in (0, 2)$.
Then the energy norm of the MSM error propagation
operator $E$ resp. of the MSM iteration matrix $E$ can be represented
in the form

\[
\|E\|_a = \|E\|_K = q = \sqrt{\frac{c}{1+c}} < 1 \tag{78}
\]

where

\[
c = \sup_{\|v\|_a=1}\ \inf_{v=\sum_{j=1}^{J} v_j}\ \sum_{j=1}^{J}
a\big(P_j\bar{P}_j^{-1}P_j^{*}w_j,\ w_j\big) < \infty \tag{79}
\]

with $w_j = \sum_{i=j}^{J} v_i - P_j^{-1}v_j$ and
$\bar{P}_j = P_j^{*} + P_j - P_j^{*}P_j = 2P_j - P_j^2$ ($P_j = P_j^{*}$).

Proof. See Xu and Zikatanov (2002) for the more general
case of infinite-dimensional Hilbert spaces and of nonsymmetric,
but elliptic bilinear forms. □


In the case of the exact MSM, the representation can
be simplified because $P_j = \bar{P}_j = P_j^{*} = P_j^2$ is an
orthoprojection with respect to the energy inner product $a(\cdot,\cdot)$.
More precisely, the constant $c$ in Theorem 4 can directly
be rewritten in the form

\[
c = \sup_{\|v\|_a=1}\ \inf_{v=\sum_{j=1}^{J} v_j}\ \sum_{j=1}^{J-1}
\Big\|P_j\sum_{i=j+1}^{J} v_i\Big\|_a^2 < \infty \tag{80}
\]

Moreover, Theorem 4 immediately yields the convergence
rate estimate

\[
\|E_{sMSM}\|_a = \|E_{MSM}^{*}E_{MSM}\|_a \le \|E_{MSM}^{*}\|_a\,\|E_{MSM}\|_a
= \|E_{MSM}\|_a^2 = q^2 < 1 \tag{81}
\]

for sMSM, where $E_{MSM}$ and $E_{sMSM}$ denote the error
propagation operators corresponding to the MSM and sMSM,
respectively. Estimate (81) implies the spectral equivalence
inequalities

\[
(1 - q^2)\,C \le K \le C \tag{82}
\]

for the sMSM preconditioner $C = K(I - E_{sMSM})^{-1}$. The
abstract representations and estimates given above are
essential for obtaining spectral equivalence or convergence
rate estimates for specific subspace correction methods in
concrete applications.
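The relation between the energy norm q of the MSM error propagation operator and the spectrum of the sMSM-preconditioned matrix can be illustrated numerically. The following sketch is an assumption-laden toy setting, not the chapter's example: K = tridiag(-1, 2, -1), four overlapping index blocks, and exact subspace solvers, so that every P_j is an energy-orthogonal projection.

```python
import numpy as np

n, J, overlap = 24, 4, 2
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
I = np.eye(n)
size = n // J

# Exact subspace projections P_j = V_j (V_j^T K V_j)^{-1} V_j^T K
projections = []
for j in range(J):
    lo, hi = max(0, j * size - overlap), min(n, (j + 1) * size + overlap)
    Vj = I[:, lo:hi]
    projections.append(Vj @ np.linalg.solve(Vj.T @ K @ Vj, Vj.T @ K))

# MSM error propagation operator E = (I - P_J) ... (I - P_1)
E = I.copy()
for P in projections:
    E = (I - P) @ E

# Energy norm ||E||_a = ||L^T E L^{-T}||_2 with K = L L^T
L = np.linalg.cholesky(K)
q = np.linalg.norm(L.T @ E @ np.linalg.inv(L.T), 2)

# sMSM: E_s = E^* E with the energy adjoint E^* = K^{-1} E^T K, and C^{-1} K = I - E_s
E_star = np.linalg.solve(K, E.T @ K)
E_s = E_star @ E
lam = np.sort(np.linalg.eigvals(I - E_s).real)   # eigenvalues are real up to rounding
```

The smallest eigenvalue of I - E_s equals 1 - q^2 and the largest does not exceed 1, which is the content of the spectral equivalence stated for the sMSM preconditioner.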


4 OVERLAPPING DOMAIN 
DECOMPOSITION METHODS 


4.1 Basic construction principles and algorithms 
with generous overlap 


As explained in Sections 2 and 3, the overlapping DD 
methods have a long history and can be completely 
treated within the framework of the Schwarz theory. More 
precisely, 


• the splitting of the (FE) space $\mathbb{V} = \sum_{j=1}^{J} \mathbb{V}_j$,
• the subspace bilinear forms $a_j(\cdot,\cdot)$, and
• the arrangement of the projection-like operations
  (additive, multiplicative, hybrid)


completely define the Schwarz method (preconditioner), the 
complete analysis of which is also covered by the Schwarz 
theory presented in Section 3.3. 

Without loss of generality, we restrict ourselves to the 
(exact) additive version (preconditioner) that is the most 
important one in parallel computing. Then the analysis can 
mainly be reduced to the verification of the conditions 
formulated in Corollaries 2 and 3, namely, 


• the verification of the stability (71) of the space splitting
  and
• the calculation of the subspace interaction measure
  $\rho(\Gamma)$.


For definiteness, we consider the heat conduction problem
described by Example 1 as a model problem and
assume a moderate and smooth behavior of the heat conduction
coefficients. In the overlapping DD method, the splitting
$\mathbb{V} = \sum_{j=1}^{J} \mathbb{V}_j$ of a (FE) space $\mathbb{V} = \mathbb{V}_h(\Omega) \subset H_0^1(\Omega)$
corresponds to an overlapping DD $\bar{\Omega} = \bigcup_{j=1}^{J} \bar{\Omega}_j$ of the
computational domain $\Omega$, where the subspaces usually have
the form $\mathbb{V}_j = \mathbb{V} \cap H_0^1(\Omega_j)$, that is, the subspace problems
are local Dirichlet FE problems. There are several methods
for constructing overlapping domain decompositions. Let
us first mention two simple techniques that both start
from a coarse shape-regular conforming triangulation $\mathcal{T}_H$ of
$\bar{\Omega} = \bigcup_{j=1}^{J_H} \bar{\tau}_{H,j}$ and proceed with its refinement, resulting
in a shape-regular conforming fine grid discretization $\mathcal{T}_h$ of
$\bar{\Omega} = \bigcup_{j=1}^{J_H} \bigcup_{\tau \subset \tau_{H,j}} \bar{\tau} = \bigcup_{\tau \in \mathcal{T}_h} \bar{\tau}$. For definiteness, we
assume that these triangulations are provided by triangles
($d = 2$) or tetrahedra ($d = 3$), and, for simplicity, we also
assume that linear elements are used for generating the
FE spaces. The parameters $H$ and $h$ stand for the typical
sizes of the coarse and the fine quasiuniform triangulations
respectively. Now we associate with each coarse grid vertex
$x^{(j)}$ ($j = 1,2,\ldots,J$) some subdomain $\Omega_j$ that is built
by all coarse simplices containing $x^{(j)}$ as a vertex. This
gives our first overlapping domain decomposition (ODD1)
of $\Omega$, where the overlap $\delta = O(H)$. Another one is given
by associating with each coarse grid simplex $\tau_{H,j}$ some
subdomain $\Omega_j$ that is built by this simplex and all simplices
touching this simplex at least in one vertex, where
$j = 1,2,\ldots,J$ with $J = J_H$. This again gives us an
overlapping domain decomposition (ODD2) of $\Omega$, where the
overlap $\delta = O(H)$. Several generalizations of these techniques
are feasible. For instance, one can first build a
nonoverlapping domain decomposition with subdomains
consisting of one or several coarse grid elements and then
extend them by adding some layers of fine grid elements
around these subdomains, giving the subdomains $\Omega_j$ of an
overlapping domain decomposition (ODD($\delta$)), where $\delta$ is
the thickness of the layers, that is, the overlap. Thus, ODD2
is a special ODD($H$) method.

Let $C$ be the (exact) additive Schwarz preconditioner
(cf. (36))

\[
C^{-1} = \sum_{j=1}^{J} V_j (V_j^TKV_j)^{-1} V_j^T = \sum_{j=1}^{J} V_j K_j^{-1} V_j^T \tag{83}
\]

corresponding to the space splitting

\[
\mathbb{V} = \mathbb{V}_h = \sum_{j=1}^{J} \mathbb{V}_j, \qquad
\mathbb{V}_j = \mathbb{V}_h \cap H_0^1(\Omega_j) = \operatorname{span}\,\Phi V_j \tag{84}
\]

that is based on one of the overlapping domain decompositions
$\bar{\Omega} = \bigcup_{j=1}^{J} \bar{\Omega}_j$ of $\Omega$ with $O(H)$ overlap as described above,
where the $N \times N_j$ matrix $V_j$ picks exactly those basis functions
from the fine grid nodal basis $\Phi$ which belong to the
inner nodes in $\Omega_j$. The SPD $N_j \times N_j$ matrix $K_j$ is nothing
but the stiffness matrix belonging to the local FE Dirichlet
problem in $\Omega_j$. Using the general Schwarz theory, we can
prove that

\[
\kappa(C^{-1}K) = O(H^{-2}) \tag{85}
\]

that is, owing to the generous $O(H)$ overlap, the relative
spectral condition number does not depend on the fine grid
discretization parameter $h$ in a bad way, but on the domain
decomposition parameter $H$. This is totally unacceptable
for the use of this preconditioner in a massively parallel
solver environment. The bad dependence on $H$ is due to
the absence of some coarse grid solver managing the global
information transport that is essential for elliptic problems
(see Widlund (1988) and Smith, Bjørstad and Gropp (1996)
for a more detailed discussion of this issue in connection
with DD methods).

There are a lot of possibilities to include such mechanisms
for global information exchange into the Schwarz
preconditioner. For the overlapping domain decompositions
presented above, the natural way certainly consists in
adding the coarse grid FE space
$\mathbb{V}_0 = \mathbb{V}_H = \operatorname{span}\,\Phi V_0 \subset \mathbb{V} \subset H_0^1(\Omega)$
to the splitting (84), that is,

\[
\mathbb{V} = \mathbb{V}_h = \mathbb{V}_0 + \sum_{j=1}^{J} \mathbb{V}_j = \sum_{j=0}^{J} \mathbb{V}_j \tag{86}
\]

Now, the corresponding two-level (coarse level and fine
level) overlapping ASM preconditioner

\[
C^{-1} = \sum_{j=0}^{J} V_j K_j^{-1} V_j^T \tag{87}
\]

gives an optimal relative spectral condition number
estimate.


Theorem 5. The exact two-level ASM preconditioner
(87) based on an overlapping domain decomposition
with a uniform overlap width $O(H)$ provides an optimal
preconditioner in the sense that $\kappa(C^{-1}K) \le c$, where the
positive constant $c$ does not depend on $h$, $H$, and $J$.


Proof. The proof was given by Dryja and Widlund (1989) on the basis
of the Schwarz theory and some technical lemmas (partition
of unity, stability of the $L_2$-projection in $H^1$) (see also Smith,
Bjørstad and Gropp, 1996). □
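The effect of the coarse space can be observed in a small experiment. The sketch below is a 1D analogue under stated assumptions, not the setting of the theorem itself: K = tridiag(-1, 2, -1) on a uniform grid, subdomains built from the two coarse elements around each interior coarse vertex (an ODD1-type decomposition), exact local solvers, and a coarse space spanned by the coarse hat functions. The function name `asm_condition_numbers` is hypothetical.

```python
import numpy as np

def asm_condition_numbers(m, r):
    """Condition numbers of one-level and two-level additive Schwarz in 1D.

    m: number of coarse elements (H = 1/m); r: fine elements per coarse element.
    Returns (kappa_one_level, kappa_two_level).
    """
    n = m * r - 1                                   # interior fine nodes 1..n
    K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    I = np.eye(n)

    # One-level part: subdomain j = interior fine nodes of ((j-1)H, (j+1)H)
    Cinv_one = np.zeros((n, n))
    for j in range(1, m):
        Vj = I[:, (j - 1) * r : (j + 1) * r - 1]
        Cinv_one += Vj @ np.linalg.solve(Vj.T @ K @ Vj, Vj.T)

    # Coarse space: coarse hat functions evaluated at the fine nodes
    nodes = np.arange(1, n + 1)
    V0 = np.array([np.maximum(0.0, 1.0 - np.abs(nodes - j * r) / r)
                   for j in range(1, m)]).T
    Cinv_two = Cinv_one + V0 @ np.linalg.solve(V0.T @ K @ V0, V0.T)

    L = np.linalg.cholesky(K)                       # L^T C^{-1} L is similar to C^{-1} K

    def kappa(Cinv):
        ev = np.linalg.eigvalsh(L.T @ Cinv @ L)
        return ev[-1] / ev[0]

    return kappa(Cinv_one), kappa(Cinv_two)

results = {m: asm_condition_numbers(m, r=4) for m in (4, 8, 16)}
```

With the ratio H/h fixed, the one-level condition number grows as the number of subdomains increases, whereas the two-level value stays bounded. In this 1D toy problem the coarse interpolant is energy-orthogonal to the remainder, which makes the two-level bound particularly small; this is a feature of the example and not of the general theory.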


The theorem obviously remains true for the inexact
version where the local stiffness matrices $K_j$ are replaced
by suitable spectrally equivalent preconditioners $C_j$ for
$j = 1,2,\ldots,J$. The analysis of multiplicative versions
follows the same line of the general Schwarz theory (see
e.g. Bramble et al., 1991; Xu, 1992; and Xu and Zikatanov,
2002). The results can be extended to coarse grid spaces
$\mathbb{V}_H$, which are not subspaces of the FE fine grid space
$\mathbb{V} = \mathbb{V}_h$.

However, there are two drawbacks of two-level ASM
preconditioners with a generous overlap. The first problem
is connected with jumps in the coefficients. In contrast
to the nonoverlapping DD methods (cf. Section 5), the
influence of large coefficient jumps
is still not completely understood. The second problem is
connected with the influence of the overlap width $\delta$ on
$\kappa(C^{-1}K)$. It is clear that the larger the overlap, the more
expensive are the local problems that we have to solve.
The computational overhead becomes significant when $h$
becomes small with respect to $H$. We discuss this problem
in the next section.


4.2 Domain decomposition algorithms with a 
small overlap 


In the case of a small overlap $\delta$, Dryja and Widlund (1994)
proved the following theorem.


Theorem 6. The exact two-level ASM preconditioner (87)
based on an overlapping domain decomposition with a
uniform overlap width $O(\delta)$ gives the estimate

\[
\kappa(C^{-1}K) \le c\left(1 + \frac{H}{\delta}\right) \tag{88}
\]

of the relative spectral condition number $\kappa(C^{-1}K)$, where
the positive constant $c$ does not depend on $h$, $H$, $J$, and $\delta$.


Brenner (2000) showed that this result is sharp in the case
of minimal overlap ($\delta = h$), that is, there exists a positive
constant $c$ that is independent of $h$, $H$, and $J$ such that
$\kappa(C^{-1}K) \ge c\,(H/h)$. In the same paper, she proved that
K(C~!K) = O((H/h)?) in the case of fourth-order elliptic 
boundary value problems.

Therefore, a small overlap really affects the preconditioning
effect of the two-level ASM preconditioner (87) in a
very bad way. On the other hand, the $O(H)$ overlap means
that additional $O((H/h)^d)$ unknowns are added to the local
problems in contrast to $O((H/h)^{d-1})$ unknowns in the
case of an $O(h)$ overlap. Bank et al. (2002) have recently
proposed a two-level hierarchical overlapping ASM
preconditioner that adds only $O((H/h)^{d-1})$ unknowns to the
local problems as in the case of small overlap, and that
results in a uniformly bounded relative condition number
estimate, as in the case of Theorem 6.


4.3 Multilevel versions 


The two-level Schwarz methods, described above, use a
fine ($h$) and a coarse ($H$) mesh capturing the local
(high-frequency) and the global (low-frequency) parts in the
solution (iteration error) respectively. This approach is
satisfactory if efficient local and global solvers (preconditioners)
are available. However, if the global problem is large,
that is, $H$ is relatively small, then we can again apply a
two-level algorithm to the coarse grid problem using again
some coarser grid. The recursive application to the coarse
grid problems results in a multilevel ASM preconditioner.
To be more precise, we assume that the coarse grid
(triangulation) $\mathcal{T}_0 = \mathcal{T}_H$ is refined $L$ times giving the finer and finer
triangulations $\mathcal{T}_1,\ldots,\mathcal{T}_{L-1}$, and $\mathcal{T}_L = \mathcal{T}_h$. For each level
$l = 1,2,\ldots,L$ of the triangulations, with the exception of
the coarsest level $l = 0$, we construct some overlapping
domain decomposition $\bar{\Omega} = \bigcup_{j=1}^{J_l} \bar{\Omega}_{l,j}$ and connect with
this multilevel overlapping domain decomposition the
multilevel splitting of the FE space

\[
\mathbb{V} = \mathbb{V}_h = \mathbb{V}_0 + \sum_{l=1}^{L}\sum_{j=1}^{J_l} \mathbb{V}_{l,j} \tag{89}
\]

in the same way as above, where the subspaces
$\mathbb{V}_{l,j} = \operatorname{span}\,\Phi V_{l,j}$ can again be generated by using the $N \times N_{l,j}$
basis transformation matrices $V_{l,j}$. Now the corresponding
(exact) multilevel overlapping ASM preconditioner can be
written in the form

\[
C^{-1} = V_0K_0^{-1}V_0^T + \sum_{l=1}^{L}\sum_{j=1}^{J_l} V_{l,j}K_{l,j}^{-1}V_{l,j}^T \tag{90}
\]

Let us consider one extreme case where the subdomains
$\Omega_{l,j}$ are simply the supports of the nodal basis functions
$\varphi_{l,j}$ belonging to the node $x_{l,j}$ in the $l$-level triangulation
$\mathcal{T}_l$, that is, now the subspaces
$\mathbb{V}_{l,j} = \operatorname{span}(\varphi_{l,j}) = \operatorname{span}\,\Phi V_{l,j}$ are one-dimensional, where
$\Phi = \Phi_h = \Phi_L = [\varphi_{L,j}]$ denotes the fine grid basis and the $N \times 1$ matrix
$V_{l,j}$ provides the representation of the basis function
$\varphi_{l,j}$ in the fine grid basis. We mention that this
overlapping DD is closely related to ODD1, but now the
elements around the node $x_{l,j}$ are taken from the
triangulation $\mathcal{T}_l$ and not $\mathcal{T}_{l-1}$. The exact multilevel
overlapping ASM preconditioner corresponding to this overlapping
domain decomposition is called multilevel diagonal scaling
(MDS) preconditioner and was introduced by Zhang
(1992). Since for second-order elliptic problems the $1 \times 1$
matrices $K_{l,j}$ obviously behave like $h_l^{d-2}$, the inexact
version of the MDS preconditioner can be written in the
form

\[
C^{-1} = V_0K_0^{-1}V_0^T + \sum_{l=1}^{L}\sum_{j=1}^{J_l} h_l^{2-d}\,V_{l,j}V_{l,j}^T \tag{91}
\]

The inexact multilevel overlapping ASM preconditioner
(91) was first proposed by Bramble, Pasciak and Xu (1990)
and is nowadays known as the BPX preconditioner.


Theorem 7. The MDS and the BPX preconditioners are
optimal preconditioners in the sense that there is some
positive constant $c$ that does not depend on $h$ and $L$ such
that the relative spectral condition number $\kappa(C^{-1}K) \le c$
and the arithmetical cost $\mathrm{ops}(C^{-1}d)$ for the preconditioning
operation is proportional to the number of unknowns
$N_h = O(h^{-d})$ on the finest grid.


Proof. The original proof by Bramble, Pasciak and Xu
(1990) provided weaker (nonoptimal) bounds depending on
the number of refinement levels $L = O(\log(1 + (H/h)))$
(see also Zhang (1992) for the MDS preconditioner). The
optimality of the BPX preconditioner was first proved by
Oswald (1992) using Besov space techniques (see also
Oswald, 1994). □
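A BPX-type preconditioner is easy to assemble explicitly for a small 1D problem, which makes the level-independent conditioning visible. The sketch below rests on assumptions made only for illustration: a uniform grid on (0, 1) with homogeneous Dirichlet conditions, linear elements, dyadic refinement starting from a single interior coarse node, and d = 1, so that the level scaling h_l^{2-d} becomes h_l. The names `prolongation` and `bpx_condition_numbers` are hypothetical, and the matrices are formed densely, which is not how the preconditioner would be applied in practice.

```python
import numpy as np

def prolongation(n_c):
    """Linear interpolation from n_c coarse interior nodes to 2*n_c + 1 fine nodes."""
    P = np.zeros((2 * n_c + 1, n_c))
    for j in range(n_c):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0        # coarse node j coincides with fine node 2j+1
        P[2 * j + 2, j] = 0.5
    return P

def bpx_condition_numbers(L, n0=1):
    """Return (kappa(C^{-1} K), kappa(K)) for L refinement levels."""
    sizes = [n0]
    for _ in range(L):
        sizes.append(2 * sizes[-1] + 1)
    h = [1.0 / (s + 1) for s in sizes]
    N = sizes[-1]
    K = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h[-1]

    # V[l]: level-l nodal basis functions expressed in the fine grid basis
    V = [None] * (L + 1)
    V[L] = np.eye(N)
    for l in range(L, 0, -1):
        V[l - 1] = V[l] @ prolongation(sizes[l - 1])

    K0 = V[0].T @ K @ V[0]                          # coarse grid stiffness matrix
    Cinv = V[0] @ np.linalg.solve(K0, V[0].T)
    for l in range(1, L + 1):
        Cinv += h[l] * (V[l] @ V[l].T)              # h_l^{2-d} with d = 1

    Lc = np.linalg.cholesky(K)
    ev = np.linalg.eigvalsh(Lc.T @ Cinv @ Lc)       # similar to C^{-1} K
    evK = np.linalg.eigvalsh(K)
    return ev[-1] / ev[0], evK[-1] / evK[0]

kappa_bpx, kappa_K = bpx_condition_numbers(L=5)
```

Calling `bpx_condition_numbers` for increasing `L` shows the condition number of K growing like h^{-2}, while the preconditioned value changes only mildly.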


Multiplicative and hybrid versions of these multilevel 
Schwarz methods are closely related to multigrid methods, 
which are discussed in Chapter 20, this Volume; see also 
Bramble and Zhang (2000) for this relation. 


5 NONOVERLAPPING DOMAIN 
DECOMPOSITION METHODS 
5.1 Iterative substructuring methods 


For definiteness, we consider the heat conduction problem
described by Example 1 as a model problem. In practice, the
heat conduction coefficient $a(\cdot)$ typically has jumps due to
different materials. As in Section 2, we assume that the
computational domain

\[
\bar{\Omega} = \bigcup_{j=1}^{J} \bar{\Omega}_j \tag{92}
\]

is decomposed into $J$ nonoverlapping subdomains in such
a way that the coefficient jumps are aligned with the
boundaries of the subdomains. For simplicity, we assume that in
each subdomain $\Omega_j$ the coefficient $a(\cdot)$ has the constant
positive value $a_j$. Further, we assume that the domain
decomposition is quasiregular in the sense that the
subdomains are images of some reference domain, or a few
reference domains, by a quasiregular mapping with the
scaling $H$. Therefore, $H$ can be viewed as a typical
subdomain diameter such that $J = O(H^{-d})$. As described in
Section 2, we provide every subdomain with a triangulation
$\mathcal{T}_j$ such that the triangulation $\mathcal{T}_h$ of the total computational
domain $\bar{\Omega} = \bigcup_{j=1}^{J} \bigcup_{\tau \in \mathcal{T}_j} \bar{\tau}$ is conforming and quasiregular in
the sense that there is a typical element diameter $h$ such
that $N = N_h = O(h^{-d})$. Thus, the number of the internal
subdomain unknowns $N_{I,j}$ behaves like $O((H/h)^d)$. The
FE discretization with the arrangement (12) of the FE basis
$\Phi$ leads to the block structure (13) of the FE equations. The
stiffness matrix $K$ and the load vector $f$ can obviously be
represented in the form


\[
K = \sum_{j=1}^{J} A_j^TK_jA_j \quad\text{and}\quad f = \sum_{j=1}^{J} A_j^Tf_j \tag{93}
\]

where the $N_j \times N$ Boolean subdomain connectivity matrices
$A_j$ are mapping some vector $u \in \mathbb{R}^N$ of all nodal values
onto the vector $u_j = A_ju \in \mathbb{R}^{N_j}$ of the subdomain nodal
values. The $N_j \times N_j$ subdomain stiffness matrices $K_j$ and
the subdomain load vectors $f_j \in \mathbb{R}^{N_j}$ can be structured
in the same way as we have structured $K$ and $f$ in
(13), that is,

\[
K_j = \begin{pmatrix} K_{C,j} & K_{CI,j} \\ K_{IC,j} & K_{I,j} \end{pmatrix}, \qquad
f_j = \begin{pmatrix} f_{C,j} \\ f_{I,j} \end{pmatrix} \tag{94}
\]

The matrices $K_{I,j}$ correspond to the local homogeneous
Dirichlet problems, whereas the matrices $K_j$ arise from
the FE discretization of the local Neumann problems, at
least, for the subdomains with $\partial\Omega_j \cap \partial\Omega = \emptyset$. For our
model problem, these matrices are singular, where the
kernel (null space) $\ker(K_j) = \operatorname{span}(\mathbf{1}_j)$ is spanned by
$\mathbf{1}_j = (1,1,\ldots,1)^T \in \mathbb{R}^{N_j}$.

The block Gaussian elimination of the internal subdomain
unknowns $u_I$ reduces the solution of the FE equation
(13) to the solution of the Schur-complement problem

\[
S_Cu_C = g_C \tag{95}
\]

that was explicitly formed and directly solved in the classical
substructuring Algorithm 3. The Schur complement $S_C$
and the right-hand side $g_C$ can be assembled from the local
Schur complements $S_{C,j}$ and the local right-hand sides $g_{C,j}$
in the same way as $K$ and $f$ were assembled in (93) from
$K_j$ and $f_j$ respectively, that is,

\[
S_C = \sum_{j=1}^{J} A_{C,j}^TS_{C,j}A_{C,j} \quad\text{and}\quad
g_C = \sum_{j=1}^{J} A_{C,j}^Tg_{C,j} \tag{96}
\]

As mentioned in Section 2, the iterative solution of (95)
avoids the very expensive forming of the Schur complement
$S_C$. Solving (95) by the PCG method leads to our first
iterative substructuring method called Schur-complement
CG. In each iteration step of the Schur-complement CG,
we need one matrix-by-vector multiplication of the form

\[
S_Cv_C = \sum_{j=1}^{J} A_{C,j}^TS_{C,j}A_{C,j}v_C
= \sum_{j=1}^{J} A_{C,j}^T\big(K_{C,j} - K_{CI,j}K_{I,j}^{-1}K_{IC,j}\big)A_{C,j}v_C \tag{97}
\]

requiring the direct solution of $J$ systems (local Dirichlet
problems)

\[
K_{I,j}w_{I,j} = K_{IC,j}A_{C,j}v_C, \quad j = 1,\ldots,J \tag{98}
\]

which can be done completely in parallel. Moreover, the
factorization of the matrices $K_{I,j}$ in a preprocessing step
and the use of sparse direct techniques can make this
multiplication operation very efficient (see e.g. the classical
monograph by George and Liu (1981) and more recent
papers by Demmel et al. (1999) and Gupta (2002)).
Nevertheless, for real large-scale problems, this operation is a
bottleneck of the Schur-complement CG. The use of inexact
(iterative) solvers for the local Dirichlet problems (98)
is dangerous. We discuss the naive use of inexact solvers
in Section 5.3.
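The matrix-free application of the Schur complement through local Dirichlet solves can be sketched as follows. The setting is an assumption chosen for brevity: a 1D model problem with m subdomains of r linear elements each, homogeneous Dirichlet conditions at both ends, and unscaled element matrices [[1, -1], [-1, 1]]; the interface then consists of the m - 1 subdomain end points. All function names are illustrative, and dense solves stand in for the prefactored sparse solvers mentioned in the text.

```python
import numpy as np

def local_problems(m, r):
    """Subdomain stiffness matrices K_j with index sets for a 1D model problem."""
    n = m * r - 1                                        # global interior nodes 0..n-1
    interface = [k * r - 1 for k in range(1, m)]         # coupling nodes
    c_index = {g: i for i, g in enumerate(interface)}
    subs = []
    for j in range(m):
        nodes = [g for g in range(j * r - 1, (j + 1) * r) if 0 <= g < n]
        loc = {g: i for i, g in enumerate(nodes)}
        Kj = np.zeros((len(nodes), len(nodes)))
        for e in range(j * r, (j + 1) * r):              # element e joins nodes e-1 and e
            ends = [loc[g] for g in (e - 1, e) if g in loc]
            for a in ends:
                Kj[a, a] += 1.0
            if len(ends) == 2:
                Kj[ends[0], ends[1]] -= 1.0
                Kj[ends[1], ends[0]] -= 1.0
        C = [loc[g] for g in nodes if g in c_index]      # local interface dofs
        I = [loc[g] for g in nodes if g not in c_index]  # local interior dofs
        gC = [c_index[g] for g in nodes if g in c_index] # their positions in v_C
        subs.append((nodes, Kj, C, I, gC))
    return n, interface, subs

def schur_matvec(vC, subs):
    """Apply S_C to v_C subdomain by subdomain without forming S_C."""
    out = np.zeros_like(vC)
    for _, Kj, C, I, gC in subs:
        v = vC[gC]                                       # restriction A_{C,j} v_C
        w = np.linalg.solve(Kj[np.ix_(I, I)], Kj[np.ix_(I, C)] @ v)  # local Dirichlet solve
        out[gC] += Kj[np.ix_(C, C)] @ v - Kj[np.ix_(C, I)] @ w       # scatter-add
    return out

def assembled_schur(n, interface, subs):
    """Reference: Schur complement of the assembled global stiffness matrix."""
    K = np.zeros((n, n))
    for nodes, Kj, _, _, _ in subs:
        K[np.ix_(nodes, nodes)] += Kj
    inner = [g for g in range(n) if g not in interface]
    KCI = K[np.ix_(interface, inner)]
    return K[np.ix_(interface, interface)] - KCI @ np.linalg.solve(K[np.ix_(inner, inner)], KCI.T)

n, interface, subs = local_problems(m=4, r=4)
S_ref = assembled_schur(n, interface, subs)
S_mf = np.column_stack([schur_matvec(e, subs) for e in np.eye(len(interface))])
```

Each pass through the loop in `schur_matvec` touches only one subdomain, which is why the J local solves can run in parallel.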

The forming of the Schur complement is some kind of
a preconditioning operation. Indeed, the spectral condition
number of the Schur complement
$\kappa(S_C) = \lambda_{\max}(S_C)/\lambda_{\min}(S_C) = O(H^{-1}h^{-1})$ is much better than the
spectral condition number of the original stiffness matrix
$\kappa(K) = O(h^{-2})$ since $h \ll H$ (see e.g. Brenner, 1999).
However, $\kappa(S_C)$ still depends on the DD parameter $H$ and
the global discretization parameter $h$ as well as on the
coefficient jumps in a bad way. Thus, we need an SPD
Schur-complement preconditioner $C_C$ removing the influence of
all these parameters in such a way that $\kappa(C_C^{-1}S_C)$ does not
depend too much on these parameters under the restriction
that the preconditioning operation $w_C = C_C^{-1}d_C$, mapping
the defect $d_C$ into the preconditioned defect $w_C$, is sufficiently
cheap. Since the Schur-complement preconditioner is one
of the most important ingredients of many iterative
substructuring methods, we discuss this topic in more detail in
the next section.

In this chapter, we only consider iterative DD methods
for h-versions of the FEM. The hp methods, by which
we here mean both finite element and spectral element
methods, are gaining growing attention due to the ability
to attain exponential convergence even for problems with
singularities. Although this noticeable advantage is often
diminished by the high cost of the setup procedure caused
by the high fill-in of the stiffness matrices and complex
algorithms for calculating their entries, the computational
cost grows with p algebraically at the worst. Therefore,
there is a strong incentive to achieve a better performance
by means of smart hp algorithms. The literature on hp
methods is vast. We refer the reader to Chapter 5,
Chapter 6 of this Volume and Chapter 3, Volume 3 for
more information. In spite of the high interest, the toolkit
of fast solvers for the systems arising from hp discretizations
is much smaller than that for the h-version. In the last
decade, the major progress in this area has been achieved
on the basis of DD approaches. The formation of the basic
features of hp DD algorithms is due to the contributions by
Babuska, Craig, Mandel and Pitkäranta (1991), Pavarino
(1994), Ivanov and Korneev (1995), Ainsworth (1996),
Ivanov and Korneev (1996), Pavarino and Widlund (1996),
Widlund (1996), Ainsworth and Senior (1997), Casarin
(1997), Oden, Patra and Feng (1997), and Korneev and
Jensen (1997). Let us mention that the spectrally equivalent
finite-difference-like preconditioners for the local problems
were suggested and studied by Orszag (1980), Bernardi,
Dauge and Maday (1992), and Casarin (1997) for spectral
discretizations, and by Ivanov and Korneev (1996)
and Korneev and Jensen (1999) for hierarchical discretizations.
Later studies paid more attention to a more elaborate
design of all components of DD algorithms, that is, fast
solvers for the local Dirichlet problems, efficient prolongations
from the edges in 2D and from the faces in 3D,
edge and face Schur-complement preconditioners,
solvers for the wire-basket problem, and so
on. In this relation, we refer to Korneev (2001, 2002a,b),
Beuchler (2002), Korneev et al. (2002), and Korneev,
Langer and Xanthis (2003). These studies resulted in fast
Dirichlet DD preconditioners for second-order elliptic
equations. Some useful properties have been added to hp
DD methods by the use of nonconforming discretizations
and, in particular, by the mortar and FETI methods (see e.g.
Bernardi, Maday and Sacchi-Landriani, 1989; Bernardi,
Maday and Patera, 1993; Ben Belgacem and Maday, 1999).
Note that the
components of the fast solvers developed for the Dirichlet
DD preconditioners may be applied to these discretizations
as well.


5.2 Schur-complement preconditioners 


In this section, we look for SPD Schur-complement
preconditioners $C_C$ satisfying the following conditions:


1. Spectral equivalence condition: the spectral equivalence
   inequalities

\[
\underline{\gamma}_C\,C_C \le S_C \le \overline{\gamma}_C\,C_C \tag{99}
\]

   should hold with positive spectral equivalence constants
   $\underline{\gamma}_C$ and $\overline{\gamma}_C$ such that
   $\kappa(C_C^{-1}S_C) \le \overline{\gamma}_C/\underline{\gamma}_C$ does
   not, or only weakly, depend on $h$, $H$, and the coefficient
   jumps. The latter property is sometimes also
   called robustness with respect to coefficient jumps.

2. Efficiency condition: the number of arithmetical operations
   $\mathrm{ops}(C_C^{-1}d_C)$ needed for the preconditioning operation
   should be of the order $O(N_C),\ldots,O(N)$, or at
   least should not disturb the overall complexity of the
   algorithm too much.

3. Parallelizability condition: the preconditioning operation
   $C_C^{-1}d_C$ should not disturb the numerical and parallel
   efficiency of the total algorithm too much. However,
   we should be aware that in many Schur-complement
   preconditioners some coarse grid solver managing the
   global information transport is hidden. Thus, the coarse
   grid solver requires global communication and is a
   bottleneck in the parallelization.


In the literature, there are several basic proposals for
Schur-complement preconditioners. Many of them are based on
the fact that the Schur-complement energy $(S_Cu_C, u_C)$ is
spectrally equivalent to the broken-weighted $H^{1/2}$-norm

\[
\|u_C\|_{H^{1/2}(\Gamma_C)}^2 := \sum_{j=1}^{J} a_j\,|u_{C,j}|_{H^{1/2}(\Gamma_j)}^2
= \sum_{j=1}^{J} a_j \int_{\Gamma_j}\int_{\Gamma_j}
\frac{|u_{C,j}(y) - u_{C,j}(x)|^2}{|y - x|^d}\,ds_y\,ds_x \tag{100}
\]

with $\Gamma_j = \partial\Omega_j$, that is, there exist positive constants
$\underline{c}_C$ and $\overline{c}_C$, which are independent of $h$, $H$, and the
coefficient jumps, such that

\[
\underline{c}_C\,\|u_C\|_{H^{1/2}(\Gamma_C)}^2 \le (S_C\mathbf{u}_C, \mathbf{u}_C)
\le \overline{c}_C\,\|u_C\|_{H^{1/2}(\Gamma_C)}^2
\quad \forall u_C \leftrightarrow \mathbf{u}_C \in \mathbb{R}^{N_C} \tag{101}
\]

Evidently,

\[
(S_C\mathbf{u}_C, \mathbf{u}_C) = \sum_{j=1}^{J} a_j\,(S_{C,j}\mathbf{u}_{C,j}, \mathbf{u}_{C,j})
= \sum_{j=1}^{J} a_j \inf_{v_j} \|\nabla v_j\|_{L_2(\Omega_j)}^2 \tag{102}
\]

where the subdomain Schur complements $S_{C,j}$ arise from the
case $a_j = 1$ and the infimum is taken over all FE functions
$v_j \in \mathbb{V}_j = \mathbb{V}|_{\Omega_j}$ living in $\bar{\Omega}_j$ and coinciding with $u_{C,j}$ on
$\Gamma_j$. Therefore, the equivalence (101) is the same as the
equivalence of the infimum in (102) to the $H^{1/2}(\Gamma_j)$-norm
in (100) and requires the trace and lifting (prolongation)
theorems for functions from the FE space. In simple cases,
the left inequality in (101) is an obvious consequence of
the trace theorem for functions from $H^1(\Omega_j)$. The right
inequality in (101) requires some special proof, which, for
example, may be found in Nepomnyaschikh (1991b).

Let us first describe Dryja’s classical Schur-complement
preconditioner that is just applicable to the L-shaped
domain sketched in Figure 1. Here, the interface $\Gamma_C$
consists only of one straight piece with $N_C$ (inner) nodal points.
For this simple but characteristic example, Dryja (1982)
proposed the preconditioner

\[
C_C = B_C^{1/2} = \big[\operatorname{tridiag}(-1,\,2,\,-1)\big]^{1/2} = F_C^T\Lambda_C^{1/2}F_C \tag{103}
\]

that is nothing but the square root of the scaled
discretized 1D Laplacian $B_C$ along the interface under
homogeneous Dirichlet boundary conditions at the end
points of the interface. Dryja (1982) proved the
spectral equivalence inequalities (99) with $h$-independent
spectral equivalence constants $\underline{\gamma}_C$ and $\overline{\gamma}_C$ (see also
theorem 1 in Dryja, 1984). It is well known that
$B_C$ has the eigenvalues $\lambda_k = 4\sin^2(k\pi/(2(N_C + 1)))$
and the corresponding orthonormal eigenvectors
$\mathbf{v}_{C,k} = [\sqrt{2/(N_C + 1)}\,\sin(k\pi l/(N_C + 1))]_{l=1,\ldots,N_C}$, where $k$ is
running from 1 to $N_C$. Therefore, in (103),
$\Lambda_C^{1/2} = \operatorname{diag}(\lambda_k^{1/2})$ and $F_C = [\mathbf{v}_{C,1} \cdots \mathbf{v}_{C,N_C}]$. Since the Fourier
matrix $F_C$ is orthogonal, that is, $F_C^{-1} = F_C^T$, the
preconditioning equation $C_Cw_C = d_C$ can be solved in the
following three steps:

Fourier analysis:   $x_C = F_Cd_C$
Diagonal scaling:   $y_C = \Lambda_C^{-1/2}x_C$
Fourier synthesis:  $w_C = F_C^Ty_C$

The complexity of this preconditioning operation is
dominated by the Fourier analysis and Fourier synthesis. Using
the fast Fourier transformation (FFT), the total complexity
of the preconditioning operation is of the order $N_C\ln(N_C)$.
Dryja’s preconditioner was improved by Golub and Mayers
(1984), Bjørstad and Widlund (1986), Chan (1987), and
others.
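The three steps of the Fourier-based preconditioning operation can be written down directly. The sketch below is illustrative only: it forms the sine matrix densely, which costs O(N^2) per application, whereas a fast sine transform would be needed to reach the N ln N complexity quoted above. The function names `fourier_data`, `dryja_matrix`, and `dryja_apply` are hypothetical.

```python
import numpy as np

def fourier_data(n):
    """Eigenvalues and orthonormal sine eigenvectors of B = tridiag(-1, 2, -1)."""
    k = np.arange(1, n + 1)
    lam = 4.0 * np.sin(k * np.pi / (2 * (n + 1))) ** 2
    F = np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(k, k) * np.pi / (n + 1))
    return lam, F                      # F is symmetric and orthogonal

def dryja_matrix(n):
    """Explicit square root C = F^T diag(sqrt(lam)) F of B (for checking only)."""
    lam, F = fourier_data(n)
    return F.T @ np.diag(np.sqrt(lam)) @ F

def dryja_apply(d):
    """Solve C w = d with C = B^{1/2} in three steps."""
    lam, F = fourier_data(d.shape[0])
    x = F @ d                          # Fourier analysis
    y = x / np.sqrt(lam)               # diagonal scaling
    return F.T @ y                     # Fourier synthesis

n = 7
B = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
C = dryja_matrix(n)
d = np.arange(1.0, n + 1)
w = dryja_apply(d)
```

Squaring `C` recovers `B`, and `C @ w` reproduces `d`, which confirms that the three steps invert the square root of the discrete 1D Laplacian.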

The construction of good Schur-complement precondi- 
tioners in the general case of many nonoverlapping sub- 
domains is more involved. In the two-dimensional case, 
Bramble, Pasciak and Schatz (1986) proposed an ASM pre- 
conditioner of the form 


    $C_C^{-1} = \sum_E V_E C_E^{-1} V_E^T + V_V C_V^{-1} V_V^T$   (104)


in which the vertices are separated from the edges of the
subdomains arising from some coarse grid domain decomposition.
Here, $V_E^T$ takes the edge nodal basis functions
(unknowns) belonging to the edge $E$ from the nodal basis
functions (unknowns), $V_V^T$ transforms the nodal basis functions
into the coarse grid (hierarchical) vertex basis functions,
$C_E$ is an edge Schur-complement preconditioner as
discussed above (Dryja's type), and $C_V$ denotes a coarse
grid preconditioner; for example, $C_V$ can be the coarse
grid stiffness matrix. The latter manages the global
information exchange. Bramble, Pasciak and Schatz (1986)
proved that the relative condition number $\kappa(C_C^{-1} S_C)$ grows
at most like $(1+\ln(H/h))^2$ as $h$ tends to zero. Moreover,
the right averaging of the coefficients of adjacent
subdomains makes the BPS preconditioner, as the
Schur-complement preconditioner (104) is nowadays called, quite
robust with respect to coefficient jumps. In a series of
papers, Bramble, Pasciak and Schatz (1986, 1987, 1988,
1989) generalized these ideas in many directions, including
Schur-complement preconditioners for the 3D case.
Another type of Schur-complement preconditioners relies 
on the transformation of the nodal basis to a multilevel basis 
(generating system). This was very successful in construct- 
ing preconditioners for the finite element stiffness matrix K 
(the hierarchical preconditioner of Yserentant and the BPX
preconditioner). It is easy to see that the same transformations
restricted to the coupling boundary (interface) nodes
result in Schur-complement preconditioners possessing at
least the same quality as the corresponding preconditioners
for $K$. Smith and Widlund (1990) and Haase, Langer
and Meyer (1991) proposed hierarchical Schur-complement 
preconditioners, which asymptotically behave like the BPS 
preconditioner in the 2D case. The BPX Schur-complement 
preconditioner was introduced by Tong, Chan and Kuo 
(1991). It is asymptotically optimal, but is sensitive to coef- 
ficient jumps. 

There are a lot of other proposals for constructing Schur- 
complement preconditioners. Let us here mention only the 
wire-basket-based Schur-complement preconditioners intro- 
duced by Dryja, Smith and Widlund (1994), the probing 
technique proposed by Chan and Mathew (1994) (see also 
Keyes and Gropp, 1987), and the techniques borrowed 
from the boundary element method (see e.g. Carstensen,
Kuhn and Langer, 1998; Haase et al., 1997; and Steinbach,
2003).

Finally, we refer to the Neumann–Dirichlet and to
the Neumann–Neumann preconditioners, which are special
Schur-complement preconditioners that have proved to be very
robust in practical applications. The Neumann–Neumann
preconditioners are discussed in detail in Section 5.4.


5.3 Inexact subdomain solvers 


5.3.1 Effects of inexact subdomain solvers 


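Let us consider the discrete harmonic ($a(\cdot,\cdot)$-harmonic)
splitting

    $\mathbb{V} = \mathbb{V}_h = \mathbb{V}_C \oplus \mathbb{V}_I$   (105)

of the FE space $\mathbb{V}$ into the discrete harmonic space

    $\mathbb{V}_C = P^*\mathbb{V}|_{\Gamma_C} = \{u \in \mathbb{V} : a(u, v) = 0\ \forall v \in \mathbb{V}_I\}
             = \mathrm{span}\,\Psi^* = \mathrm{span}\,\Phi V_C$   (106)

and the interior subdomain (bubble) space $\mathbb{V}_I = \mathrm{span}\,\Phi V_I
= \mathbb{V}_{I,1} \oplus \cdots \oplus \mathbb{V}_{I,J}$, with the basis transformation matrices

    $V_C = \begin{pmatrix} I_C \\ -K_I^{-1}K_{IC} \end{pmatrix}_{N\times N_C}
         = \begin{pmatrix} I_C \\ P^*_{IC} \end{pmatrix}_{N\times N_C}$  and  $V_I = \begin{pmatrix} 0 \\ I_I \end{pmatrix}_{N\times N_I}$   (107)

where $\Psi^* = \Phi V_C$ is called the discrete harmonic basis
and $P^*$ is the discrete harmonic extension (prolongation)
operator mapping a FE function $u_C$ living on $\Gamma_C$ to some
FE function $P^*u_C$ that coincides with $u_C$ on $\Gamma_C$ and is
discrete harmonic in all subdomains $\Omega_j$. Owing to the
construction of $V_C$, the splitting is orthogonal with respect
to the energy inner product $a(\cdot,\cdot)$. Therefore, the exact
ASM preconditioner

    $C = \begin{pmatrix} I_C & K_{CI}K_I^{-1} \\ 0 & I_I \end{pmatrix}
         \begin{pmatrix} S_C & 0 \\ 0 & K_I \end{pmatrix}
         \begin{pmatrix} I_C & 0 \\ K_I^{-1}K_{IC} & I_I \end{pmatrix}$   (108)
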
must coincide with $K$. This factorization of $K$ is used in
many applications. Replacing $S_C$ by a Schur-complement
preconditioner $C_C$ (cf. Section 5.2), we arrive at a partially
inexact ASM preconditioner that corresponds to the
Schur-complement iteration methods. Replacing additionally $K_I$
by some preconditioner $C_I$ gives us the full inexact ASM
preconditioner. Owing to Corollary 1, both inexact ASM
preconditioners are covered by the general theory with $\gamma =
\cos\angle(\mathbb{V}_C, \mathbb{V}_I) = 0$. The naive replacement of $K_I^{-1}$ in the
left and right factors of the factorization (108) is dangerous
because it changes the angle between the subspaces. For
instance, the 'inversion' of $K_I$ by one multigrid cycle, that
is, the replacement of $K_I^{-1}$ by $(I_I - E_I^s)K_I^{-1}$ ($s = 1$), will, in
general, not result in a stable splitting even if the multigrid
convergence rate, that is, the energy norm $\|E_I\|_{K_I}$ of the
multigrid iteration operator, is less than 1 independently of
the local discretization parameter $H/h$ (as usual). One
needs at least $s = O(\ln(H/h))$ multigrid cycles to get a
stable extension (see e.g. Haase, Langer and Meyer, 1991).
The next section shows that a stable splitting requires
a careful choice of the extension operator $P$ resp. $P_{IC}$
replacing the discrete harmonic extension operator $P^*$ resp.
$P^*_{IC}$.
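
The block factorization (108) and the energy orthogonality of the discrete harmonic splitting can be checked numerically. The following is a minimal sketch, assuming a random SPD matrix as a stand-in for the stiffness matrix $K$ with the coupling unknowns ordered first; all variable names are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
nC, nI = 3, 5
M = rng.standard_normal((nC + nI, nC + nI))
K = M @ M.T + (nC + nI) * np.eye(nC + nI)   # random SPD stand-in for K
K_C, K_CI = K[:nC, :nC], K[:nC, nC:]
K_IC, K_I = K[nC:, :nC], K[nC:, nC:]

P_star = -np.linalg.solve(K_I, K_IC)        # discrete harmonic extension P*_IC
S_C = K_C + K_CI @ P_star                   # Schur complement K_C - K_CI K_I^{-1} K_IC
V_C = np.vstack([np.eye(nC), P_star])       # discrete harmonic basis transformation
V_I = np.vstack([np.zeros((nC, nI)), np.eye(nI)])

# Left factor of (108); its transpose is the right factor because K is symmetric.
L = np.block([[np.eye(nC), K_CI @ np.linalg.inv(K_I)],
              [np.zeros((nI, nC)), np.eye(nI)]])
D = np.block([[S_C, np.zeros((nC, nI))],
              [np.zeros((nI, nC)), K_I]])
C = L @ D @ L.T
```

In this sketch `C` reproduces `K`, `V_C.T @ K @ V_I` vanishes (orthogonality in the energy inner product), and `V_C.T @ K @ V_C` equals the Schur complement `S_C`.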


5.3.2 The bounded extension splitting 


Let us now consider the bounded extension splitting

    $\mathbb{V} = \mathbb{V}_h = \tilde{\mathbb{V}}_C + \mathbb{V}_I$   (109)

of the FE space $\mathbb{V}$ into the direct sum of the former subdomain
space $\mathbb{V}_I$ that remains unchanged and the bounded
extension space

    $\tilde{\mathbb{V}}_C = P\mathbb{V}|_{\Gamma_C} = \{Pu \in \mathbb{V} : u_C = u|_{\Gamma_C} \in \mathbb{V}|_{\Gamma_C} \text{ given}\}
                     = \mathrm{span}\,\tilde\Psi = \mathrm{span}\,\Phi\tilde V_C$   (110)

with the basis transformation matrix

    $\tilde V_C = \begin{pmatrix} I_C \\ P_{IC} \end{pmatrix}_{N\times N_C} : \mathbb{R}^{N_C} \to \mathbb{R}^{N}$   (111)

We assume that there exists some constant $c_E \ge 1$ independent
of our bad parameters such that

    $\|\tilde V_C u_C\|_K \le c_E \|u_C\|_{S_C} \quad \forall u_C \in \mathbb{R}^{N_C}$   (112)

Together with the minimal energy property (102), inequality
(112) gives the following spectral equivalence relations

    $(S_C u_C, u_C) \le (\tilde V_C^T K \tilde V_C u_C, u_C) \le c_E^2 (S_C u_C, u_C) \quad \forall u_C \in \mathbb{R}^{N_C}$   (113)

which are the matrix form of the inequalities

    $a(P^*u_C, P^*u_C) \le a(Pu_C, Pu_C) \le c_E^2\, a(P^*u_C, P^*u_C) \quad \forall u_C \in \mathbb{V}|_{\Gamma_C}$   (114)

Of course, replacing $P^*$ by $P$, we lose the orthogonality.
However, the following lemma shows that the bounded
extension gives us a stable splitting.


Lemma 2. If (112) holds, then

    $\gamma = \cos\angle(\tilde{\mathbb{V}}_C, \mathbb{V}_I) \le \sqrt{1 - c_E^{-2}} < 1$   (115)


Proof. Follows immediately from relation (62) in Lemma 1
(see e.g. Haase et al., 1994).


The sharp (minimal) constant $c_E^2$ in (113), which also
provides the sharp constant in (115), is given by the
maximal eigenvalue $\lambda_{\max}$ of the generalized eigenvalue
problem $\tilde V_C^T K \tilde V_C u_C = \lambda S_C u_C$. Now, we can summarize
our results in the following theorem.


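Theorem 8. Assume that there is some bounded extension
$P_{IC}: \mathbb{R}^{N_C} \to \mathbb{R}^{N_I}$ satisfying (112) with some constant $c_E \ge 1$
and that there are SPD preconditioners $C_C$ and $C_I$ for
$\tilde V_C^T K \tilde V_C$ and $K_I$, respectively, that is, there are positive
spectral equivalence constants $\underline\gamma_C$, $\bar\gamma_C$, $\underline\gamma_I$ and $\bar\gamma_I$ such that

    $\underline\gamma_C C_C \le \tilde V_C^T K \tilde V_C \le \bar\gamma_C C_C$  and  $\underline\gamma_I C_I \le K_I \le \bar\gamma_I C_I$   (116)

Then, the inexact ASM preconditioner

    $C = \begin{pmatrix} I_C & -P_{IC}^T \\ 0 & I_I \end{pmatrix}
         \begin{pmatrix} C_C & 0 \\ 0 & C_I \end{pmatrix}
         \begin{pmatrix} I_C & 0 \\ -P_{IC} & I_I \end{pmatrix}$   (117)

based on the bounded extension splitting (109) is spectrally
equivalent to $K$, that is,

    $\underline\gamma\, C \le K \le \bar\gamma\, C$   (118)

with the spectral equivalence constants

    $\underline\gamma = \min\{\underline\gamma_C, \underline\gamma_I\}\,\big(1 - \sqrt{1 - c_E^{-2}}\big)$
    $\bar\gamma = \max\{\bar\gamma_C, \bar\gamma_I\}\,\big(1 + \sqrt{1 - c_E^{-2}}\big)$   (119)


Proof. Follows easily from Corollary 1 and Lemma 2.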
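
The statement of Theorem 8 can be illustrated on a small dense example. The sketch below makes several assumptions that are not part of the text: $K$ is a random SPD stand-in, the bounded extension `P_IC` is a hypothetical random perturbation of the discrete harmonic extension, and both subspace problems are solved exactly, so that all four constants in (116) equal 1. Under these assumptions the spectrum of $C^{-1}K$ should lie in $[1-\gamma, 1+\gamma]$ with $\gamma = \sqrt{1 - c_E^{-2}}$.

```python
import numpy as np

rng = np.random.default_rng(1)
nC, nI = 3, 6
M = rng.standard_normal((nC + nI, nC + nI))
K = M @ M.T + (nC + nI) * np.eye(nC + nI)   # random SPD stand-in for K
K_C, K_CI = K[:nC, :nC], K[:nC, nC:]
K_IC, K_I = K[nC:, :nC], K[nC:, nC:]
P_star = -np.linalg.solve(K_I, K_IC)        # discrete harmonic extension
S_C = K_C + K_CI @ P_star

# Hypothetical inexact extension: harmonic extension plus a perturbation.
P_IC = P_star + 0.3 * rng.standard_normal((nI, nC))
V_C = np.vstack([np.eye(nC), P_IC])
A_C = V_C.T @ K @ V_C                       # ASM subspace matrix

# Sharp constant: c_E^2 is the largest eigenvalue of A_C u = lambda S_C u.
cE2 = np.max(np.linalg.eigvals(np.linalg.solve(S_C, A_C)).real)
gamma = np.sqrt(1.0 - 1.0 / cE2)

# Inverse of (117) with exact subspace solvers C_C = A_C and C_I = K_I:
# C^{-1} = W diag(A_C, K_I)^{-1} W^T with W = [V_C, V_I].
W = np.block([[np.eye(nC), np.zeros((nC, nI))], [P_IC, np.eye(nI)]])
D = np.block([[A_C, np.zeros((nC, nI))], [np.zeros((nI, nC)), K_I]])
C_inv = W @ np.linalg.solve(D, W.T)
ev = np.sort(np.linalg.eigvals(C_inv @ K).real)
```

Increasing the size of the perturbation in `P_IC` enlarges `cE2` and pushes `gamma` toward 1, which is the loss of stability described in Section 5.3.1.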


Theorem 8 provides us with a guide for choosing the
ingredients $C_I$, $C_C$ and $P_{IC}$ of the inexact ASM preconditioner
(117) in such a way that the final spectral equivalence
constants $\underline\gamma$ and $\bar\gamma$ in (118) do not depend too much on $h$, $H$ and
the jumps of the coefficients. Optimal ingredients will
lead to an optimal inexact ASM preconditioner. Let us summarize
some concrete proposals for choosing $C_I$, $C_C$, and
$P_{IC}$:


Preconditioners $C_I$ for the local Dirichlet problems

Since $C_I = \mathrm{diag}(C_{I,j})_{j=1,\dots,J}$, we only need good preconditioners
$C_{I,j}$ for the matrices $K_{I,j}$ arising from the FE discretization
of the local Dirichlet problems where the coefficients of
the PDE are changing only smoothly. Nowadays, a lot of
optimal (linear complexity) preconditioners for such problems
are available (see e.g. Chapter 20, this Volume or
Bramble and Zhang, 2000). For instance, local multigrid
preconditioners of the form

    $C_{I,j} = K_{I,j}(I_{I,j} - E_{I,j}^k)^{-1}$   (120)

will do a good job, where $E_{I,j}$ denotes the corresponding
multigrid iteration operator. They can be generated,
for example, by one ($k = 1$) symmetric V-cycle with
appropriately chosen multigrid components such that $C_{I,j}$
is SPD (Jung and Langer, 1991). Since we can assume
that the multigrid iteration operators $E_{I,j}$ are nonnegative
with respect to the energy inner product and the multigrid
rates $\|E_{I,j}\|_{K_{I,j}}$ are bounded by some mesh-independent
constant $\eta < 1$, we see that the second spectral inequalities
in (116) are fulfilled with $\underline\gamma_I = 1 - \eta^k$ and $\bar\gamma_I = 1$.
The operation count for the local preconditioning operation
gives $\mathrm{ops}(C_{I,j}^{-1} d_{I,j}) = O(N_{I,j}) = O((H/h)^d)$. Whereas various
optimal preconditioners are available in the h-version of
the FEM, the situation is quite different for the hp-version.
Examples of optimal, or at least almost optimal, local hp
preconditioners can be found in Korneev and Jensen (1999),
Korneev (2001, 2002a), Beuchler (2002), and Korneev,
Langer and Xanthis (2003).
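
The construction (120) works for any symmetric iteration whose error propagation operator is nonnegative in the energy inner product. As a small illustration, the sketch below swaps the multigrid cycle for symmetric Gauss-Seidel (an assumption made only to keep the example short) on the 1D model matrix tridiag(-1, 2, -1), and checks the bounds $1 - \eta^k$ and 1 on the spectrum of $C^{-1}K$.

```python
import numpy as np

n, k = 20, 2
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D model Dirichlet matrix

# Symmetric Gauss-Seidel splitting matrix M = (D + L) D^{-1} (D + L)^T.
Dg = np.diag(np.diag(K))
Lo = np.tril(K, -1)
M = (Dg + Lo) @ np.linalg.inv(Dg) @ (Dg + Lo).T

E = np.eye(n) - np.linalg.solve(M, K)        # iteration (error propagation) operator
eta = np.max(np.linalg.eigvals(E).real)      # convergence rate in the energy norm

# Inverse of the preconditioner C = K (I - E^k)^{-1} built from k iteration steps.
C_inv = (np.eye(n) - np.linalg.matrix_power(E, k)) @ np.linalg.inv(K)
ev = np.sort(np.linalg.eigvals(C_inv @ K).real)
```

Here `C_inv` is symmetric, and the eigenvalues of `C_inv @ K` lie between `1 - eta**k` and 1. Unlike a multigrid rate, the symmetric Gauss-Seidel rate `eta` deteriorates as `n` grows, so this stand-in is not an optimal preconditioner.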


Preconditioners $C_C$ for $\tilde V_C^T K \tilde V_C$

Owing to the spectral relations (113), every good
Schur-complement preconditioner $C_C$, given in Section 5.2, is
also a good preconditioner for $\tilde V_C^T K \tilde V_C$ provided that the
extension constant $c_E$ is small. More precisely, the spectral
equivalence inequalities $\underline\delta_C C_C \le S_C \le \bar\delta_C C_C$ and (113)
give us the first spectral inequalities in (116) with $\underline\gamma_C = \underline\delta_C$
and $\bar\gamma_C = c_E^2\,\bar\delta_C$. On the other hand, one can again construct
preconditioners of the form

    $C_C = \tilde V_C^T K \tilde V_C\,(I_C - E_C^s)^{-1}$   (121)

applying $s$ iteration steps of some symmetric internal iteration
method (e.g. $s$ symmetric V-cycles) with the linear
iteration operator $E_C$ directly to the ASM subspace matrix
$\tilde V_C^T K \tilde V_C$. Note that the discrete bounded extension operations
discussed below are cheap, so that the cost
of the matrix-by-vector multiplication $\tilde V_C^T K \tilde V_C d_C$ is
proportional to $N$.


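Discrete bounded extension

Owing to the equivalence

    $\underline{c}\,|u|^2_{H^{1/2}(\Gamma_j)} \le \inf_{v \in H^1(\Omega_j):\ v = u \text{ on } \Gamma_j} |v|^2_{H^1(\Omega_j)}
        \le \bar{c}\,|u|^2_{H^{1/2}(\Gamma_j)} \quad \forall u \in H^{1/2}(\Gamma_j)$   (122)

and to (102), we can reduce the construction of a bounded
extension operator $P$ to the construction of local bounded
extension operators $P_j: \mathbb{V}|_{\Gamma_j} \subset H^{1/2}(\Gamma_j) \to \mathbb{V}|_{\Omega_j} \subset H^1(\Omega_j)$
such that the inequality

    $|P_j u|_{H^1(\Omega_j)} = \|\nabla P_j u\|_{L_2(\Omega_j)} \le \bar c_E\, |u|_{H^{1/2}(\Gamma_j)} \quad \forall u \in \mathbb{V}|_{\Gamma_j}$   (123)

is valid for some positive constant $\bar c_E$. If $\bar c_E$ is independent
of $H/h$, then the extension constant $c_E = \bar c_E/\sqrt{\underline{c}}$ does not
depend on these parameters either.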

Let us now review some cheap bounded discrete exten- 
sion procedures of that kind. The first computable extension 
procedure was proposed by Matsokin and Nepomnyaschikh 
(1985) on the basis of some averaging technique that pro- 
vides a uniform bound (see also Nepomnyaschikh, 1991a). 
Haase et al. (1994) introduced the hierarchical extension
that is very cheap. In 2D, it leads to a $\ln(H/h)$ growth of
$c_E$ that can be compensated by $O(\ln\ln(H/h))$ multigrid
iterations. In 3D, the hierarchical extension is too weak. 
However, the multilevel extension that was proposed by 
Haase and Nepomnyaschikh (1997) works fine in 2D as 
well as in 3D. 


5.4 Neumann—Neumann preconditioners 


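Bourgat et al. (1989) introduced the Neumann–Neumann
Schur-complement preconditioner (cf. also Section 5.2)
$C_C^{-1} = (1/4)S_{C,1}^{-1} + (1/4)S_{C,2}^{-1}$ for the case of two subdomains
($J = 2$) and showed that $\kappa(C_C^{-1} S_C) = O(1)$. The
operation $w_{C,j} = S_{C,j}^{-1} d_{C,j}$ ($d_{C,1} = d_{C,2} = d_C$) is obviously
equivalent to the solution of the system

    $\begin{pmatrix} K_{C,j} & K_{CI,j} \\ K_{IC,j} & K_{I,j} \end{pmatrix}
     \begin{pmatrix} w_{C,j} \\ w_{I,j} \end{pmatrix} = \begin{pmatrix} d_{C,j} \\ 0 \end{pmatrix}$   (124)

that corresponds to the Neumann boundary condition on
$\Gamma_C = \partial\Omega_1 \cap \partial\Omega_2$ for $j = 1, 2$.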
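
The equivalence between applying $S_{C,j}^{-1}$ and solving the block system (124) is plain block elimination and can be verified directly. The sketch assumes a random SPD matrix `K_j` as a regular stand-in for a subdomain stiffness matrix (a true pure Neumann matrix would be singular), with the coupling unknowns ordered first.

```python
import numpy as np

rng = np.random.default_rng(2)
nC, nI = 2, 4
M = rng.standard_normal((nC + nI, nC + nI))
K_j = M @ M.T + np.eye(nC + nI)          # regular SPD stand-in for a subdomain matrix
K_C, K_CI = K_j[:nC, :nC], K_j[:nC, nC:]
K_IC, K_I = K_j[nC:, :nC], K_j[nC:, nC:]
S_j = K_C - K_CI @ np.linalg.solve(K_I, K_IC)    # local Schur complement

d_C = np.array([1.0, -2.0])
# Solve the full local system with zero right-hand side in the interior block.
w_full = np.linalg.solve(K_j, np.concatenate([d_C, np.zeros(nI)]))
w_C = w_full[:nC]                        # interface part equals S_j^{-1} d_C
```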

De Roeck and Le Tallec (1991) generalized the
Neumann–Neumann preconditioner to the general case of
$J$ subdomains. To simplify the notation, we skip the
subindex $C$, that is, $S_j = S_{C,j}$, $A_j = A_{C,j}$, and so on. In
order to weight the contributions from the different subdomains
$\Omega_j$, we introduce the weight matrices $D_j$ such
that $\sum_{j=1}^J A_j^T D_j A_j = I$. There are different ways to choose
these weights. Following Mandel and Brezina (1996), we
define $D_j$ as the diagonal matrix $\mathrm{diag}(d_l^{(j)})_{l=1,\dots,N_j}$ with the
diagonal entries

    $d_l^{(j)} = a_j \Big/ \sum_{i:\ x_l \in \partial\Omega_i} a_i$   (125)

This choice avoids the dependence of the condition number
on the coefficient jumps in the PDE. In the case that all
$a_j = 1$ (Poisson equation), the diagonal entry $d_l^{(j)}$ is equal
to the reciprocal of the number of subdomains meeting at
the nodal point $x_l$ to which the diagonal entry $d_l^{(j)}$ belongs.
Similar to the case of two subdomains, we can now write
the multisubdomain Neumann–Neumann preconditioner $C$
in the form

    $C^{-1} = \sum_{j=1}^J A_j^T D_j^T S_j^{-1} D_j A_j$   (126)



that is again an additive Schwarz preconditioner. The
algorithmic form of the preconditioning operation $w = C^{-1}d$ is
given in Algorithm 7. In general ($\partial\Omega_j \cap \Gamma_D = \emptyset$), the local
Schur complements $S_j$ as well as the corresponding subdomain
stiffness matrices in (124) are singular because they
are derived from pure local Neumann problems. Thus, the
kernels correspond to the functions that are constant in the
subdomain $\Omega_j$ (Example 1) and to local rigid body motions
in the case of linear elasticity (Example 2). Therefore, to
ensure solvability, the right-hand sides of the systems must
be orthogonal to the kernel (this is in general not the case!),
and if they are orthogonal to the kernel, then the solution is
not unique. Another serious drawback of Algorithm 7 consists
in the absence of some global information exchange
mechanism, which causes the $H^{-2}$ dependence of the relative
condition number $\kappa(C^{-1}S)$. The balancing technique
introduced by Mandel (1993) removes both drawbacks (see
also Dryja and Widlund (1995) for a different approach).
Let us introduce $N_j \times M_j$ matrices $Z_j$ consisting of $M_j$ linearly
independent column vectors $z_j^m \in \mathbb{R}^{N_j}$ ($m = 1, \dots, M_j$)
such that

    $\ker S_j \subset \mathrm{range}\,Z_j = \mathrm{span}[z_j^1, \dots, z_j^{M_j}]$   (127)

For our model problem (Example 1), we can simply choose
$\ker S_j = \mathrm{range}\,Z_j = \mathrm{span}[(1, \dots, 1)^T]$ for the singular case
and omit the balancing procedures (128) and (130) in Algorithm 8
for those $j$'s which belong to the regular local Schur
complements $S_j$. Then it is clear that some vector $r$ fulfills
the local orthogonality conditions ensuring the solvability
of the local Neumann problems if $Z_j^T D_j A_j r = 0$ for the
$j$'s corresponding to the singular cases. Adding this balancing
step in a symmetric way to Algorithm 7, we arrive
at the so-called balancing Neumann–Neumann preconditioning
Algorithm 8.


Algorithm 7. Neumann–Neumann preconditioning operation $w = C^{-1}d$.

    $d \in \mathbb{R}^N$ given vector (defect/residual)      {initialization}
    for all $j = 1, \dots, J$ do                    {distribute d to the subdomains}
        $d_j = D_j A_j d$
    end for
    for all $j \in \{1, \dots, J\}$ in parallel do    {solve the local Neumann problems (124) in parallel}
        $S_j w_j = d_j$
    end for
    $w = \sum_{j=1}^J A_j^T D_j^T w_j$             {average the local solution on the interface}
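
Algorithm 7 translates almost line by line into code. The following sketch is an illustration under simplifying assumptions: the local Schur complements are random SPD stand-ins (regular, so no balancing is needed), all coefficients $a_j$ equal 1 so that the weights are reciprocal subdomain counts, and the names `neumann_neumann_apply` and `dofs` are hypothetical.

```python
import numpy as np

def neumann_neumann_apply(d, A, D, S_loc):
    """Algorithm 7: w = sum_j A_j^T D_j^T S_j^{-1} D_j A_j d (regular S_j assumed)."""
    w = np.zeros_like(d)
    for A_j, D_j, S_j in zip(A, D, S_loc):
        d_j = D_j @ (A_j @ d)               # distribute d to subdomain j
        w_j = np.linalg.solve(S_j, d_j)     # local Neumann solve
        w += A_j.T @ (D_j.T @ w_j)          # weighted averaging on the interface
    return w

rng = np.random.default_rng(3)
N = 4                                        # number of interface unknowns
dofs = [[0, 1, 2], [1, 2, 3], [0, 3]]        # interface unknowns of each subdomain
A = [np.eye(N)[idx] for idx in dofs]         # Boolean restriction matrices A_j
counts = sum(A_j.T @ A_j for A_j in A).diagonal()
D = [np.diag(1.0 / counts[idx]) for idx in dofs]   # weights for a_j = 1

S_loc = []
for idx in dofs:
    M = rng.standard_normal((len(idx), len(idx)))
    S_loc.append(M @ M.T + np.eye(len(idx)))        # regular SPD stand-ins for S_j
S = sum(A_j.T @ S_j @ A_j for A_j, S_j in zip(A, S_loc))

# Assemble C^{-1} column by column, only to inspect it.
C_inv = np.column_stack([neumann_neumann_apply(col, A, D, S_loc) for col in np.eye(N)])
partition = sum(A_j.T @ D_j @ A_j for A_j, D_j in zip(A, D))
ev = np.linalg.eigvals(C_inv @ S).real
```

In this regular setting the weights form a partition of unity, `C_inv` is symmetric, and the eigenvalues of `C_inv @ S` are bounded below by 1.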

The following remarks may be useful for the practical 
implementation of Algorithm 8: 


1. On the one hand, the balancing steps (128) and (130)
   can be omitted for those subdomains $\Omega_j$ where the
   subdomain problems are regular, that is, $M_j = 0$ and there
   are no $\lambda_j$ and $\mu_j$. On the other hand, the spaces
   $\mathrm{range}\,Z_j$ can always be enriched.

2. The sparse SPD matrix arising from the auxiliary
   problems (128) and (130) has the dimension $M \times M$
   with $M = \sum_{j=1}^J M_j$. This matrix can easily be
   generated in a preprocessing step.

3. Owing to the postbalancing step (130), any solution
   of the local Neumann problems (129) will be
   appropriate.

4. If the old residual $d$ is already balanced, then the
   prebalancing step (128) can be omitted; in particular, if the
   initial residual of the Schur-complement CG iteration is
   balanced, then the prebalancing step (128) can always be
   omitted.


Algorithm 8. Balancing Neumann–Neumann preconditioning operation $w = C^{-1}d$.

    $d \in \mathbb{R}^N$ given vector (defect/residual)      {initialization}
    Find $\lambda_j \in \mathbb{R}^{M_j}$, $j = 1, \dots, J$, such that   {balancing the old residual vector}

        $Z_i^T D_i A_i \big(d - S \sum_{j=1}^J A_j^T D_j^T Z_j \lambda_j\big) = 0 \quad \forall i = 1, \dots, J$   (128)

    and set $r = d - S \sum_{j=1}^J A_j^T D_j^T Z_j \lambda_j$
    for all $j = 1, \dots, J$ do                    {distribute r to the subdomains}
        $r_j = D_j A_j r$
    end for
    for all $j \in \{1, \dots, J\}$ in parallel do    {solve the balanced local Neumann problems (124) in parallel}
        $S_j w_j = r_j$   (129)
    end for
    Find $\mu_j \in \mathbb{R}^{M_j}$, $j = 1, \dots, J$, such that     {balancing the new residual vector}

        $Z_i^T D_i A_i \big(d - S \sum_{j=1}^J A_j^T D_j^T (w_j + Z_j \mu_j)\big) = 0 \quad \forall i = 1, \dots, J$   (130)

    $w = \sum_{j=1}^J A_j^T D_j^T (w_j + Z_j \mu_j)$   {average the local solution on the interface}


From Algorithm 8, we observe that the balanced
Neumann–Neumann preconditioner can be rewritten in the
compact form $C^{-1} = (I - E)S^{-1}$ with the iteration matrix

    $E = (I - P)(I - TS)(I - P)
       = (I - P)\Big[I - \Big(\sum_{j=1}^J A_j^T D_j^T S_j^+ D_j A_j\Big) S\Big](I - P)$   (131)

where $S_j^+$ denotes the Moore–Penrose pseudo-inverse of
$S_j$, and $P$ is the $S$-orthogonal projection onto the space
$\mathcal{P} = \{v = \sum_{j=1}^J A_j^T D_j^T z_j \in \mathbb{R}^N : z_j \in \mathrm{range}\,Z_j\}$ that plays
the role of some 'coarse grid space'. Thus, the balancing
steps are handling not only the singular local Neumann
problems but also the global information transport via the
space $\mathcal{P}$. This is the reason why some enrichment of this space
can be meaningful. From (131), we immediately see that
the balanced Neumann–Neumann preconditioner is some
kind of hybrid Schwarz preconditioner, as described in
Section 3.2.3.


Theorem 9. The balanced Neumann–Neumann preconditioner
$C = S(I - E)^{-1}$ defined by Algorithm 8 is SPD and

    $\kappa(C^{-1}S) \le \sup_{u_j \in X_j,\ j=1,\dots,J}
        \frac{\sum_{i=1}^J \big\|A_i \sum_{j=1}^J A_j^T D_j^T u_j\big\|_{S_i}^2}{\sum_{j=1}^J \|u_j\|_{S_j}^2}
        \le c\,\Big(1 + \ln\frac{H}{h}\Big)^2$   (132)

where $X_j = \{u_j \in \mathbb{R}^{N_j} : (u_j, v_j) = 0\ \forall v_j \in \ker S_j$ and
$(S_j u_j, v_j) = 0\ \forall v_j \in \mathrm{range}\,Z_j\}$, and $c$ denotes a positive
constant that is independent of $h$, $H$, and the jumps in the
coefficients.


Proof. The first part of estimate (132) was proved by
Mandel (1993). This estimate is based on pure linear
algebra arguments and is not directly connected to our
model problem. This abstract estimate was used by Mandel
and Brezina (1996) to produce the bound at the right-hand
side of estimate (132) for our model problem.


The advantages of the balanced Neumann—Neumann 
Schur-complement preconditioners are certainly their 
(almost) independence of bad parameters (see also Mandel 
and Brezina (1996) for impressive numerical results) and 
the fact that more or less standard software routines can be 
used. On the other hand, the balanced Neumann—Neumann 
preconditioned Schur-complement CG that is mostly used 
in practice is quite expensive with respect to the number 
of arithmetical operations because one Dirichlet and one 
Neumann problem must be solved exactly (directly) per 
subdomain (however, completely in parallel) and per 
iteration step. Inexact versions are not straightforward (see, 
however, Sections 3 and 5.3). 


5.5 Finite element tearing and interconnecting 
methods 


The FETI methods were introduced by Farhat and Roux
(1991) (see also Farhat and Roux (1994) for a more
detailed description by the same authors) as a nonoverlapping
DD parallel solution method for our system (13)
of conforming finite element equations that can be reduced
to the Schur-complement system (95) after eliminating
the internal unknowns, as described in Section 5.1. Tearing
the unknowns $u_C$ at the interface $\Gamma_C$ first into independent
unknowns building the vectors $u_1, \dots, u_J$ (from
now on we again omit the index $C$, as in the preceding
section) and then again enforcing the continuity by
simple interconnecting constraints $Bu = 0$, we arrive at
the saddle-point problem: Given $f = (f_j)_{j=1,\dots,J} \in \mathbb{R}^N$, find
$u = (u_j)_{j=1,\dots,J} \in U = U_1 \times \cdots \times U_J = \mathbb{R}^N$ and $\lambda \in \Lambda =
\mathrm{range}\,B \subset \mathbb{R}^M$ such that

    $\begin{pmatrix} S & B^T \\ B & 0 \end{pmatrix}
     \begin{pmatrix} u \\ \lambda \end{pmatrix} = \begin{pmatrix} f \\ 0 \end{pmatrix}$   (133)

where $S = \mathrm{diag}(S_j)_{j=1,\dots,J}$ denotes the $N \times N$ block
diagonal matrix with the subdomain Schur complements
on the diagonal, $B = (B_1, \dots, B_J)$ is the $M \times N$ matrix
of constraints that measures the jump of a given vector
$u$ across the interface, $N = N_1 + \cdots + N_J$, and $M$ is the
number of Lagrange multipliers. Each row of the matrix
$B$ is connected with a pair of matching nodes across the
interface. The entries of such a row are 1 and $-1$ for the
indices corresponding to the two matching nodes and 0 otherwise.
Therefore, $Bu = 0$ implies that the finite element function
corresponding to $u$ is continuous across the interface
$\Gamma_C$. We assume here that the number of constraints at
some matching node is equal to the number of matching
subdomains minus 1. This method of the minimal number
of constraints resp. multipliers is called nonredundant (see
e.g. Klawonn and Widlund (2001) for the use of redundant
constraints).

Since $\ker S \cap \ker B = \{0\}$, the saddle-point system (133)
has a unique solution and is completely equivalent to the
Schur-complement problem (95). The subdomain Schur
complement $S_j$ is singular if the corresponding subdomain
$\Omega_j$ does not touch the Dirichlet boundary $\Gamma_D$ ($\Gamma_D =
\partial\Omega$ for our model problem). Such subdomains are called
floating subdomains. Similar to Section 5.4, we assume that
$\ker S$ can be represented by the range of some $N \times L$ matrix
$Z$, that is, now $\ker S = \mathrm{range}\,Z$, with $L$ being the number of
floating subdomains. If we assume for the time being that
the solvability condition

    $(f - B^T\lambda) \perp \ker S = \mathrm{range}\,Z$, i.e. $Z^T(f - B^T\lambda) = 0$   (134)

for the first equation in (133) is fulfilled, then the solution $u$
can be represented in the form

    $u = S^+(f - B^T\lambda) + Z\alpha$   (135)

with some element $Z\alpha \in \ker S$ that has to be determined.
Substituting now (135) into the second block equation of
(133), we arrive at the dual problem

    $BS^+B^T\lambda = BS^+f + G\alpha$   (136)

for defining $\lambda$ and $\alpha$ with the abbreviation $G =
BZ$. Defining now the orthogonal projection $P = I -
G(G^TG)^{-1}G^T$ from the space $\Lambda$ onto the subspace $\Lambda_0 =
\ker G^T = (\mathrm{range}\,G)^\perp$ with respect to the scalar product
$(\cdot,\cdot) = (\cdot,\cdot)_\Lambda = (\cdot,\cdot)_{\mathbb{R}^M}$, we can split the definition of $\lambda$
from the definition of $\alpha$. Indeed, applying $P$ to (136) gives
the equation

    $PBS^+B^T\lambda = PBS^+f$   (137)

since $PG\alpha = 0$. Together with the solvability condition
(134), we get the final dual problem in the following form:
Find $\lambda \in \Lambda$ such that

    $PF\lambda = Pd$ subject to $G^T\lambda = e$   (138)

with the abbreviations $F = BS^+B^T$, $d = BS^+f$, and $e =
Z^Tf$. Once $\lambda$ is defined, from (136) we obtain

    $\alpha = (G^TG)^{-1}G^T(F\lambda - d)$   (139)

and, finally, we get $u$ from (135). The solution $u_C$ of the
Schur-complement problem (95) can easily be extracted
from $u$.
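
The chain (134) to (139) can be followed on a toy problem. The sketch below is an illustration only: a 1D spring chain with both ends fixed is torn into three pieces, small stiffness blocks play the role of the subdomain Schur complements $S_j$ (an assumption), the middle piece floats, and the dual problem is solved by dense linear algebra rather than by the PCG iteration. All data and names are hypothetical.

```python
import numpy as np

# Torn 1D spring chain (toy data): nodes 0..6, nodes 0 and 6 are fixed.
S1 = np.array([[2.0, -1.0], [-1.0, 1.0]])                              # nodes 1, 2 (regular)
S2 = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])  # nodes 2', 3, 4 (floating)
S3 = np.array([[1.0, -1.0], [-1.0, 2.0]])                              # nodes 4', 5 (regular)
S = np.zeros((7, 7))
S[0:2, 0:2], S[2:5, 2:5], S[5:7, 5:7] = S1, S2, S3

B = np.zeros((2, 7))                 # one row per pair of matching nodes
B[0, 1], B[0, 2] = 1.0, -1.0         # u_2 - u_2' = 0
B[1, 4], B[1, 5] = 1.0, -1.0         # u_4 - u_4' = 0
Z = np.zeros((7, 1))
Z[2:5, 0] = 1.0                      # kernel of the floating block: constants
f = np.array([1.0, 0.5, 0.5, 1.0, 0.5, 0.5, 1.0])

S_plus = np.linalg.pinv(S, 1e-12)    # Moore-Penrose pseudo-inverse
G = B @ Z
F = B @ S_plus @ B.T
d = B @ S_plus @ f
e = Z.T @ f

GtG_inv = np.linalg.inv(G.T @ G)
P = np.eye(2) - G @ GtG_inv @ G.T    # orthogonal projection onto ker G^T
lam0 = G @ GtG_inv @ e               # enforces G^T lam = e
# Solve the projected dual problem (138) in the subspace range(P).
y = np.linalg.lstsq(P @ F @ P, P @ (d - F @ lam0), rcond=None)[0]
lam = lam0 + P @ y
alpha = GtG_inv @ G.T @ (F @ lam - d)          # (139)
u = S_plus @ (f - B.T @ lam) + Z @ alpha       # (135)

# Reference: assembled (untorn) chain with unknowns at nodes 1..5.
K_glob = 2 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
f_glob = np.array([f[0], f[1] + f[2], f[3], f[4] + f[5], f[6]])
u_ref = np.linalg.solve(K_glob, f_glob)
```

The torn solution `u` is continuous across the two interfaces (`B @ u` vanishes), and its entries at nodes 1 to 5 agree with the assembled solution `u_ref`.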

The dual problem (138) is now solved by the preconditioned
conjugate gradient (PCG) iteration in the subspace
$\Lambda_0$ that is presented in Algorithm 9 as a projected PCG
method.


Algorithm 9. FETI subspace PCG iteration.

    $\lambda^0 = G(G^TG)^{-1}e$                {initialization: forcing the constraints $G^T\lambda^0 = e$ for the initial guess}
    $d^0 = P(d - F\lambda^0)$                  {compute the defect and project to the subspace $\Lambda_0$}
    $w^0 = C^{-1}d^0$                          {precondition step}
    $s^0 = z^0 = Pw^0$                         {project the correction to the subspace $\Lambda_0$}
    $\beta_0 = (w^0, d^0) = (z^0, d^0)$
    for $n = 0$ step 1 until $\beta_n \le \varepsilon^2\beta_0$ do   {begin iteration loop}
        $x^n = PFs^n$                          {matrix-by-vector multiplication + projection}
        $\alpha_n = (x^n, s^n)$
        $\alpha = \beta_n/\alpha_n$
        $\lambda^{n+1} = \lambda^n + \alpha s^n$     {update of the iterate in the subspace $\Lambda_0$}
        $d^{n+1} = d^n - \alpha x^n$           {update of the defect in the subspace $\Lambda_0$}
        $w^{n+1} = C^{-1}d^{n+1}$              {precondition step}
        $z^{n+1} = Pw^{n+1}$                   {project the correction to the subspace $\Lambda_0$}
        $\beta_{n+1} = (w^{n+1}, d^{n+1}) = (z^{n+1}, d^{n+1})$
        $\beta = \beta_{n+1}/\beta_n$
        $s^{n+1} = z^{n+1} + \beta s^n$        {update of the search direction in the subspace $\Lambda_0$}
    end for                                    {end iteration loop}


The matrix-by-vector multiplication $Fs^n = BS^+B^Ts^n$
means the concurrent (direct) solution of $J$ local Neumann
problems. The orthoprojection $P$ ensures the solvability
of the Neumann problems and the global information
exchange. The application of $P = I - G(G^TG)^{-1}G^T$ to
some vector $w \in \Lambda$ involves the direct solution of a small
system with the $L \times L$ system matrix $G^TG$ that plays the
role of some kind of a coarse grid problem. The FETI
preconditioner $C$ should be spectrally equivalent to the
FETI operator $F$ on the subspace $\Lambda_0 = \ker G^T$, that is,

    $\underline\gamma\,(C\lambda, \lambda) \le (F\lambda, \lambda) \le \bar\gamma\,(C\lambda, \lambda) \quad \forall\lambda \in \Lambda_0$   (140)

with positive spectral equivalence constants $\underline\gamma$ and $\bar\gamma$ such
that $\kappa(PC^{-1}P^TP^TFP) \le \bar\gamma/\underline\gamma$ is as small as possible and the
preconditioning operation $C^{-1}d$ is as cheap as possible.
Farhat and Roux (1991) proposed the FETI preconditioner

    $C^{-1} = BSB^T$   (141)

that is now called the Dirichlet preconditioner because the
multiplication of $S$ with some vector requires the concurrent
solution of $J$ Dirichlet problems. Mandel and Tezaur (1996)
proved that the relative spectral condition number is rigorously
bounded by $c(1 + \log(H/h))^3$. This polylogarithmic
bound could be improved to $c(1 + \log(H/h))^2$ for special
domain decompositions that do not contain cross points.
Numerical studies with the Dirichlet FETI preconditioner
(141) can be found in Farhat and Roux (1991) and Stefanica
(2001). Similar to the balanced Neumann–Neumann
Schur-complement preconditioning technique, Klawonn and
Widlund (2001) introduced the preconditioner

    $C^{-1} = (BD^{-1}B^T)^{-1}BD^{-1}SD^{-1}B^T(BD^{-1}B^T)^{-1}$   (142)

with the diagonal scaling matrix $D = \mathrm{diag}(D_j)_{j=1,\dots,J}$. This
scaling and the introduction of an appropriately scaled
scalar product in $\Lambda$ (this affects the orthogonal projection
$P$!) lead to the rigorous bound $c(1 + \log(H/h))^2$ where
the constant $c$ is now independent of the jumps in the
coefficients. The numerical and the parallel performances
of the classical Dirichlet preconditioner and this new
preconditioner are compared in Stefanica (2001). In its exact
version, the FETI Algorithm 9 requires the exact (direct)
solution of one local Neumann problem (matrix multiplication
step) and one local Dirichlet problem (preconditioning
step) per subdomain (i.e. in parallel) at each iteration step.
In these parts, local direct solvers can be very expensive in
practical applications where very complex local problems
can appear. The use of inexact solvers (preconditioners)
for the local Dirichlet problems is more or less straightforward,
whereas the replacement of exact Neumann solvers
by inexact ones is not.
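
The projected PCG iteration of Algorithm 9 can be sketched as a short generic routine. The version below is an illustration under stated assumptions: `F` is any SPD matrix standing in for the FETI operator, `C_inv` is any SPD preconditioner given as a dense matrix (a simple diagonal scaling is used in the example instead of the Dirichlet preconditioner), the Euclidean scalar product is used, and the function name `feti_pcg` and the tolerance `eps` are hypothetical.

```python
import numpy as np

def feti_pcg(F, G, d, e, C_inv, eps=1e-8, max_iter=200):
    """Projected PCG for P F lam = P d subject to G^T lam = e (cf. Algorithm 9)."""
    GtG_inv = np.linalg.inv(G.T @ G)
    P = np.eye(F.shape[0]) - G @ GtG_inv @ G.T   # projection onto ker G^T
    lam = G @ GtG_inv @ e                        # initial guess with G^T lam = e
    r = P @ (d - F @ lam)                        # projected defect
    z = P @ (C_inv @ r)                          # preconditioned, projected correction
    s = z.copy()
    beta_n = z @ r
    beta_0 = beta_n
    for _ in range(max_iter):
        if beta_n <= eps**2 * beta_0:
            break
        x = P @ (F @ s)                          # matrix-by-vector product + projection
        alpha = beta_n / (x @ s)
        lam = lam + alpha * s                    # update of the iterate
        r = r - alpha * x                        # update of the defect
        z = P @ (C_inv @ r)
        beta_next = z @ r
        s = z + (beta_next / beta_n) * s         # update of the search direction
        beta_n = beta_next
    return lam, P

# Small random test problem (hypothetical data).
rng = np.random.default_rng(4)
m, l = 6, 2
M_ = rng.standard_normal((m, m))
F = M_ @ M_.T + m * np.eye(m)
G = rng.standard_normal((m, l))
d = rng.standard_normal(m)
e = rng.standard_normal(l)
C_inv = np.diag(1.0 / np.diag(F))                # simple diagonal preconditioner
lam, P = feti_pcg(F, G, d, e, C_inv)
```

Because every search direction lies in $\ker G^T$, the constraint $G^T\lambda = e$ imposed on the initial guess is preserved throughout the iteration.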

Klawonn and Widlund (2000) avoided the elimination of the
internal unknowns that reduces the original problem (13)
to the saddle-point problem (133), and related the
original problem (13) directly to the saddle-point problem

    $\begin{pmatrix} K & B^T \\ B & 0 \end{pmatrix}
     \begin{pmatrix} u \\ \lambda \end{pmatrix} = \begin{pmatrix} f \\ 0 \end{pmatrix}$   (143)

with the torn original stiffness matrix $K = \mathrm{diag}(K_j)_{j=1,\dots,J}$
and the interconnecting matrix $B$. We refer the reader
to Farhat, Lesoinne and Pierson (2000) for some further
development of the FETI methodology. In particular,
the dual-primal FETI (FETI-DP) methods, introduced by
Farhat et al. (2001), seem to be very attractive because
they avoid the solution of singular problems on the subdomains
by fixing some primal unknowns. The first results on
the convergence analysis of FETI-DP methods were given
by Mandel and Tezaur (2001) and Klawonn and Widlund
(2002). Langer and Steinbach (2003) introduced the boundary
element tearing and interconnecting (BETI) methods as
boundary element counterparts of the FETI methods.


5.6 Mortar methods 


In the classical FETI method, we tore (split) the unknowns
on the interface, which are conforming in our original FE
scheme, and interconnected them again by simple $(1, -1)$
equality constraints with the only aim to construct a DD
solver that is essentially based on the dual problem for the
corresponding Lagrange multipliers. The mortar technique
proposed by Bernardi, Maday and Patera (1993, 1994)
goes one step further and allows the triangulation to be
nonconforming. Thus, the FE solution cannot be globally conforming
in this general situation, and the continuity must be enforced
by constraints in an appropriate way. These continuity
constraints can be included into the product FE space or can
be incorporated by Lagrange multipliers in a saddle-point
formulation.

Let us again consider our model problem of Example 1
with piecewise constant coefficients and the nonoverlapping
domain decomposition (92), but now we allow the triangulation
and the FE functions to be nonconforming across the
interfaces $\partial\Omega_i \cap \partial\Omega_j$. Thus, we look for a nonconforming
FE solution $u$ in the product FE space $U = \mathbb{V}_1 \times \cdots \times \mathbb{V}_J$,
where the subdomain FE spaces $\mathbb{V}_j = \mathbb{V}_j(\Omega_j) \subset H^1(\Omega_j) \cap
H_0^1(\Omega)$ are defined on the individual subdomain triangulations
$\tau_j$ using their individual finite elements. In order to
get a proper approximation to our weak solution, we have
to enforce weak continuity constraints by Lagrange multipliers
in such a way that the approximation and consistency
errors are not perturbed. To do this, we first introduce two
different, but complementary nonoverlapping decompositions
$\Gamma_C = \bigcup_{j=1}^J \bigcup_{i\in M(j)} \bar\Gamma_{ij}$ and $\Gamma_C = \bigcup_{j=1}^J \bigcup_{i\in M(j)} \bar\Gamma_{ji}$
of the interface $\Gamma_C$ into mortar and nonmortar faces $\Gamma_{ij} =
\partial\Omega_i \cap \partial\Omega_j \subset \partial\Omega_i$ (edges in 2D). The face $\Gamma_{ij}$ is considered
as a part of $\partial\Omega_i$ and inherits the (surface) triangulation
from $\Omega_i$. If some face $\Gamma_{ij}$ is mortar, that is, $i \in M(j)$, then
its opposite side $\Gamma_{ji} \subset \partial\Omega_j$ is nonmortar, that is, $j \notin M(i)$.
Let us now introduce the discrete Lagrange multiplier space

    $\Lambda = \prod_{j=1}^J \prod_{i\in M(j)} \Lambda(\Gamma_{ji}) \subset \prod_{j=1}^J \prod_{i\in M(j)} \mathbb{V}_j|_{\Gamma_{ji}}$   (144)

where the local discrete Lagrange multiplier spaces are all
connected with the nonmortar faces. The choice of local discrete
Lagrange multiplier spaces $\Lambda(\Gamma_{ji})$ is crucial not only
for the approximation properties but also for efficiency reasons.
For instance, in the case of linear triangular elements
in 2D, the classical local discrete Lagrange multiplier space
$\Lambda(\Gamma_{ji})$ is a subspace of codimension two of the trace space
$\mathbb{V}_j|_{\Gamma_{ji}}$ on the nonmortar edge $\Gamma_{ji}$. More precisely, $\Lambda(\Gamma_{ji})$
consists of all continuous, piecewise linear functions on
$\Gamma_{ji}$ that are constant in the first and the last interval of the
1D mesh on $\Gamma_{ji}$ induced by the mesh of $\Omega_j$. We refer to
Ben Belgacem and Maday (1999) for the 3D case and to
Wohlmuth (2000) for biorthogonal mortar elements.

Now, the mortar scheme can be formulated in the constrained
product space $\mathcal{V} = \{v \in U : b(v, \mu) = 0\ \forall\mu \in \Lambda\}$
as a nonconforming DD FE scheme: Find $u \in \mathcal{V}$ such that

    $a(u, v) = (f, v) \quad \forall v \in \mathcal{V}$   (145)

Alternatively, the mortar scheme can be reformulated as a
mixed scheme in the unconstrained product space and the
Lagrange multiplier space: Find $u \in U$ and $\lambda \in \Lambda$ such that

    $a(u, v) + b(v, \lambda) = (f, v) \quad \forall v \in U$   (146)
    $b(u, \mu) = 0 \quad \forall\mu \in \Lambda$   (147)

where $a(u, v) = \sum_{j=1}^J a_j \int_{\Omega_j} \nabla u(x) \cdot \nabla v(x)\,dx$, $b(v, \mu) =
\sum_{j=1}^J \sum_{i\in M(j)} \int_{\Gamma_{ji}} [v]\,\mu\,ds$, and $[v] = v|_{\Gamma_{ij}} - v|_{\Gamma_{ji}}$ denotes
the jump across the face $\Gamma_{ij}$ that geometrically coincides with
$\Gamma_{ji}$.

The saddle-point problem can be rewritten in matrix
form as the full FETI saddle-point problem (143), or, after
eliminating the inner subdomain unknowns, as the reduced
FETI saddle-point problem (133). However, the Lagrange
multiplier matrix $B$ is now defined by the mortar conditions
(147) across the faces $\Gamma_{ij}$ instead of the simple hard nodal
continuity condition in the FETI method. Now, the FETI
solver can be used for solving the mortar saddle-point
problem in the same way as in Section 5.5 for solving
the original FETI equations (133) (see Stefanica (2001) for
more information and numerical experiments). We mention
that other nonoverlapping DD algorithms and multilevel
methods can successfully be applied to the solution of the
mortar equations as well (see Wohlmuth (2001) for more
information).
1 INTRODUCTION


Nonlinearities pervade all areas of mechanics, and nonlin- 
ear systems of equations arise in connection with numerous 
mechanical problems. The forms and properties of these
systems depend strongly on the type of problem and the 
inherent sources of nonlinearities. It has long been rec- 
ognized that, except in special cases, direct methods for 
solving nonlinear systems are unavailable or infeasible and 
attention must focus on iterative processes. The choice and 
effectiveness of an iterative technique depends critically on 
the available information about the particular equations and 
their solution set, the aims and accuracy requirements of 
the computation, and the available computational resources. 
There are generally no satisfactory guidelines for deciding
on a 'best' solution process for a class of nonlinear
problems. This is not likely to change because of the ever
widening range and complexity of problem types under 
consideration and the continuing advances in scientific com- 
puting in response to the fast pace of hardware and software 
development. 


Encyclopedia of Computational Mechanics, Edited by Erwin 
Stein, René de Borst and Thomas J.R. Hughes. Volume 1: Funda- 
mentals. © 2004 John Wiley & Sons, Ltd. ISBN: 0-470-84699-2. 


This chapter gives an overview of some of the basic 
theoretical and numerical results concerning the solution of 
systems of nonlinear equations and of several topics related 
to parameterized systems and their bifurcation behavior. 
Obviously, the presentation can only touch a small part
of this extensive area. We begin with a brief notational 
summary. 


1.1 Notations 


In order to avoid technical details, this presentation works
with n-dimensional real linear spaces $\mathbb{R}^n$ of column vectors
$x$ with components $x_1, x_2, \ldots, x_n$. This is not a restriction,
since for the most part the material is basis-independent and
hence $\mathbb{R}^n$ may also be regarded as an abstract real linear
space. Correspondingly, $A \in L(\mathbb{R}^n, \mathbb{R}^m)$ denotes either an
$m \times n$ matrix or a linear operator as context dictates. As
usual, $GL(n)$ is the general linear group of invertible
$A \in L(\mathbb{R}^n)$ $(= L(\mathbb{R}^n, \mathbb{R}^n))$.

With this notation, the nonlinear mappings central to our 
discussion have the form 


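$$F: E \subset \mathbb{R}^n \to \mathbb{R}^m, \quad
F(x) = \begin{pmatrix} f_1(x_1, x_2, \ldots, x_n) \\ f_2(x_1, x_2, \ldots, x_n) \\ \vdots \\ f_m(x_1, x_2, \ldots, x_n) \end{pmatrix}, \quad
\forall x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in E \qquad (1)$$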

Here, $n$ and $m$ are given dimensions and the domain $E$ is
always assumed to be an open subset of $\mathbb{R}^n$. Multivariable
calculus is an essential tool for the study of nonlinear systems.
A mapping (1) is of class $C^r$, $r \ge 1$, (or a $C^r$ map, for
short) on its (open) domain $E$ if for each component functional
$f_i$, $i = 1, \ldots, m$, all partial derivatives up to order $r$
exist and are continuous. The mapping is F-differentiable
(Fréchet-differentiable) at a point $x \in E$ if there exists a
linear operator $A \in L(\mathbb{R}^n, \mathbb{R}^m)$ such that

$$\lim_{h \to 0} \frac{1}{\|h\|} \, \|F(x + h) - F(x) - Ah\| = 0$$


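The limit is independent of the norms and the unique
operator $A$ is called the F-derivative of $F$ at $x$ and denoted
by $DF(x)$. With the natural bases used in (1), the derivative
has the matrix representation

$$DF(x) = \begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(x) & \cdots & \dfrac{\partial f_1}{\partial x_n}(x) \\
\vdots & & \vdots \\
\dfrac{\partial f_m}{\partial x_1}(x) & \cdots & \dfrac{\partial f_m}{\partial x_n}(x)
\end{pmatrix}$$

If the F-derivative of $F$ exists at a point $x$, then $F$ is
continuous at that point. A $C^1$ mapping on $E$ is
F-differentiable at each point of $E$.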


2 GENERAL ITERATIVE PROCESSES 
2.1 Existence considerations 


Meaningful results about methods for solving a ‘square’ 
system 


$$F(x) = 0, \quad F: E \subset \mathbb{R}^n \to \mathbb{R}^n, \quad E \text{ open} \qquad (2)$$


depend critically on the properties of the function F. 
Two special cases of (2) are better understood than most 
others, namely, the n-dimensional linear systems (see 
e.g. Golub and van Loan, 1989), and the one-dimensional 
(i.e. scalar) nonlinear equations (see e.g. Heitzinger, Troch 
and Valentin, 1985). Here, the emphasis will be on methods 
for the general nonlinear case and not on results that are 
specific only to these special cases. 

The design and study of an effective numerical method
for a given system (2) requires some understanding of its
solvability properties. Simple scalar examples already show
that there may be any finite or infinite number of solutions
or none whatsoever. In fact, it has been proved that for each
closed set $S \subset \mathbb{R}^1$, there exists a $C^\infty$ map from $\mathbb{R}^1$ into
itself, which has $S$ as its zero set. This is a special case of
a more general n-dimensional result proved by H. Whitney
in 1934. If solutions are known to exist for a system (2),
they may exhibit rapid changes under perturbations of $F$,
which usually impede their computation.

The study of the existence of solutions and their proper- 
ties is a topic of nonlinear functional analysis and is outside 
the frame of this chapter. We mention only briefly a few of 
the principal approaches that apply to the finite-dimensional 
case: 

A conceptually simple, but powerful technique is the 
transformation of (2) into an extremal problem for some 
nonlinear functional 


$$g: E \subset \mathbb{R}^n \to \mathbb{R}^1, \quad E \text{ open} \qquad (3)$$


A point $x^* \in E$ is a local minimizer of (3) if $g(x) \ge g(x^*)$
for all $x$ in some open neighborhood of $x^*$ in $E$ and a global
minimizer if the inequality holds for all $x \in E$. A critical
point of $g$ is any $x^* \in E$ where $g$ has an F-derivative
for which $Dg(x^*) = 0$. If $g$ is F-differentiable at a local
minimizer $x^*$ in the open set $E$, then $x^*$ is necessarily
a critical point. (This result also holds under a weaker
differentiability condition.) A mapping (1) is a gradient (or
potential) mapping on $E$ if there exists a functional (3) that
is F-differentiable on $E$ and satisfies $F(x) = Dg(x)^T$ for
all $x \in E$. A $C^1$ mapping (1) on an open convex set $E$ is a
gradient mapping if and only if $DF(x)$ is symmetric for all
$x \in E$. For any gradient mapping, the problem of solving
the system (2) can be replaced by that of determining the
local minimizers of the functional $g$, provided, of course,
we keep in mind that in this way not all solutions of the
nonlinear system may be obtained. This corresponds to the
variational approach in the theory of differential equations
of importance in many areas of mechanics.

Even if F is not a gradient mapping, the system (2) can 
be converted into a minimization problem. In fact, this also 
applies to an overdetermined system 


$$F(x) = 0, \quad F: E \subset \mathbb{R}^n \to \mathbb{R}^m, \quad n < m \qquad (4)$$


Let $f: \mathbb{R}^m \to \mathbb{R}^1$ be a functional that has $x = 0$ as a unique
global minimizer. For instance, we might use $f(x) = x^T A x$
with a symmetric, positive-definite $A \in GL(m)$ or
$f(x) = \|x\|$ for some norm on $\mathbb{R}^m$. Then each solution
$x^* \in E$ of (4) is a global minimizer of the functional
$g(x) := f(F(x))$, $x \in E$, and hence, $x^*$ may be found by
minimizing $g$. But note that a global minimizer $x^* \in E$
of $g$ need not satisfy (4) since this system does not even
have to have a solution, and very likely for $n < m$, will
not have one. Any global minimizer of $g$ on $E$ is called
an f-minimal solution of (4). Various cases of $f$, such
as a norm $f(x) = \|x\|$ or the quadratic $f(x) = x^T x$, are of special interest.
In the latter case, the functional to be minimized has the
form $g(x) := F(x)^T F(x)$, $x \in E$. This defines a nonlinear
least-squares problem and, correspondingly, the f-minimal
solutions of $F(x) = 0$ are called least-squares solutions. In
applications, least-squares problems often arise naturally,
for example, in the course of estimating parameters in a
functional relationship on the basis of experimental data.
Another class of existence results for (2) is based on 
arguments derived from the contraction principle and its 
many generalizations. For this, the system is written in 
the fixed-point form $F(x) := x - G(x)$ involving some
mapping G. As the name indicates, the zeros of F are 
exactly the fixed points of G and the contraction mapping 
theorem concerns the existence of such fixed points: 


Theorem 1. Let $G: E \subset \mathbb{R}^n \to \mathbb{R}^n$ satisfy the contraction
condition

$$\|G(x) - G(y)\| \le \alpha \|x - y\|, \quad \forall x, y \in E, \quad \alpha < 1 \qquad (5)$$

Then $G$ has a fixed point in every closed subset $C \subset E$ that
is mapped into itself by $G$, that is, for which $G(C) \subset C$.


Examples show that the contraction property (5) by itself
does not suffice to guarantee the existence of a fixed point;
in other words, an additional assumption, such as $G(C) \subset C$,
is indeed needed. Theorem 1 holds on complete metric
spaces and there are numerous generalizations and extensions;
see, for example, Ortega and Rheinboldt (2000).

The contraction condition plays an important role in 
many parts of multivariable analysis. It underlies, for 
instance, the familiar inverse and implicit function theorems 
that provide local existence results for nonlinear equations. 
Much deeper are the topological approaches used in nonlin- 
ear functional analysis ranging from classical degree theory 
to modern differential topology and global analysis; see, 
for example, Berger (1977). This includes, for instance, the 
Brouwer fixed-point theorem, which guarantees that a continuous
mapping $G: E \subset \mathbb{R}^n \to \mathbb{R}^n$ on a compact, convex
set $C \subset E$ has a fixed point in $C$ if it maps $C$ into itself.


2.2 Process characterization 


Suppose that for a (square) nonlinear system (2) the exis- 
tence of solutions has been established. The problem of 
computing such solutions includes a range of tasks, such 
as (i) the determination of sets that are known to con- 
tain solutions, (ii) the construction of iterative processes 
for approximating a specific solution, and (iii) the develop- 
ment of methods for determining all solutions. The tasks 
(i) and (iii) represent as yet wide open research areas. 
Most of the current methods for (i) are based on the use 
of interval computations that have become an active and 
promising research topic in recent years but have not found 
very widespread use; see, for example, Kearfott (1996). 
General methods for (iii) are available only in the case
of minimization problems; see, for example, Floudas and 
Pardalos (1996). Accordingly, the discussion here will focus 
on methods for the task (ii). 

In general, an iterative process $J$ for approximating a
solution of (2) consists of a ‘step algorithm’ $G$ for a single
iteration step and a ‘control algorithm’ $C$ for controlling
the course of the iteration. A state of $J$ is a triple $\{k, x, M\}$
consisting of a step count $k$, the current iterate $x \in \mathbb{R}^n$,
and a memory set $M$. The specific content of the memory
set varies with the process type and the implementation
details. In particular, $M$ has to provide all the needed
information about the procedure for evaluating $F$ and, if
needed, the derivatives of $F$. But, as the name indicates, $M$
may also retain computed data for later use and redundant
data that are too costly to recompute. The input of both
$G$ and $C$ is assumed to be a given state. The evaluation
of $G\{k, x, M\}$ may result in an error; otherwise the output
is a new state $\{k, x, M\}$ consisting of the incremented step
count $k := k + 1$, the next iterate $x$, and an updated memory
set $M$. The output of $C\{k, x, M\}$ consists of one of the
three decisions: ‘accept,’ ‘fail,’ and ‘continue,’ signifying a
successful completion of the iteration, failure or suspected
divergence of the process, and the need for another step,
respectively. The process is forced to terminate when the
iteration count $k$ reaches a specified maximal value $k_{\max}$.
Thus, altogether, $J$ is an algorithm of the following generic
form:


J: input: starting point x, memory set M,
          maximal count kmax;
   k := 0;
   while k < kmax
      decision := C{k, x, M};
      if decision = ‘accept’ then return {k, x, M};
      if decision = ‘fail’ then return {‘process failure’};
      evaluate {k, x, M} := G{k, x, M},
         if ‘error’ then return {‘step failed’};
   endwhile;
   return {‘maximal iteration count reached’};          (6)
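The generic form (6) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the names `run_process`, `cos_step`, and `difference_control`, the dictionary used as memory set, and the choice of the cosine as step map are hypothetical and not part of the original text.

```python
import numpy as np

def run_process(step, control, x0, memory, kmax):
    """Generic iterative process of the form (6).

    step(k, x, memory)    -> (k + 1, next iterate, updated memory)
    control(k, x, memory) -> 'accept', 'fail', or 'continue'
    Returns (status, k, x, memory).
    """
    k, x = 0, np.asarray(x0, dtype=float)
    while k < kmax:
        decision = control(k, x, memory)
        if decision == "accept":
            return "accept", k, x, memory
        if decision == "fail":
            return "process failure", k, x, memory
        try:
            k, x, memory = step(k, x, memory)
        except ArithmeticError:
            return "step failed", k, x, memory
    return "maximal iteration count reached", k, x, memory

# Hypothetical stationary one-step example: x <- cos(x).
def cos_step(k, x, memory):
    # The previous iterate is kept in the memory set for the control algorithm.
    return k + 1, np.cos(x), {**memory, "previous": x}

def difference_control(k, x, memory):
    if not np.all(np.isfinite(x)):
        return "fail"
    previous = memory.get("previous")
    if previous is not None and np.linalg.norm(x - previous) <= memory["tol"]:
        return "accept"
    return "continue"

status, k, x, _ = run_process(cos_step, difference_control,
                              [1.0], {"tol": 1e-10}, kmax=200)
```

The step and control algorithms are passed in as functions so that the loop itself stays independent of the particular method, which mirrors the separation made in the text.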


The process (6) is stationary if the output of $G$ does
not depend on the current value $k$ of the iteration index,
otherwise it is nonstationary. If for a fixed $\ell > 0$ and all
$k \ge \ell$, the step algorithm $G$ depends on precisely $\ell$ prior
iterates for the computation of its output, then $J$ is said to
be an $\ell$-step method. Obviously, the required prior iterates
have to be saved in the memory set. In the simple case
$\ell = 1$, we have a one-step process. For a stationary one-step
method, the algorithm $G$ defines a mapping $G: \mathbb{R}^n \to \mathbb{R}^n$
and a step of the method can be written in the familiar
form

$$x^{k+1} = G(x^k), \quad k = 0, 1, \ldots \qquad (7)$$


Since the purpose of the process (6) is the determination of
a solution of (2), it is hoped that $J$ produces iterates $x^k$ that
approximate more and more closely a point $x^* \in \mathbb{R}^n$, which
then can be verified to satisfy $F(x^*) = 0$. This requires
suitable choices of the starting data and $k_{\max}$ such that
the control algorithm $C$ produces no failure decisions and
ultimately accepts an iterate $x^k$ that can be guaranteed to
satisfy $\|x^k - x^*\| \le \varepsilon$ for a given tolerance $\varepsilon > 0$.

In this generality, little can be proved, if only because $x^*$ is
unknown and the information available during the process
is limited and strongly dependent on the problem class,
the computer, and the implementation of $J$. In general,
a convergence analysis of an iterative process becomes
possible only if the control algorithm $C$ is disregarded,
that is, if $J$ is allowed to continue indefinitely. There
is an extensive literature on such convergence results for
various types of methods. We restrict ourselves to the local
convergence of simple stationary one-step processes written
in the form (7).


2.3 Convergence factors and orders 


A comparison of different iterative processes requires some 
measure of their efficiency and computational cost. One 
possible approach is to calculate the overall cost of a pro- 
cess as the product of the average cost of executing one step 
and the estimated total number of steps required for reach- 
ing an acceptable iterate. Obviously, both these quantities 
depend not only on the process and its implementation but 
also on the specific problem. 

The cost of a step includes the cost of executing both the
step algorithm $G$ and the control algorithm $C$. Around 1960,
A. M. Ostrowski proposed the name ‘horner’ for one unit
of this computational work. There is no generally accepted
specification of such a unit, but many authors define a
horner as one scalar function call in the step algorithm $G$.
Then, one evaluation of the $n$ components of the mapping
$F$ equals $n$ horners and a computation of the first derivative
involves maximally $n^2$ horners.

The number of steps required for reaching an acceptable 
iterate depends strongly on the chosen starting data and 
can vary considerably among the sequences $\{x^k\}$ generated
by J. Accordingly, interest has to center on worst-case 
measures of asymptotic type. The search for definitions of 
convergence rates of sequences is probably as old as the 
convergence theory itself. We summarize here an approach 
by Ortega and Rheinboldt (2000), which is modeled on the 
standard quotient-test and root-test for infinite series. 


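For a sequence $\{x^k\} \subset \mathbb{R}^n$ with limit $x^*$ and any
$p \in [1, \infty)$, define the R-factors (root-convergence factors) by

$$R_p\{x^k\} = \begin{cases}
\limsup_{k \to \infty} \|x^k - x^*\|^{1/k}, & \text{if } p = 1 \\
\limsup_{k \to \infty} \|x^k - x^*\|^{1/p^k}, & \text{if } p > 1
\end{cases} \qquad (8)$$

and the Q-factors (quotient-convergence factors) by

$$Q_p\{x^k\} = \begin{cases}
0, & \text{if } x^k = x^*, \ \forall k \ge k_0 \\
\limsup_{k \to \infty} \dfrac{\|x^{k+1} - x^*\|}{\|x^k - x^*\|^p}, & \text{if } x^k \ne x^*, \ \forall k \ge k_0 \\
+\infty, & \text{otherwise}
\end{cases}$$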


Let $C(J, x^*)$ denote the set of all possible sequences
generated by $J$ that converge to $x^*$. Since $J$ has to be
assessed by its worst-case behavior, the R-factors and
Q-factors of $J$ with respect to the limit point $x^*$ are defined by

$$R_p(J, x^*) = \sup\{R_p\{x^k\}: \{x^k\} \in C(J, x^*)\}, \quad
Q_p(J, x^*) = \sup\{Q_p\{x^k\}: \{x^k\} \in C(J, x^*)\}, \quad
\forall p \in [1, \infty)$$

respectively. The R-factors and Q-factors have values in
$[0, 1]$ and $[0, \infty]$, respectively. The equivalence of all
norms on $\mathbb{R}^n$ implies the norm-independence of the
R-factors, but simple examples show that the Q-factors
depend on the choice of norm. For $p = 1$, the inequality
$R_1(J, x^*) \le Q_1(J, x^*)$ always holds, but for $p > 1$ there
is no general relation between the R-factors and Q-factors.

The crucial fact about these factors is that they are step
functions of $p$ with at most one step from the minimal to
the maximal value. In other words, unless $R_p\{x^k\} = 0$ or
$R_p\{x^k\} = 1$ for all $p \in [1, \infty)$, there exists a $p_0 \in [1, \infty)$
such that $R_p\{x^k\} = 0$ for $p \in [1, p_0)$ and $R_p\{x^k\} = 1$ for
$p \in (p_0, \infty)$. This value $p_0$ is called the R-order of the
sequence, and accordingly, the R-order of $J$ at $x^*$ is
defined by

$$O_R(J, x^*) = \begin{cases}
\infty, & \text{if } R_p(J, x^*) = 0, \ \forall p \in [1, \infty) \\
\inf\{p \in [1, \infty): R_p(J, x^*) = 1\}, & \text{otherwise}
\end{cases}$$

Analogously, unless $Q_p\{x^k\} = 0$ or $Q_p\{x^k\} = \infty$ for all
$p \in [1, \infty)$, there exists a $p_0 \in [1, \infty)$ such that $Q_p\{x^k\} = 0$
for $p \in [1, p_0)$ and $Q_p\{x^k\} = \infty$ for $p \in (p_0, \infty)$. Hence,
the Q-order of $J$ at $x^*$ is

$$O_Q(J, x^*) = \begin{cases}
\infty, & \text{if } Q_p(J, x^*) = 0, \ \forall p \in [1, \infty) \\
\inf\{p \in [1, \infty): Q_p(J, x^*) = \infty\}, & \text{otherwise}
\end{cases}$$

The R-order and Q-order are both norm-independent and
satisfy the relation

$$O_Q(J, x^*) \le O_R(J, x^*)$$


With these definitions, it becomes possible to compare
the convergence of two iterative processes $J_1$ and $J_2$. In
terms of the R-measure, this begins with a comparison of
the two R-orders $p_i = O_R(J_i, x^*)$, $i = 1, 2$: the process
with the larger R-order is R-faster than the other one.
In the case of equal R-orders $p = p_1 = p_2$, the process
with the smaller R-factor $R_p(J_i, x^*)$ is R-faster. For the
Q-measure, the comparison is analogous, except that the
comparison of the Q-factors also depends on the choice
of norms. The following special terms are often used to
characterize the convergence of a process $J$ at a limit
point $x^*$:


$0 < R_1(J, x^*) < 1$          R-linear convergence
$0 < Q_1(J, x^*) < \infty$     Q-linear convergence
$R_1(J, x^*) = 0$              R-superlinear convergence
$Q_1(J, x^*) = 0$              Q-superlinear convergence
$O_R(J, x^*) = 2$              R-quadratic convergence
$O_Q(J, x^*) = 2$              Q-quadratic convergence
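As a rough numerical illustration of these notions, the quotients that enter the Q-factor for $p = 1$ and an observed order can be estimated from the errors of a computed sequence. The sketch below is an assumption-laden example, not a method from the text: the helper names are hypothetical, the exact limit is taken as known, and the order is estimated by the common heuristic $\log(e_{k+1}/e_k) / \log(e_k/e_{k-1})$.

```python
import math

def errors_of(g, x0, x_star, steps):
    """Errors |x^k - x*| of the scalar iteration x^{k+1} = g(x^k)."""
    x, errs = x0, []
    for _ in range(steps):
        errs.append(abs(x - x_star))
        x = g(x)
    return errs

def q1_quotients(errs):
    """Quotients e_{k+1} / e_k whose limsup defines the Q-factor for p = 1."""
    return [errs[k + 1] / errs[k] for k in range(len(errs) - 1)]

def observed_orders(errs):
    """Heuristic order estimates log(e_{k+1}/e_k) / log(e_k/e_{k-1})."""
    return [math.log(errs[k + 1] / errs[k]) / math.log(errs[k] / errs[k - 1])
            for k in range(1, len(errs) - 1)]

# Linear convergence: x <- cos(x); the quotients approach |sin(x*)|.
X_COS = 0.7390851332151607          # fixed point of cos (assumed known here)
cos_quotients = q1_quotients(errors_of(math.cos, 1.0, X_COS, 30))

# Quadratic convergence: Newton's method for x^2 - 2 = 0.
newton_orders = observed_orders(
    errors_of(lambda x: 0.5 * (x + 2.0 / x), 2.0, math.sqrt(2.0), 5))
```

Only a few Newton steps are used because the errors reach the level of roundoff very quickly, after which the logarithms in the order estimate are no longer meaningful.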


2.4 Local convergence 


Consider an iterative process (7) defined by a mapping
$G: E \subset \mathbb{R}^n \to \mathbb{R}^n$. If the sequence of iterates $\{x^k\} \subset E$
converges to a point $x^* \in E$ where $G$ is continuous, then it
follows that $x^* = G(x^*)$ and hence that $x^*$ is a fixed point
of $G$. Here, the standard proof of the contraction mapping
Theorem 1 provides a first convergence result:


Theorem 2. Under the conditions of Theorem 1, the iterative
sequence (7) started from a given point $x^0$ in the closed
set $C$ converges to the (unique) fixed point $x^* \in C$ and the
process $J$ satisfies $R_1(J, x^*) \le Q_1(J, x^*) \le \alpha$.


Let $G$ be a contraction on its domain $E$. If $x^0 \in E$ is
such that the closed ball

$$\bar{B}(G(x^0), r) = \{x \in \mathbb{R}^n: \|x - G(x^0)\| \le r\}, \quad
r = \frac{\alpha}{1 - \alpha} \|G(x^0) - x^0\| \qquad (10)$$

is contained in $E$, then $G$ maps this ball into itself and
hence Theorem 2 applies. Obviously, the radius is large
for $\alpha$ near 1 and for a large first step $\|G(x^0) - x^0\|$. In
either case, the ball may not be fully contained in $E$ and an
iterate may fall outside of $E$, causing the process to become
undefined and to stop.
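The role of the ball in (10) can be checked numerically on a small example. The following sketch assumes a hypothetical contraction on the plane, with the cosine applied componentwise and contraction constant 0.5 in the Euclidean norm; the function names are illustrative only.

```python
import numpy as np

def invariant_ball(G, x0, alpha):
    """Center G(x0) and radius r of the closed ball in (10)."""
    center = G(x0)
    radius = alpha / (1.0 - alpha) * np.linalg.norm(center - x0)
    return center, radius

def G(x):
    # Hypothetical contraction on R^2: |0.5 cos(a) - 0.5 cos(b)| <= 0.5 |a - b|.
    return 0.5 * np.cos(x)

x0 = np.array([2.0, -1.0])
center, radius = invariant_ball(G, x0, alpha=0.5)

# Track the largest distance of the iterates x^1, x^2, ... from the center.
x, max_dist = center, 0.0
for _ in range(60):
    max_dist = max(max_dist, np.linalg.norm(x - center))
    x = G(x)
```

In this example the domain is the whole plane, so the ball is trivially contained in it; for a restricted domain the containment would have to be verified separately.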

The iteration (7) can be viewed as a discrete dynamical
process on the domain $E$. In that terminology, the orbit
of a point $x \in E$ is the sequence $\{x^k\}$ defined by $x^0 = x$,
$x^{k+1} = G(x^k)$, $k = 0, 1, \ldots$. This orbit may terminate at a
point $x^{k^*} \notin E$ with a finite index $k^*$. Otherwise, the entire
sequence $\{x^k\}_{k=0}^{\infty}$ is defined and remains in $E$. This is called
an infinite orbit. Of course, the iterates $x^k$ of an infinite orbit
need not be distinct points of $\mathbb{R}^n$; there may well be indices
$k_1 < k_2$ such that $x^{k_2} = x^{k_1}$, in which case the sequence of
points $x^k$, $k = k_1, \ldots, k_2 - 1$ repeats itself periodically.

For our purposes, interest centers on infinite orbits for
which the iterates converge to some point $x^* \in E$. In the
case of continuous $G$, this suggests the following concepts:


Definition 1. For a continuous mapping $G: E \subset \mathbb{R}^n \to \mathbb{R}^n$
the ‘attraction basin’ $A(x^*)$ of a fixed point $x^* \in E$ of
$G$ is the set of all points $x \in E$ for which the orbit $\{x^k\}$ is
infinite and satisfies $\lim_{k \to \infty} x^k = x^*$. A fixed point $x^* \in E$
of $G$ is a point of attraction of the iteration if it is in the
interior of its attraction basin, that is, if $x^*$ has an open
‘attraction neighborhood’ $U \subset A(x^*)$.


Obviously, for certain fixed points, the attraction basin
may consist only of the point itself. We always have
$G(A(x^*)) \subset A(x^*)$; but for a point of attraction $x^*$, the
attraction neighborhood $U$ need not be mapped into itself
by $G$.

For an affine mapping $G(x) = Ax + b$ with $A \in L(\mathbb{R}^n)$
and $b \in \mathbb{R}^n$, the attraction basin is fully characterized. In
fact, the iteration (7) converges for any starting point
$x^0 \in \mathbb{R}^n$ to the unique fixed point $x^* = Ax^* + b$ in $\mathbb{R}^n$ if
and only if $A$ has spectral radius $\rho(A) < 1$; that is, if all
eigenvalues of $A$ are less than 1 in modulus. In other words,
for $\rho(A) < 1$ there exists exactly one fixed point $x^* \in \mathbb{R}^n$
that has the entire space $\mathbb{R}^n$ as its attraction basin. In the
nonaffine case, there is only a much weaker result:


Theorem 3. Let $G: E \subset \mathbb{R}^n \to \mathbb{R}^n$ have a fixed point $x^*$
in the open set $E$. If $G$ is F-differentiable at $x^*$, and
the spectral radius of $DG(x^*)$ satisfies $\rho(DG(x^*)) < 1$,
then $x^*$ is a point of attraction of (7) and
$R_1(J, x^*) = \rho(DG(x^*))$. Moreover, if $\rho(DG(x^*)) > 0$ then
$O_R(J, x^*) = O_Q(J, x^*) = 1$.


For a proof and for some historical remarks, refer
to Ortega and Rheinboldt (2000). Note that in contrast to
the affine case, the existence of $x^*$ has to be assumed,
only local instead of global convergence is guaranteed, and
merely the sufficiency but not the necessity of the condition
$\rho(DG(x^*)) < 1$ is asserted. Counterexamples show
that the assumptions of Theorem 3 cannot be improved
readily.
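The statement of Theorem 3 can be observed on a small constructed example. In the sketch below, the map, the matrix, and the starting point are hypothetical choices: the fixed point is the origin, the derivative there has spectral radius 0.5, and its Euclidean norm exceeds 1, so the map is not a contraction in that norm near the fixed point. The root of the error norm, as in (8) for $p = 1$, is then compared with the spectral radius.

```python
import numpy as np

A = np.array([[0.5, 1.0],
              [0.0, 0.3]])

def G(x):
    # Hypothetical map with fixed point x* = 0 and derivative DG(0) = A.
    return A @ x + 0.1 * x**2

rho = np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius of DG(x*)
norm2 = np.linalg.norm(A, 2)                 # Euclidean operator norm of DG(x*)

x = np.array([0.1, 0.1])
k = 200
for _ in range(k):
    x = G(x)

# k-th root of the error norm, a finite-k stand-in for the limsup in (8).
r1_estimate = np.linalg.norm(x) ** (1.0 / k)
```

A finite number of steps can only approximate the limit superior, so the estimate is expected to be close to, but not exactly equal to, the spectral radius.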

If (7) is viewed as a discrete dynamical process, then
Theorem 3 represents a result on the asymptotic behavior
of the solutions of perturbed linear difference equations at a
stationary point. In fact, (7) may be written in the perturbed
linear form

$$x^{k+1} - x^* = DG(x^*)(x^k - x^*) + \Phi(x^k), \quad k = 0, 1, \ldots$$

where the perturbation $\Phi(x) = G(x) - G(x^*) - DG(x^*)(x - x^*)$
is small in the sense that
$\lim_{x \to x^*} \|\Phi(x)\| / \|x - x^*\| = 0$. In that form, the result
(without the convergence rates) was first given in 1929 by O. Perron. Clearly,
this provides a close relation to the large body of results
on the asymptotic behavior of the solutions of differential
equations at stationary points.

If $\rho(DG(x^*)) = 0$ in Theorem 3, then the convergence is
R-superlinear. However, it need not be Q-superlinear. The
case $DG(x^*) = 0$ permits a stronger conclusion:


Theorem 4. Suppose that the $C^1$ mapping $G: E \subset \mathbb{R}^n \to \mathbb{R}^n$
has a fixed point $x^* \in E$ where $DG(x^*) = 0$ and the
second F-derivative $D^2G(x^*)$ exists. Then $x^*$ is a point
of attraction of (7) and $O_R(J, x^*) \ge O_Q(J, x^*) \ge 2$. Moreover,
if $D^2G(x^*)(h, h) \ne 0$ for all $h \ne 0$ in $\mathbb{R}^n$, then both
orders equal 2.


2.5 Attraction basins 


In general, the boundaries between attraction basins for 
different fixed points have a complicated structure and, in 
fact, exhibit a fractal nature. As an illustration, consider the 
simple nonlinear system 


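$$F(x) = \begin{pmatrix} x_1^3 - 3x_1x_2^2 - 1 \\ -x_2^3 + 3x_1^2x_2 \end{pmatrix} = 0, \quad \forall x \in \mathbb{R}^2 \qquad (11)$$

representing the real form of the complex cubic equation
$z^3 = 1$, $z \in \mathbb{C}$. In terms of $x_1$, $x_2$, the complex Newton
method $z^{k+1} = [2(z^k)^3 + 1]/[3(z^k)^2]$ becomes the iterative
process

$$x^{k+1} = \frac{2}{3}x^k + \frac{1}{3[(x_1^k)^2 + (x_2^k)^2]^2}
\begin{pmatrix} (x_1^k)^2 - (x_2^k)^2 \\ -2x_1^kx_2^k \end{pmatrix} \qquad (12)$$

Theorem 3 ensures that each one of the three zeros $(1, 0)^T$,
$(-0.5, \pm 0.5\sqrt{3})^T$ of (11) is a point of attraction of (12).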


Figure 1. Fractal boundaries between attraction basins.


Figure 1 shows parts of the attraction basins (distinguished 
by different shadings) of the three zeros (each marked by a 
small circle). The fractal nature of the boundaries between 
these basins is clearly visible. 
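A picture of this kind can be reproduced approximately by applying the iteration (12) to a grid of starting points and recording which zero, if any, is reached. The following sketch is a minimal version under stated assumptions: the grid size, the tolerance, the step limit, and the function names are hypothetical, and the label -1 simply marks starting points for which no zero was reached within the step limit.

```python
import numpy as np

ROOTS = np.array([[1.0, 0.0],
                  [-0.5, 0.5 * np.sqrt(3.0)],
                  [-0.5, -0.5 * np.sqrt(3.0)]])

def newton_step(x):
    """One step of the real form (12) of Newton's method for z^3 = 1."""
    x1, x2 = x
    d = 3.0 * (x1 * x1 + x2 * x2) ** 2
    return np.array([2.0 * x1 / 3.0 + (x1 * x1 - x2 * x2) / d,
                     2.0 * x2 / 3.0 - 2.0 * x1 * x2 / d])

def basin_index(x0, kmax=100, tol=1e-10):
    """Index of the zero reached from x0, or -1 if none is reached."""
    x = np.asarray(x0, dtype=float)
    for _ in range(kmax):
        if x[0] == 0.0 and x[1] == 0.0:
            return -1                      # (12) is undefined at the origin
        x = newton_step(x)
        dist = np.linalg.norm(ROOTS - x, axis=1)
        if dist.min() < tol:
            return int(dist.argmin())
    return -1

def basin_grid(n=41, lo=-1.0, hi=1.0):
    """Basin labels on an n-by-n grid of starting points in [lo, hi]^2."""
    ticks = np.linspace(lo, hi, n)
    return np.array([[basin_index((a, b)) for a in ticks] for b in ticks])
```

Plotting the array returned by `basin_grid` with a finer grid would show the interleaving of the three basins; the coarse default is chosen only to keep the run time small.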

The literature in this area has been growing rapidly. 
An introduction to fractals is given by Barnsley (1988), 
and Peitgen and Richter (1986) present interesting graphical 
examples of attraction basins for various iterative processes. 
The fractal nature of the boundaries of attraction basins 
is a property of discrete as well as continuous dynamical 
systems and has been studied especially in the latter case. 
For instance, for certain planar differential systems, Nusse 
and Yorke (1994) showed that there exist basins where 
every point on the common boundary between that basin 
and another basin is also on the boundary of a third basin. 

These remarks certainly suggest that in an iterative com- 
putation of a particular solution x* of a nonlinear system, 
it is not unreasonable to expect some very strange con- 
vergence behavior unless the process is started sufficiently 
near x*. This provides some justification for our emphasis 
on local convergence results. It also calls for techniques 
that force the iterates not to wander too far away, and for 
computable estimates of the radii of balls contained in the 
attraction basin of a given zero. Some results along this line 
are addressed in Section 3. 


2.6 Acceleration 


Basically, Theorems 2 and 3 ensure only the linear convergence
of the process (7). Not surprisingly, this has given rise
to a large literature on techniques for accelerating the convergence.
As Brezinski (1997) shows, much of this work is
based on the theory of sequence transformations. Without
entering into details, we consider, as an example, the
modification

$$x^{k+1} = \nu x^k + (1 - \nu)G(x^k), \quad k = 0, 1, \ldots \qquad (13)$$

of the process (7). Then the following convergence result
holds:


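Theorem 5. Assume that $G: \mathbb{R}^n \to \mathbb{R}^n$ satisfies

$$(G(x) - G(y))^T(x - y) \le 0, \quad
\|G(x) - G(y)\|_2 \le \gamma\|x - y\|_2, \quad \forall x, y \in \mathbb{R}^n$$

and has a (necessarily unique) fixed point. If $\gamma < 1$,
then (13) converges for $\nu = \gamma^2/(1 + \gamma^2)$ with
$R_1\{x^k\} \le \sqrt{\nu}$. If $\gamma \ge 1$, then (13) converges for any
$\nu \in ((\gamma^2 - 1)/(\gamma^2 + 1), 1)$.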
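The effect of the relaxation in (13) is easy to see on an affine example. The sketch below assumes a hypothetical map $G(x) = -2x + b$, for which the monotonicity condition holds and the Lipschitz constant is 2, so that the admissible factors lie between 0.6 and 1. The function name and the data are illustrative.

```python
import numpy as np

def relaxed_iteration(G, x0, nu, steps):
    """Iterate x <- nu * x + (1 - nu) * G(x) as in (13)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = nu * x + (1.0 - nu) * G(x)
    return x

b = np.array([3.0, -6.0])
gamma = 2.0

def G(x):
    # Hypothetical map: (G(x) - G(y))^T (x - y) = -2 ||x - y||^2 <= 0.
    return -gamma * x + b

x_star = b / (1.0 + gamma)                        # fixed point of G
nu_lower = (gamma**2 - 1.0) / (gamma**2 + 1.0)    # lower end of the admissible interval

plain = relaxed_iteration(G, np.zeros(2), nu=0.0, steps=30)    # nu = 0 gives (7)
relaxed = relaxed_iteration(G, np.zeros(2), nu=0.7, steps=60)  # nu inside the interval
```

With the factor 0.7 the error is multiplied by 0.1 in each step of this example, whereas the unrelaxed iteration doubles it.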


In practice, the factor $\nu$ is often difficult to estimate and
is chosen adaptively, that is, (13) is replaced by

$$x^{k+1} = \nu_k x^k + (1 - \nu_k)G(x^k), \quad k = 0, 1, \ldots$$

Various algorithms for the construction of suitable $\nu_k$ have
been proposed. For example, a generalization of the classical
Aitken $\Delta^2$ method leads to Lemaréchal’s method

$$\nu_k = \frac{[G(G(x^k)) - G(x^k)]^T[G(G(x^k)) - 2G(x^k) + x^k]}
{[G(G(x^k)) - 2G(x^k) + x^k]^T[G(G(x^k)) - 2G(x^k) + x^k]}$$


In place of (13), two-step accelerations have been consid- 
ered as well. There are also approaches that lead to trans- 
formed sequences with quadratic convergence, but usually 
they require information that is not easily available in prac- 
tice; see again Brezinski (1997). 
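A step of the adaptive scheme with the Lemaréchal factor can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the function names, the residual-based stopping rule, and the guard against a vanishing second difference are choices made for this example.

```python
import numpy as np

def lemarechal_step(G, x):
    """One step x -> nu * x + (1 - nu) * G(x) with the Lemarechal factor."""
    gx = G(x)
    ggx = G(gx)
    d2 = ggx - 2.0 * gx + x          # second difference G(G(x)) - 2 G(x) + x
    denom = d2 @ d2
    if denom == 0.0:                 # guard added for this sketch
        return gx
    nu = ((ggx - gx) @ d2) / denom
    return nu * x + (1.0 - nu) * gx

def accelerate(G, x0, tol=1e-12, kmax=50):
    """Repeat Lemarechal steps until ||G(x) - x|| <= tol or kmax is reached."""
    x = np.asarray(x0, dtype=float)
    for k in range(kmax):
        if np.linalg.norm(G(x) - x) <= tol:
            return x, k
        x = lemarechal_step(G, x)
    return x, kmax
```

For a scalar affine map the step reproduces the classical Aitken extrapolation and yields the fixed point at once; for example, one step applied to $G(x) = 0.9x + 1$ from $x = 0$ gives 10. Each step costs two evaluations of $G$, which has to be weighed against the reduction in the number of steps.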


2.7 Condition numbers 


An important aspect of any iterative process J is its 
sensitivity to perturbations of the mapping F, the problem 
data, and the implementation. Such perturbations cannot 
be avoided in computations and examples show that their 
effects may range from a slowdown of the convergence, 
to erroneous results, or even to a complete breakdown. As 
usual, it is desirable to characterize these effects by a single 
number, the ‘condition’ of the nonlinear system. We follow 
an approach by Rheinboldt (1976). 

For a mapping $F: E \subset \mathbb{R}^n \to \mathbb{R}^n$ and any closed set
$C \subset E$ set

$$\mu(F, C) := \sup\{\gamma \in [0, \infty]: \|F(x) - F(y)\| \ge \gamma\|x - y\|, \ \forall x, y \in C\}$$
$$\nu(F, C) := \inf\{\gamma \in [0, \infty]: \|F(x) - F(y)\| \le \gamma\|x - y\|, \ \forall x, y \in C\} \qquad (14)$$

Then the condition number of $F$ with respect to $C$ is
defined by

$$\kappa(F, C) = \begin{cases}
\dfrac{\nu(F, C)}{\mu(F, C)}, & \text{if } 0 < \mu(F, C), \ \nu(F, C) < \infty \\
\infty, & \text{otherwise}
\end{cases} \qquad (15)$$

For an affine mapping $F(x) = Ax + b$ with $A \in GL(n)$
and $C = \mathbb{R}^n$, this becomes $\kappa(F, \mathbb{R}^n) = \|A\| \, \|A^{-1}\|$, that is,
the standard condition number $\kappa(A)$ of the matrix $A$.
Suppose that $\kappa(F, C) < \infty$ and that $F$ has a zero $x^* \in C$.
Then the definitions (14) imply that $F$ is a homeomorphism
from $C$ onto $F(C)$ and hence that $x^* \in C$ is unique. Let
$\tilde{F}: \tilde{E} \subset \mathbb{R}^n \to \mathbb{R}^n$ be any perturbation of $F$ with $C \subset \tilde{E}$
and a zero $\tilde{x}^* \in C$. Then, for any given ‘reference point’
$x^0 \in C$, $x^0 \ne x^*$, the estimate

$$\frac{\|x^* - \tilde{x}^*\|}{\|x^0 - x^*\|} \le \kappa(F, C) \frac{\|F - \tilde{F}\|_C}{\|F(x^0)\|}, \quad
\|F - \tilde{F}\|_C := \sup_{x \in C} \|F(x) - \tilde{F}(x)\| \qquad (16)$$

holds, which is analogous to a well-known result for linear
equations (except that here the existence of the solutions
had to be assumed). The estimate (16) affirms that for large
$\kappa(F, C)$, the error between the solutions of the original and
a perturbed system of equations may become large even if
$\tilde{F}$ differs only slightly from $F$ on the set $C$.
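In the affine case, the estimate (16) reduces to the familiar perturbation bound for linear systems and can be checked directly. The sketch below uses a hypothetical nearly singular matrix and a small perturbation of the right-hand side; all data are illustrative. Because the original and the perturbed map differ only by a constant vector, the supremum in (16) is just the norm of that vector.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
b_tilde = b + np.array([0.0, 1e-6])      # hypothetical perturbation of the data

# Condition number of F(x) = A x - b on R^n in the Euclidean norm.
kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

x_star = np.linalg.solve(A, b)           # zero of F
x_tilde = np.linalg.solve(A, b_tilde)    # zero of the perturbed map
x0 = np.zeros(2)                         # reference point

lhs = np.linalg.norm(x_star - x_tilde) / np.linalg.norm(x0 - x_star)
rhs = kappa * np.linalg.norm(b - b_tilde) / np.linalg.norm(A @ x0 - b)
```

Here a perturbation of size 1e-6 in the data moves the solution by roughly 1e-2, which the bound allows for because the condition number is of the order of 1e4.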

For $\kappa(F, C) < \infty$, it can also be shown that the error of
an approximation $y^* \in C$ of the (unique) zero $x^* \in C$ of $F$
satisfies the a posteriori estimate

$$\frac{\|y^* - x^*\|}{\|x^0 - x^*\|} \le \kappa(F, C) \frac{\|F(y^*)\|}{\|F(x^0)\|}, \quad
\forall x^0 \in C, \ x^0 \ne x^* \qquad (17)$$

Once again, this corresponds to a familiar linear result. The
inequality (17) indicates that if the condition number of
$F$ is large, then a small residual norm $\|F(y^*)\|$ does not
require the relative error between $x^*$ and $y^*$ to be small as
well.

The bounds (14) can be approximated by the norms of
derivatives of $F$. Let $F: E \subset \mathbb{R}^n \to \mathbb{R}^n$ be continuously
differentiable on $E$ and $x \in E$ a point where $DF(x) \in GL(n)$.
Then, for sufficiently small $\varepsilon > 0$, there exists a
$\delta > 0$ such that

$$|\nu(F, C) - \|DF(x)\|| \le \varepsilon, \quad
|\mu(F, C) - \|DF(x)^{-1}\|^{-1}| \le \varepsilon, \quad
C := \bar{B}(x, \delta) \subset E$$

and hence

$$-\frac{\varepsilon}{\|DF(x)^{-1}\|^{-1} + \varepsilon}
\le \frac{\kappa(F, C) - \kappa(DF(x))}{1 + \kappa(DF(x))}
\le \frac{\varepsilon}{\|DF(x)^{-1}\|^{-1} - \varepsilon} \qquad (18)$$

This shows that asymptotically near $x$ the conditions of $F$
and $DF(x)$ are the same and justifies the definition of the
matrix condition $\kappa(DF(x)) = \|DF(x)\| \, \|DF(x)^{-1}\|$ as the
pointwise condition number of $F$ at any point $x \in E$ where
$DF(x) \in GL(n)$.

This result is of special interest at solutions of $F(x) = 0$.
A zero $x^* \in E$ of $F$ is called simple if the F-derivative
of $F$ at $x^*$ exists and satisfies $DF(x^*) \in GL(n)$. In the
scalar case, this reduces to the standard definition of simple
roots of differentiable functions. A simple zero $x^* \in E$ of
$F$ is locally unique. If $F$ is continuously differentiable in a
neighborhood of a simple zero $x^*$, then we associate with
$x^*$ the pointwise condition $\kappa(DF(x^*))$.

At nonsimple zeros of F, the condition results do not 
apply. Already in the scalar case, it is well known that ‘mul- 
tiple’ zeros are difficult to handle and that their character 
may change quickly with small perturbations of F. Clearly, 
the condition numbers considered here provide only a par- 
tial answer to the wide range of questions relating to the 
influence of perturbations on a problem and on the corre- 
sponding solution processes. The topic still calls for further 
research. 


2.8 Roundoff 


As in all numerical computations, the use of finite-precision 
arithmetic has a potentially profound effect on the solution 
of nonlinear systems of equations. Roundoff errors are 
expected in the evaluation of the function (and, where 
needed, its derivatives), as well as in the execution of the 
iterative process itself. As all perturbations, these errors 
may cause a slowdown of the convergence or even a 
breakdown of the entire process. In addition, it is essential 
to observe that the roundoff errors in the function evaluation 
introduce an uncertainty region for the function values. 
For any $x$ in the domain $E$ of the function $F$ and a
specific floating point evaluation $\mathrm{fl}(F(x))$ of $F$, the
uncertainty radius at $x$ is defined as $s(x) = \|\mathrm{fl}(F(x)) - F(x)\|$.
The supremum of the uncertainty radii at the points of a
subset $E_0 \subset E$ is the uncertainty radius of $F$ on that set.
Ideally, we require that the uncertainty radii are a modest
multiple of the unit roundoff of the floating point calculation.
This is typically expected of any library function,
such as the square-root or the trigonometric functions. But
there are many examples of simple functions $F$, such as
polynomials, where on certain sets a straightforward evaluation
of $F$ leads to potentially large uncertainty radii; see,
for example, Rheinboldt (1998; Sec. 3.4). Clearly, in such
cases there is hardly any hope for an effective iterative
determination of the zeros of $F$. Even if they are not unduly
large, the uncertainty radii need to be taken into account in
the process. In particular, they play a role in the design
of the acceptance and rejection criteria of the control
algorithm $C$:

Let {x*} be a sequence of iterates generated by J (in real 
aritcmetic) that converges to x* and {xk}, the correspond- 
ing ouiput of a machine implementation of J using some 
d-digit arithmetic. Suppose that the control process termi- 
nates the sequence {x} with the iterate x%'. A satisfactory 
acceptance test should be expected to guarantee that 


Ugao x = x* (19) 


As a typical example, consider a frequently used test where 
the process is terminated at the first index k* = k*(d) such 
that 3 


s ” 5 
laxk HE = xk l < ealik N 


holds for a given tolerance e4 > 0. Assume that the process 
and its implementation are known to satisfy 


k+l 


xt? — xl] < alx — xt, xk — x* Il < pas 


k=1,2,... 


with a fixed œ <1 and certain roundoff bounds p, > 0. 
Then the convergence of the real sequence implies that 


1 


—a 


eg — 3" S plex" 0+ GB tee apa] (20) 
Thus, we need lim_{d→∞} ε_d = 0 and lim_{d→∞} ρ_d = 0 to prove 
(19). The choice of the tolerances ε_d is under user control, 
but the roundoff bounds ρ_d depend strongly on the problem, 
the iterative process, and the implementation. Here, the 
uncertainty radii for the function evaluations come 
into play and may cause the roundoff bounds ρ_d to converge 
to zero much too slowly for any practical purposes. 
The estimate (20) clearly shows the need for matching the 
choice of ε_d to that of ρ_d and to the convergence rate of 
the process (here represented by α). In practice, a mismatch 
often exhibits itself in an irregular behavior of the computed 
iterates and their failure to progress satisfactorily toward a 
solution. 
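
The estimate (20) can be traced back to the triangle inequality. The following short derivation is a sketch under the assumptions that the termination test reads ‖x_d^{k^*+1} − x_d^{k^*}‖ ≤ ε_d and that the bounds ‖x^{k+1} − x^*‖ ≤ α‖x^k − x^*‖ and ‖x_d^k − x^k‖ ≤ ρ_d hold; it is meant only to show where the factor 3 − α comes from.

```latex
\begin{align*}
\|x^{k^*} - x^{k^*+1}\|
  &\le \|x^{k^*} - x_d^{k^*}\| + \|x_d^{k^*} - x_d^{k^*+1}\|
     + \|x_d^{k^*+1} - x^{k^*+1}\|
   \le \varepsilon_d + 2\rho_d, \\
\|x^{k^*} - x^*\|
  &\le \|x^{k^*} - x^{k^*+1}\| + \|x^{k^*+1} - x^*\|
   \le \|x^{k^*} - x^{k^*+1}\| + \alpha \|x^{k^*} - x^*\|, \\
\|x_d^{k^*} - x^*\|
  &\le \rho_d + \|x^{k^*} - x^*\|
   \le \rho_d + \frac{\varepsilon_d + 2\rho_d}{1 - \alpha}
   = \frac{1}{1 - \alpha}\bigl[\varepsilon_d + (3 - \alpha)\rho_d\bigr].
\end{align*}
```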

Generally, the design of the acceptance test for a control 
algorithm C depends strongly on the problem class, the 
theoretical properties and implementation details of J, the 
information available at the time when C is executed, and 
the characteristics of the particular computer. It is therefore 
hardly surprising that most of the known results for such 
tests, such as that of the example, are proved under rather 
stringent assumptions. Many different types of tests have 
been proposed, but there are also many examples that show 
how ‘reasonable’ tests may fail the criterion (19) for simple 
problems. This points to the necessity for tailoring a control 
algorithm, as much as possible, to the specific situation at 
hand and to consider proving its satisfactory behavior only 
in that setting. Few results along this line appear to be 
available in the literature. 


3 SOME CLASSES OF ITERATIVE 
METHODS 


3.1 Linearization methods 


A major class of iterative methods for solving a (square) 
nonlinear system (2) is based on the use of linearizations 
of the mapping F. The idea is to construct at the current 
iterate x^k ∈ E an affine approximation 


\[ L_k : \mathbb{R}^n \to \mathbb{R}^n, \quad L_k(x) = B_k (x - x^k) + F(x^k), \quad B_k \in GL(n) \tag{21} \]


which agrees with F at x^k, and to use a solution of 
L_k(x) = 0 as the next iterate. Since B_k is assumed to be 
invertible, the resulting linearization method has the form 


\[ x^{k+1} = x^k - B_k^{-1} F(x^k), \quad k = 0, 1, \ldots \tag{22} \]


In terms of the iterative algorithm (6), this becomes the 
following step algorithm: 


G: input: {k, x^k, M_k} 
   evaluate F(x^k); 
   construct the matrix B_k; 
   solve B_k y = F(x^k) for y; 
   if solver failed then return {‘error’}; 
   x^{k+1} := x^k − y; 
   update the memory set; 
   return {k + 1, x^{k+1}, M_{k+1}}                    (23) 


The simplest linearization methods are the (parallel) chord 
methods where all matrices B_k are identical: 


\[ x^{k+1} = x^k - B^{-1} F(x^k), \quad k = 0, 1, \ldots, \quad B \in GL(n) \tag{24} \]

Typical examples include the Picard iteration with B = αI, 
α ≠ 0, and the chord Newton method with B = DF(x^0). 
A special case of the Picard iteration arises if F has the 
form F(x) = Ax − G(x) with A ∈ GL(n) and a nonlinear 
mapping G: E ⊂ R^n → R^n. Then for B = A, the chord 
method (24) becomes x^{k+1} = A^{-1} G(x^k), k = 0, 1, ..., which 
for A = I, that is, for the fixed-point equation x = G(x), is 
the classical iterative method (7). If F is F-differentiable at 
x^*, then the iteration function G(x) := x − B^{-1} F(x) of the 
chord method (24) has at x^* the F-derivative DG(x^*) = 
I − B^{-1} DF(x^*). Hence, if σ := ρ(I − B^{-1} DF(x^*)) < 1, 
then Theorem 3 ensures that x^* is a point of attraction of 
(24) and R_1(J, x^*) = σ. 
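
The chord iteration (24) can be sketched in a few lines. The example below is only an illustration, not code from the source: the helper `solve2`, the function `chord_method`, and the 2 × 2 test problem (unit circle intersected with the line x_1 = x_2) are hypothetical choices, and the fixed matrix is taken as B = DF(x^0), that is, the chord Newton variant.

```python
def solve2(B, r):
    """Solve the 2x2 linear system B y = r by Cramer's rule (illustrative helper)."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    if abs(det) < 1e-14:
        raise ZeroDivisionError("matrix B is (numerically) singular")
    return [(r[0] * B[1][1] - B[0][1] * r[1]) / det,
            (B[0][0] * r[1] - r[0] * B[1][0]) / det]


def chord_method(F, B, x0, tol=1e-12, max_iter=200):
    """Iterate x^{k+1} = x^k - B^{-1} F(x^k) with one fixed matrix B."""
    x = list(x0)
    for k in range(max_iter):
        y = solve2(B, F(x))                 # B y = F(x^k)
        x = [x[0] - y[0], x[1] - y[1]]      # x^{k+1} = x^k - y
        if max(abs(y[0]), abs(y[1])) <= tol:
            return x, k + 1
    raise RuntimeError("chord method did not meet the tolerance")


# Hypothetical test problem: unit circle intersected with the line x1 = x2.
def F(x):
    return [x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]]


def DF(x):
    return [[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]]


x0 = [1.0, 0.5]
x_chord, n_steps = chord_method(F, DF(x0), x0)   # chord Newton: B = DF(x^0)
```

Because B is never updated, each step costs one evaluation of F and one linear solve with the same matrix; the price is a convergence rate that is only linear.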

For differentiable F, the most famous linearization 
method is Newton’s method 


\[ x^{k+1} = x^k - DF(x^k)^{-1} F(x^k), \quad k = 0, 1, \ldots \tag{25} \]


where the affine approximation L_k of (21) is obtained by 
truncating the Taylor expansion of F at x^k after the linear 
term. 

Another large class of linearization methods with varying 
B_k are the quasi-Newton methods, also called Broyden 
methods or update methods. Here, the matrices are obtained 
iteratively in the form B_{k+1} = B_k + ΔB_k, where the ‘update 
matrix’ ΔB_k has either rank 1 or 2. The theory of these 
methods began with the formula 


\[ B_{k+1} = B_k + \frac{F(x^{k+1}) \, (x^{k+1} - x^k)^T}{(x^{k+1} - x^k)^T (x^{k+1} - x^k)} \]

introduced in 1965 by C.G. Broyden. By now the related 
literature has become very large; for an introduction and 
references we refer to Kelley (1995) and Dennis and Walker 
(1981). The methods have been applied to various practical 
problems, for instance in computational fluid mechanics; 
see Engelman, Strang and Bathe (1981). 
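
As an illustration of the rank-one update, the following sketch applies Broyden's formula B_{k+1} = B_k + F(x^{k+1}) s^T / (s^T s), with s = x^{k+1} − x^k, to a small 2 × 2 problem. The names `solve2` and `broyden`, the test problem, and the choice B_0 = DF(x^0) are assumptions made for this example only.

```python
def solve2(B, r):
    """Solve the 2x2 linear system B y = r by Cramer's rule (illustrative helper)."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    if abs(det) < 1e-14:
        raise ZeroDivisionError("matrix B is (numerically) singular")
    return [(r[0] * B[1][1] - B[0][1] * r[1]) / det,
            (B[0][0] * r[1] - r[0] * B[1][0]) / det]


def broyden(F, B0, x0, tol=1e-12, max_iter=100):
    """Linearization method (22) with Broyden's rank-one update of B_k."""
    x = list(x0)
    B = [row[:] for row in B0]
    Fx = F(x)
    for k in range(max_iter):
        y = solve2(B, Fx)                   # B_k y = F(x^k)
        s = [-y[0], -y[1]]                  # s = x^{k+1} - x^k
        x = [x[0] + s[0], x[1] + s[1]]
        Fx = F(x)                           # F(x^{k+1})
        if max(abs(Fx[0]), abs(Fx[1])) <= tol:
            return x, k + 1
        sts = s[0] * s[0] + s[1] * s[1]
        for i in range(2):                  # B_{k+1} = B_k + F(x^{k+1}) s^T / (s^T s)
            for j in range(2):
                B[i][j] += Fx[i] * s[j] / sts
    raise RuntimeError("Broyden iteration did not meet the tolerance")


# Hypothetical test problem: unit circle intersected with the line x1 = x2.
def F(x):
    return [x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]]


x0 = [1.0, 0.5]
B0 = [[2.0 * x0[0], 2.0 * x0[1]], [1.0, -1.0]]   # assumed start: B_0 = DF(x^0)
x_b, n_b = broyden(F, B0, x0)
```

Only the initial matrix uses derivative information; all later matrices are built from function values.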

As discussed in Subsection 2.1, a least-squares solution 
of an overdetermined problem (4) is a minimizer of 
the functional g(x) := F(x)^T F(x), x ∈ E. Then a critical 
point of g is a solution of the system Dg(x)^T := 
2 DF(x)^T F(x) = 0, and hence an application of Newton’s 
method involves the second derivative of F. This can be 
avoided by approximating g near the kth iterate by the 
quadratic functional 


\[ g_k(x) := [F(x^k) + DF(x^k)(x - x^k)]^T [F(x^k) + DF(x^k)(x - x^k)] \]

and then taking a minimizer of g_k as the next iterate x^{k+1}. 
If DF(x^k) has maximal rank, the global minimizer of g_k 
is unique and the resulting method becomes 

\[ x^{k+1} = x^k - [DF(x^k)^T DF(x^k)]^{-1} DF(x^k)^T F(x^k), \quad k = 0, 1, \ldots \tag{26} \]

This represents a linearization method for Dg(x)^T = 0 and 
is called the Gauss-Newton method. Note that if m = 
n and DF(x^k) ∈ GL(n), then (26) reduces to Newton’s 
method for F(x) = 0 (but differs from Newton’s method 
applied to Dg(x)^T = 0). If DF(x^k) cannot be guaranteed 
to have maximal rank at all iterates, then a widely accepted 
approach is to replace (26) by the Levenberg-Marquardt 
method 


\[ x^{k+1} = x^k - [\lambda_k I + DF(x^k)^T DF(x^k)]^{-1} DF(x^k)^T F(x^k), \quad k = 0, 1, \ldots \]


where the regularization factors λ_k > 0 are usually determined 
adaptively. 
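
Both iterations differ only in the matrix of the normal equations, which the following sketch makes explicit for n = 2 unknowns. It is a minimal illustration under stated assumptions: the function name `gauss_newton`, the three-equation test problem with the consistent solution (1, 2), and the use of a constant regularization factor `lam` (instead of an adaptive λ_k) are choices made for this example.

```python
def gauss_newton(F, DF, x0, lam=0.0, tol=1e-12, max_iter=100):
    """Step x - [lam*I + J^T J]^{-1} J^T F(x) for n = 2 unknowns.

    lam = 0 gives the Gauss-Newton step (26); lam > 0 gives a
    Levenberg-Marquardt step with a constant regularization factor.
    """
    x = list(x0)
    for _ in range(max_iter):
        r, J = F(x), DF(x)                  # residual (length m) and m x 2 Jacobian
        a11 = lam + sum(row[0] * row[0] for row in J)
        a12 = sum(row[0] * row[1] for row in J)
        a22 = lam + sum(row[1] * row[1] for row in J)
        g1 = sum(row[0] * ri for row, ri in zip(J, r))   # J^T r
        g2 = sum(row[1] * ri for row, ri in zip(J, r))
        det = a11 * a22 - a12 * a12
        y = [(g1 * a22 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det]
        x = [x[0] - y[0], x[1] - y[1]]
        if max(abs(y[0]), abs(y[1])) <= tol:
            return x
    raise RuntimeError("iteration did not meet the tolerance")


# Hypothetical overdetermined problem (m = 3, n = 2) with solution (1, 2).
def F(x):
    return [x[0] - 1.0, x[1] - 2.0, x[0] * x[1] - 2.0]


def DF(x):
    return [[1.0, 0.0], [0.0, 1.0], [x[1], x[0]]]


x_gn = gauss_newton(F, DF, [1.2, 1.8])             # Gauss-Newton
x_lm = gauss_newton(F, DF, [1.2, 1.8], lam=1e-3)   # Levenberg-Marquardt, fixed lam
```

For larger problems the normal equations would normally be avoided in favor of an orthogonal factorization of DF(x^k); they are used here only to keep the sketch close to (26).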


3.2 Local convergence of Newton’s method 


Most local convergence results for Newton’s method are 
restricted to simple zeros of F. For continuously differentiable 
F, it can be shown that in a neighborhood of 
a simple zero x^* of F, the iteration function G(x) := 
x − DF(x)^{-1} F(x) of Newton’s method is well defined and 
that G has the F-derivative DG(x^*) = 0 at x^*. Hence, Theorems 
3 and 4 provide the following result: 


Theorem 6. For a C^1 mapping F: E ⊂ R^n → R^n, every 
simple zero x^* in the (open) set E is a point of attraction 
of Newton’s method (25) with R_1(J, x^*) = Q_1(J, x^*) = 0. 
Moreover, there exists k_0 > 0 such that 


\[ \| F(x^{k+1}) \| \le \| F(x^k) \| \quad \forall k \ge k_0 \tag{27} \]

If, in addition, a Lipschitz condition 


\[ \| DF(x) - DF(y) \| \le \gamma \| x - y \| \quad \forall x, y \in E, \quad \gamma > 0 \tag{28} \]

holds, then O_R(J, x^*) ≥ O_Q(J, x^*) ≥ 2. 


The Lipschitz condition (28) can be weakened. In fact, 
for the convergence at a given simple zero x^* to be at least 
Q-quadratic, it suffices to assume that 


\[ \| DF(x) - DF(x^*) \| \le \gamma \| x - x^* \| \tag{29} \]

in some neighborhood of x^* in E. 
An important property of Newton’s method is its affine 
invariance. More specifically, the problem F(x) = 0 is 
invariant under any affine transformation 


\[ F \longmapsto \tilde{F} := AF, \quad A \in GL(n) \tag{30} \]


and it is easily seen that this invariance is inherited by the 
Newton iterates. But neither the Lipschitz condition (28) 
nor its simplified form (29) shares this affine invariance. 
This led P. Deuflhard and G. Heindl in 1979 to replace 
(29) by the affine invariant condition 


\[ \| DF(x^*)^{-1} [ DF(x) - DF(x^*) ] \| \le \gamma \| x - x^* \| \tag{31} \]


The affine invariance of Newton’s method is of consider- 
able importance in many situations (see Subsection 3.4). 

The following result of Ortega and Rheinboldt (2000) 
shows that the Newton iteration function is a contraction in 
some neighborhood of a simple zero, provided the Lipschitz 
condition (28) holds. 


Theorem 7. Assume that the C^1 mapping F: E ⊂ R^n → 
R^n satisfies (28) and has a simple zero x^* ∈ E. Let β = 
‖DF(x^*)^{-1}‖ and ρ > 0 such that η := βγρ < 2/3. If the 
closed ball B(x^*, ρ) with radius ρ is contained in E, then 
the Newton iteration function satisfies 


\[ \| G(x) - x^* \| \le \frac{1}{2} \, \frac{\eta}{1 - \eta} \, \| x - x^* \| \quad \forall x \in B(x^*, \rho) \]

and hence maps B(x^*, ρ) into itself and is a contraction for 
η < 2/3. 


There are other similar results in the literature. We 
mention only a convergence theorem for the chord Newton 
method 


\[ x^{k+1} = G(x^k), \quad k = 0, 1, \ldots, \qquad G(x) := x - DF(x^0)^{-1} F(x) \tag{32} \]


which is analogous to the construction of the convergence 
ball (10) for a contraction: 


Theorem 8. Under the assumptions of Theorem 7 for 
the mapping F, let x^0 ∈ E be a point where DF(x^0) 
is invertible. Set β = ‖DF(x^0)^{-1}‖ and assume that η = 
βγ‖DF(x^0)^{-1} F(x^0)‖ < 1/2 and that B(x^0, ρ) ⊂ E for ρ = 
(1/(βγ)) [1 − √(1 − 2η)]. Then the chord Newton iteration 
function G of (32) is a contraction on the ball B(x^0, ρ) and 
maps this ball into itself. Hence, the chord Newton method 
(32) started at x^0 converges to the unique zero x^* of F in 
B(x^0, ρ). 


Of course, the convergence follows from Theorem 2. 
The contraction constant of G on the ball turns out to 
be α = 1 − √(1 − 2η), which leads to the requirement 
η < 1/2. The proof also shows that x^* is a simple zero 
of F. Theorem 8 is loosely related to the classical theorem 
for Newton’s method proved in 1948 by L.V. Kantorovich. 
As in the Kantorovich theorem, it can be shown that x^* is 
actually a unique zero of F in some larger ball. 


3.3 Discretized Newton methods 


The execution of step (23) of the linearization method 
(22) involves (i) the evaluation of the n components of 
F(x^k), (ii) the evaluation of the n^2 elements of B_k, (iii) 
the numerical solution of the n × n linear system, and (iv) 
the work required in updating the memory set. Evidently, 
these tasks depend on the problem as well as the method. 
For instance, in high-dimensional problems, sparse-matrix 
techniques may be needed in (iii) or direct solvers have 
to be replaced by iterative methods. For Newton’s method, 
(ii) involves the evaluation of the n^2 first partial derivatives 
of F. Algebraic expressions for these partial derivatives 
are not always easily derivable and some automatic differentiation 
method may be required; see Griewank (2000). 
Alternatively, the partial derivatives may have to be approximated 
by appropriate finite differences. 

At a point x in the domain of F and for a suitable 
parameter vector h ∈ R^n, let J(x, h) ∈ L(R^n) be a matrix 
with elements J(x, h)_{ij} that approximate the partial derivatives 
∂f_i(x)/∂x_j, i, j = 1, ..., n. For J(x, h) ∈ GL(n), the 
resulting linearization process 


\[ x^{k+1} = x^k - J(x^k, h^k)^{-1} F(x^k), \quad k = 0, 1, \ldots, \quad h^k \in \mathbb{R}^n \tag{33} \]

is called a discretized Newton method. A simple example 
for J(x, h) is 


\[ J(x, h)_{ij} = \frac{1}{h_{ij}} \left[ f_i(x + h_{ij} e^j) - f_i(x) \right] \tag{34} \]


where e^1, ..., e^n are the natural basis vectors of R^n and 
h is a vector with small components h_{ij} > 0. 

For an analysis of the process (33), we need to know 
how J(x, h) approximates DF(x). Minimally, the matrix 
J(x^k, h^k) should be sufficiently close to DF(x^k) at each 
iterate x^k. The form of J is generally fixed, but the vectors 
h^k remain free. Evidently, we expect (33) to work only 
for specific choices of the h^k. Many convergence theories 
simply assume that J(x, h) is defined for all x in the 
domain of F and for all h with sufficiently small norm, 
and that J(x, h) converges to DF(x) when h tends to zero. 
Certainly, for (34), this is readily proved. 

An overview of local convergence results for discretized 
Newton methods based on various approximation conditions 
for J is given by Ortega and Rheinboldt (2000). In 
essence, a simple zero of F is a point of attraction of 
(33) if, uniformly at all iterates, the approximation error 
‖J(x^k, h^k) − DF(x^k)‖ is sufficiently small. Moreover, the 
convergence is superlinear if the approximation errors tend to 
zero for k → ∞. In practice, it is rarely possible to determine 
a priori estimates for the required approximation 
properties. 

For a given choice of steps h_{ij} > 0, the evaluation of 
the matrix (34) at a point x requires the function values 
f_i(x) and f_i(x + h_{ij} e^j), for i, j = 1, ..., n, which, in the 
terminology of Subsection 2.2, involves n + n^2 horners. 
Accordingly, not only the approximation properties of J but 
also its specific form have an effect on the overall behavior 
of the discretized Newton method. The count in terms 
of horners is often unrealistic. In fact, in many problems 
we only have a routine for evaluating the entire vector 
F(x) ∈ R^n at a given point x and the cost of computing 
a single component value f_i(x) is almost as high as that 
of computing all of them. In that case, the above example 
is extraordinarily costly unless we restrict the choice of the 
steps h_{ij}. For instance, if, say, h_{ij} = h_j for i = 1, ..., n, 
then in (34) the jth column of J(x, h) becomes 


\[ \frac{1}{h_j} \left[ F(x + h_j e^j) - F(x) \right] \tag{35} \]


so that the evaluation of J(x, h) requires altogether n + 1 
calls to the routine for F. For 
sparse Jacobians DF(x), there can be further savings in the 
evaluation of J(x, h). For instance, if, for i = 1, ..., n, the 
component f_i depends at most on x_{i−1}, x_i, and x_{i+1}, then 
the matrix J(x, h) with the columns (35) can be generated 
by four evaluations of F for any dimension n ≥ 3. An 
analysis of this approach and relevant algorithms were 
given by Coleman and Moré (1983). 
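
The column-wise difference approximation (35) can be sketched as follows. The function name `fd_jacobian`, the call counter, and the test function are hypothetical and serve only to illustrate that n + 1 calls of the routine for F suffice when one step h_j is used per column.

```python
def fd_jacobian(F, x, h):
    """Forward-difference Jacobian whose jth column is [F(x + h_j e^j) - F(x)] / h_j."""
    n = len(x)
    Fx = F(x)                               # one call for the base point
    J = [[0.0] * n for _ in range(len(Fx))]
    for j in range(n):                      # one further call per column
        xp = list(x)
        xp[j] += h[j]
        Fp = F(xp)
        for i in range(len(Fx)):
            J[i][j] = (Fp[i] - Fx[i]) / h[j]
    return J


calls = [0]                                 # counts evaluations of the routine for F


def F_counted(x):
    calls[0] += 1
    return [x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]]


J = fd_jacobian(F_counted, [1.0, 0.5], [1e-7, 1e-7])
# The exact Jacobian at (1, 0.5) is [[2, 1], [1, -1]].
```

For a banded or otherwise sparse Jacobian, several columns can share one perturbed evaluation of F; that grouping is the subject of the work of Coleman and Moré cited above and is not shown here.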


3.4 Damping strategies 


Since the boundaries between the attraction basins of iter- 
ative methods are expected to have a fractal nature, the 
convergence may become erratic unless started sufficiently 
near the desired solution. For linearization methods, some 
control of the convergence behavior can be gained by 
introducing damping factors. We consider here only the 
representative case of the damped Newton method 


\[ x^{k+1} = x^k - \lambda_k DF(x^k)^{-1} F(x^k), \quad k = 0, 1, \ldots \tag{36} \]


involving certain damping factors λ_k > 0. Analogous approaches 
can be used for other linearization methods. 
Evidently, for effective control of the iteration, the damping 
factors should be chosen adaptively. Various strategies 
have been proposed for this. In the case of a gradient mapping 
F(x) = Dg(x)^T, a popular approach is to construct 
λ_k so as to ensure for each step an acceptably large 
decrease g(x^k) − g(x^{k+1}) > 0 in the values of g. Typically, 
this involves a minimization of g along the line segment 
x^k − s p^k, 0 ≤ s ≤ 1, from x^k in the Newton direction 
p^k = DF(x^k)^{-1} F(x^k), coupled with a test for the acceptability 
of the decrease of g. Such a test may require, for 
instance, that 


\[ g(x^k - \lambda p^k) \le (1 - \alpha \lambda) \, g(x^k) \tag{37} \]


with some constant α > 0, say, α = 1/2. A possible implementation 
is here the so-called Armijo rule, where (37) 
is tested successively with decreasing λ taken from the 
sequence {1, 1/2, 1/4, ..., λ_min}. This damping strategy 
belongs to the topic of unconstrained minimization methods. 
The literature in this area is extensive; see Rheinboldt 
(1998) for some references. 

In practice, F may not be a gradient mapping or the functional 
g is not readily computable. Then other functionals 
have to be constructed that are to decrease at each step. 
As noted in Subsection 3.2, Newton’s method is invariant 
under affine transformations (30). Moreover, the affine 
invariant Lipschitz condition (31) suggests the inverse of 
the Jacobian as a natural choice of the transformation 
matrix. This leads to the definition of the functional 


\[ h_k(x) = \tfrac{1}{2} \left[ DF(x^k)^{-1} F(x) \right]^T DF(x^k)^{-1} F(x) \]


which is to be decreased in the step from x^k to x^{k+1}. 
A damping strategy based on this idea was introduced 
and analyzed by Deuflhard (1974). The following algorithm 
sketches this approach and uses the earlier mentioned 
Armijo rule to construct a damping factor: 


input: {x^k, λ_min} 
for λ = 1, 1/2, 1/4, ..., λ_min do 
   solve DF(x^k) p^k = F(x^k); 
   evaluate F(x^k − λ p^k); 
   solve DF(x^k) q^k = F(x^k − λ p^k); 
   if ‖q^k‖ is small then return {‘convergence detected’}; 
   if (q^k)^T q^k ≤ (1 − λ/2) (p^k)^T p^k then return {λ_k := λ}; 
endfor; 
return {‘no λ_k found’};                                (38) 


The actual implementation of the damping strategy given 
by Deuflhard (1974) is more sophisticated. In particular, 
the algorithm begins with an a priori estimate of a damping 
factor that often turns out to be acceptable; moreover, the 
convergence test takes account of potential pathological situations 
and the loop uses a recursively computed sequence 
of λ values. For further details and related convergence 
aspects, we refer to the original article. The resulting damping 
strategy is incorporated in the family of Newton codes 
NLEQ1, NLEQ2, NLEQ1S available from the Konrad Zuse 
Zentrum, Berlin. 
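
The effect of the halving strategy can be seen already in one dimension. The sketch below is a simplified scalar illustration and not the NLEQ implementation: it tests the decrease of [DF(x^k)^{-1} F(x)]^2 with the factor (1 − λ/2), and the function name `damped_newton`, the parameter defaults, and the test problem F(x) = arctan x (for which the undamped Newton iteration diverges from x^0 = 3) are assumptions made for this example.

```python
import math


def damped_newton(F, DF, x0, lam_min=1.0 / 1024, tol=1e-12, max_iter=50):
    """Scalar damped Newton method with step halving (Armijo-type rule)."""
    x = x0
    for _ in range(max_iter):
        d = DF(x)
        p = F(x) / d                        # Newton direction: DF(x^k) p = F(x^k)
        if abs(p) <= tol:
            return x
        lam = 1.0
        while True:
            q = F(x - lam * p) / d          # DF(x^k) q = F(x^k - lam * p)
            if q * q <= (1.0 - lam / 2.0) * p * p:
                break                       # acceptable decrease of the functional
            lam /= 2.0
            if lam < lam_min:
                raise RuntimeError("no damping factor found")
        x = x - lam * p
    raise RuntimeError("damped Newton method did not converge")


# Hypothetical test: plain Newton diverges for arctan when started at x = 3.
root = damped_newton(math.atan, lambda x: 1.0 / (1.0 + x * x), 3.0)
```

In this run the first step is accepted only after two halvings; close to the zero the full step λ = 1 is accepted and the fast local convergence of Newton's method is retained.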


3.5 Inexact Newton methods 


At each step of a linearization method (22), the linear 
system B_k(x − x^k) = −F(x^k) has to be solved. For large 
dimensions, this may require the use of a secondary linear 
iterative process such as a successive overrelaxation (SOR) 
process, alternating direction iterations (ADI), or a Krylov 
method. We consider here Newton’s method as the primary 
process, in which case the linear system has the form 
DF(x^k)s = −F(x^k). Since any secondary iteration provides 
only an approximate solution ŝ^k, it cannot be expected that 
at the next iterate x^{k+1} = x^k + ŝ^k the residual 


\[ r^k = DF(x^k)(x^{k+1} - x^k) + F(x^k) \tag{39} \]


is zero. The norm ‖F(x^k)‖ represents a measure of the 
residual of the primary process. As long as this primary 
residual is still relatively large, there is little reason for 
enforcing a very small ‖r^k‖. Hence, the secondary process 
should be terminated adaptively on the basis of, 
for instance, the quotient ‖r^k‖/‖F(x^k)‖ of the secondary 
and primary residuals. Combined processes with Newton’s 
method as primary iteration and an adaptive control for 
the secondary method are now usually called inexact 
Newton methods, the name given to them by Dembo, Eisenstat and 
Steihaug (1982). 

It turns out that the convergence of these combined processes 
can be ensured by requiring the primary residuals 
‖F(x^k)‖ to decrease monotonically. (Recall that Theorem 6 
ensures this for Newton’s method when k is sufficiently 
large.) The theory can be based on a convergence result 
for certain sequences {x^k} without taking account of their 
method of computation. More specifically, for a C^1 mapping 
F: R^n → R^n on R^n and constants η, λ ∈ (0, 1), let 
{x^k} ⊂ R^n be a sequence such that 


\[ \| DF(x^k)(x^{k+1} - x^k) + F(x^k) \| \le \eta \, \| F(x^k) \| \quad \forall k \ge 0 \tag{40} \]
\[ \| F(x^{k+1}) \| \le \lambda \, \| F(x^k) \| \quad \forall k \ge 0 \tag{41} \]


If a subsequence of {x^k} has a limit point x^* where 
DF(x^*) ∈ GL(n), then it can be shown that the entire 
sequence converges to x^* and that F(x^*) = 0. 

In order to apply this to a combination of Newton’s 
method and some linear iterative method, we have to guarantee 
the validity of (40) and (41) at each step. This 
can be accomplished by a damping strategy, that is, a 
step reduction. At a given point x ∈ R^n where F(x) ≠ 0, 
and for any step s ∈ R^n, consider the two control 
variables 


\[ r(s) := \frac{\| F(x) + DF(x) s \|}{\| F(x) \|}, \qquad \sigma(s) := \frac{\| s \|}{\| F(x) \|} \]


If ŝ is a step such that r(ŝ) < 1, then any reduced step 
s = θŝ, θ ∈ (0, 1), satisfies 


\[ r(s) \le (1 - \theta) + \theta \, r(\hat{s}) < 1, \qquad \sigma(s) = \theta \, \sigma(\hat{s}) \tag{42} \]


This can be used to show that by means of a step reduction, 
both (40) and (41) can be satisfied if only (40) 
already holds for the initial step ŝ. The following ‘minimum 
reduction’ algorithm given by Eisenstat and Walker 
(1994) is based on this idea. It involves the choice of suitable 
parameters t ∈ (0, 1), η_max, 0 < θ_min < θ_max < 1, and 
j_max, which are assumed to be supplied via the memory 
sets. A typical acceptance test for methods of this type uses 
the residual condition ‖F(x^k)‖ ≤ tol with an appropriate 
tolerance. 


G: input: {k, x^k, M_k} 
   apply the secondary process to determine s ∈ R^n 
      such that 
      ‖F(x^k) + DF(x^k)s‖ ≤ η‖F(x^k)‖ for some 
      η ∈ [0, η_max); 
   j := 0; 
   while ‖F(x^k + s)‖ > [1 − t(1 − η)] ‖F(x^k)‖ do 
      choose θ ∈ [θ_min, θ_max]; 
      η := (1 − θ) + θη; s := θs; j := j + 1; 
      if j > j_max then return {‘fail’}; 
   endwhile; 
   x^{k+1} := x^k + s; 
   return {x^{k+1}, M_{k+1}}                            (43) 


The initial step s satisfies r(s) ≤ η < 1, and since 1 − η is 
reduced by a factor θ ≤ θ_max < 1 during each repetition of 
the while loop, it is expected that the exit condition of the loop 
will be reached for appropriately chosen θ_min and j_max. If 
the algorithm (43) does not fail and the computed sequence 
{x^k} has a subsequence that converges to a point x^* where 
DF(x^*) ∈ GL(n), then it follows from the earlier indicated 
convergence result that the entire sequence converges to x^* 
and that x^* is a simple zero of F. 
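
A compact sketch of the step algorithm (43) is given below. It is an illustration under several assumptions: a plain Jacobi iteration stands in for the secondary process (it is adequate only because the Jacobian of the hypothetical test problem is diagonally dominant), the forcing term η and the reduction factor θ are held constant, and all function names are invented for this example.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def jacobi_step(J, Fx, eta, max_inner=200):
    """Secondary process: Jacobi iteration for J s = -Fx, stopped as soon as
    ||Fx + J s|| <= eta * ||Fx||."""
    s = [0.0, 0.0]
    target = eta * norm(Fx)
    for _ in range(max_inner):
        res = [Fx[i] + J[i][0] * s[0] + J[i][1] * s[1] for i in range(2)]
        if norm(res) <= target:
            return s
        s = [s[i] - res[i] / J[i][i] for i in range(2)]
    raise RuntimeError("secondary iteration failed")


def inexact_newton(F, DF, x0, eta0=0.1, t=1e-4, theta=0.5,
                   j_max=30, tol=1e-10, max_iter=100):
    x = list(x0)
    for k in range(max_iter):
        Fx = F(x)
        if norm(Fx) <= tol:                 # acceptance test ||F(x^k)|| <= tol
            return x, k
        eta = eta0
        s = jacobi_step(DF(x), Fx, eta)     # inexact Newton step
        j = 0
        while norm(F([x[0] + s[0], x[1] + s[1]])) > (1.0 - t * (1.0 - eta)) * norm(Fx):
            eta = (1.0 - theta) + theta * eta   # step reduction as in (43)
            s = [theta * s[0], theta * s[1]]
            j += 1
            if j > j_max:
                raise RuntimeError("fail")
        x = [x[0] + s[0], x[1] + s[1]]
    raise RuntimeError("inexact Newton method did not converge")


# Hypothetical test problem with a diagonally dominant Jacobian.
def F(x):
    return [3.0 * x[0] + math.sin(x[1]) - 1.0, x[0] ** 2 + 4.0 * x[1] - 2.0]


def DF(x):
    return [[3.0, math.cos(x[1])], [2.0 * x[0], 4.0]]


x_in, n_outer = inexact_newton(F, DF, [0.0, 0.0])
```

In a realistic setting the Jacobi loop would be replaced by a Krylov solver, and the forcing term would be adapted from step to step.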

The algorithm (43) requires a secondary iterative method 
for computing an approximate solution s of the linear 
system DF(x^k)s + F(x^k) = 0 such that r(s) < 1. For this, 
the generalized minimal residual method (GMRES) has been 
widely used, although, in principle, any other linear iterative 
process can be applied. We refer to Kelley (1995) for an 
introduction to Krylov-type methods and GMRES. 

Many variations of inexact Newton methods have been 
discussed in the literature. For example, Deuflhard and 
Weiser (1998) consider nonlinear elliptic partial differential 
equations in a finite element setting and combine 
Newton’s method with a multigrid method as a secondary 
process for the approximation of the Newton corrections. 
For high-dimensional problems, parallel computations are 
indicated. For this, high-latency, low-bandwidth workstation 
clusters are increasingly used. A combined process of 
Newton-Krylov-Schwarz type was implemented in 1995 
at ICASE, Hampton, VA, and used in various aerodynamic 
applications. 


4 PARAMETERIZED SYSTEMS 


Nonlinear equations in applications almost always involve 
several parameters. While some of them can be fixed, for 
many others, only a possible range is known. Then interest 
centers on detecting parameter configurations in which the 
solution behavior exhibits major changes as, for instance, 
where a mechanical structure starts buckling. 

Problems of this type require the incorporation of the 
changeable parameters into the specification of the equa- 
tions and hence lead to equations of the form 


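\[ F(y, \lambda) = 0, \quad F : E \subset \mathbb{R}^n := \mathbb{R}^m \times \mathbb{R}^d \to \mathbb{R}^m, \quad n = m + d, \; d > 0 \tag{44} \]


involving a state vector y ∈ R^m and a parameter vector λ ∈ 
R^d. For such systems, it is often convenient to disregard the 
splitting R^n = R^m × R^d by combining y and λ into a single 
vector x ∈ R^n and writing (44) in the ‘underdetermined’ 
form 


\[ F(x) = 0, \quad F : E \subset \mathbb{R}^n \to \mathbb{R}^m, \quad n - m = d > 0 \tag{45} \]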


For equations of the form (45) (or (44)), it rarely makes 
sense to focus only on the computation of a specific 
solution. Instead, the aim is to analyze the properties of 
relevant parts of the solution set of the system. 


4.1 Homotopies and piecewise linear methods 


We begin with a class of techniques, the so-called homotopy 
methods, in which a parameter is not intrinsic to the 
problem but is introduced as an aid to the analysis and 
the computation. 

Two continuous mappings F_i: E ⊂ R^m → R^m, i = 0, 1, 
are homotopic if there exists a continuous mapping 
H: E_H ⊂ R^m × R → R^m such that E × [0, 1] ⊂ E_H and 
H(y, 0) = F_0(y) and H(y, 1) = F_1(y) for all y ∈ E. 
A principal question of homotopy theory concerns the 
preservation of the solvability properties of the mappings 
y ∈ E ↦ H(y, λ) when λ changes from 0 to 1. Such 
preservation information represents a powerful tool for the 
establishment of existence results and the development of 
computational methods. 

Suppose that a zero y^* of F_1 is to be found. Since, in 
practice, iterative solvers can be guaranteed to converge 
to y^* only if they are started sufficiently near that point, 
techniques are desired that ‘globalize’ the process. For 
this, let H be a homotopy that ‘connects’ F_1 to a 
mapping F_0 for which a zero y^0 is already known. If 
the homotopy preserves the solvability properties, then we 
expect the solutions y = y(λ) of the intermediate systems 
H(y, λ) = 0 to form a path from y^0 to y^*. The idea of 
the homotopy methods is to reach y^* by following this 
path computationally. Subsections 4.3 and 4.4 address some 
methods for approximating such solution paths. 
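
The path-following idea can be illustrated with a very simple scheme. The sketch below assumes one particular homotopy, H(y, λ) = F_1(y) − (1 − λ) F_1(y^0), for which y^0 is a zero at λ = 0, and follows the path with a fixed number of equal λ-steps, each corrected by a few Newton iterations. The function name, the step counts, and the scalar test function are hypothetical; practical continuation codes use the more careful techniques of the later subsections.

```python
def follow_newton_homotopy(F1, dF1, y0, n_steps=20, newton_steps=5):
    """Follow the zeros of H(y, lam) = F1(y) - (1 - lam) * F1(y0)
    from lam = 0 (zero y0) to lam = 1 (zero of F1)."""
    c = F1(y0)
    y = y0
    for i in range(1, n_steps + 1):
        lam = i / n_steps
        for _ in range(newton_steps):       # Newton corrector for y -> H(y, lam)
            y -= (F1(y) - (1.0 - lam) * c) / dF1(y)
    return y


# Hypothetical scalar example: F1(y) = y^3 + y - 10 has the zero y = 2.
y_end = follow_newton_homotopy(lambda y: y ** 3 + y - 10.0,
                               lambda y: 3.0 * y ** 2 + 1.0,
                               0.0)
```

Each corrector starts from the solution for the previous parameter value, which is the mechanism that makes the local Newton iteration usable far from the final zero.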

Homotopies arise naturally in connection with simplicial 
approximations of mappings. A basic result of algebraic 
topology states that a continuous mapping F between certain 
subsets of finite-dimensional spaces can be approximated 
arbitrarily closely by a simplicial mapping that is 
homotopic to F. The details of this result are outside 
the frame of this presentation. In the past decades, these 
concepts and results have been transformed into effective 
computational algorithms for approximating homotopies by 
simplicial maps on simplicial complexes and their subdi- 
visions. They are now generally called piecewise linear 
methods. The first computational algorithms utilizing sim- 
plicial approximations were given in 1964 by C.E. Lemke 
and J.T. Howson, and addressed the numerical solution 
of linear complementarity problems. In 1967, H.E. Scarf 
introduced algorithms for approximating fixed points of 
continuous mappings that were later shown to belong to 
this class of methods as well. Since then, the literature on 
piecewise linear methods has grown rapidly. 

Applications of the piecewise linear methods, besides 
those already mentioned, include the solution of economic 
equilibrium problems, certain integer programming prob- 
lems, the computation of stationary points of optimization 
problems on polytopes, and the approximate solution of 
continuation problems. The methods also provide a computationally 
implementable proof of the Brouwer fixed-point 
theorem mentioned in Subsection 2.1. A survey of the area 
with extensive references was recently given by Allgower 
and Georg (2000). 

The piecewise linear methods do not require smooth 
mappings and hence have a theoretically broad range of 
applicability. In fact, they have also been extended to the 
computation of fixed points of set-valued mappings. But 
usually, these methods are considered to be less efficient 
when more detailed information about the structure and 
smoothness of F permits the utilization of other techniques. 
This appears to be one of the reasons these methods have 
not found extensive use in computational mechanics. 


4.2 Manifolds 


Most mechanical applications leading to parameterized 
equations (44) involve mappings F that are known to 
belong to some smoothness class C^r, r ≥ 1. Then typically, 
the solution set M := F^{-1}(0) has the structure of a 
differentiable submanifold of R^n. In mechanical equilibrium 
studies, this is often reflected by the use of the term equi- 
librium surface, although a mathematical characterization 
of the manifold structure of M is rarely provided. 

We summarize briefly some relevant definitions and 
results about submanifolds of R^n and refer for details to the 
standard textbooks. A C^1 mapping (1) on the (open) set E is 
an immersion or submersion at x^0 ∈ E if DF(x^0) ∈ L(R^n, 
R^m) is a one-to-one mapping or a mapping onto R^m, respectively. 
The mapping F is an immersion or submersion on 
a subset S ⊂ E if it has that property at each point of S. 
With this, submanifolds of R^n can be defined as follows: 


Definition 2. A subset M ⊂ R^n is a d-dimensional C^r 
submanifold of R^n, r ≥ 1, if M is nonempty and for every 
x^0 ∈ M there exists an open neighborhood U ⊂ R^n of x^0 
and a submersion F: U → R^{n−d} of class C^r such that M ∩ 
U = F^{-1}(0) := {x ∈ U : F(x) = 0}. 


The following special case is frequently used: 


Theorem 9. Let F: E ⊂ R^n → R^m, n − m = d > 0, be of 
class C^r, r ≥ 1, on an open set E and a submersion on 
its zero set M := F^{-1}(0). Then M is either empty or a 
d-dimensional C^r submanifold of R^n. 


For the analysis of submanifolds, local parameterizations 
are needed. 


Definition 3. A local d-dimensional C^r parameterization 
of a nonempty set M ⊂ R^n is a pair (U, φ) consisting of 
an open set U ⊂ R^d and a C^r mapping φ: U → R^n such 
that φ(U) is an open subset of M (under the induced 
topology of R^n), and φ is an immersion on U that maps 
U homeomorphically onto φ(U). The pair (U, φ) is called a 
local parameterization near the point x if x ∈ M ∩ φ(U). 


A nonempty subset M ⊂ R^n is a d-dimensional C^r 
submanifold of R^n if and only if M has a d-dimensional 
C^r local parameterization near each of its points. 

Let M be a d-dimensional C^r submanifold of R^n and 
suppose that at a point x ∈ M the pair (U, F) is as 
stated in Definition 2. Then the d-dimensional subspace 
ker DF(x) := {h ∈ R^n : DF(x)h = 0} is independent of the 
specific choice of the local submersion F and depends 
only on M and the particular point. This linear space is 
the ‘tangent space’ of M at x and is denoted by T_xM. 
The subset TM := ∪_{x∈M} [{x} × T_xM] of R^n × R^n is the 
‘tangent bundle’ of M. 

Every open subset E ⊂ R^n is an n-dimensional C^∞ 
submanifold of R^n that has TE = E × R^n as its tangent 
bundle. In particular, the tangent bundle of R^n itself is 
TR^n = R^n × R^n. Thus, the tangent bundle of a submanifold 
M of R^n appears as a subset of the tangent bundle TR^n 
and is itself a submanifold of TR^n if F is sufficiently 
smooth. In fact, if M is a d-dimensional C^r submanifold 
of R^n with r ≥ 2, then TM is a 2d-dimensional C^{r−1} 
submanifold of TR^n. Moreover, if (x, v) ∈ TM and (U, φ) 
is a local C^r parameterization of M near x, then the pair 
(U × R^d, (φ, Dφ)) is a local C^{r−1} parameterization of TM 
near (x, v). 

The computation of local parameterizations of a manifold 
M utilizes the following concept: 


Definition 4. A d-dimensional linear subspace T of R^n is 
a local coordinate space at the point x ∈ M if 


\[ T_x M \cap T^{\perp} = \{0\} \tag{46} \]


If (46) fails to hold, then x is a foldpoint of M with respect 
to T. 


Evidently, at x ∈ M, the tangent space T = T_xM is an 
obvious choice of a local coordinate space. The canonical 
inner product of R^n induces on M a Riemannian structure 
and it makes sense to introduce the normal spaces 


\[ N_x M := T_x M^{\perp} \quad \forall x \in M \tag{47} \]

Then (46) can be written as N_xM ∩ T = {0}. 

The computation of a local parameterization on a sub- 
manifold is a local process and hence it is no restriction 
to phrase the following result in terms of the zero set of a 
single submersion: 


Theorem 10. With the assumptions of Theorem 9, let T ⊂ 
R^n be a coordinate subspace of M at x^0 ∈ M and V an 
n × d matrix such that the columns form an orthonormal 
basis of T. Then the C^r mapping 


\[ H : E \to \mathbb{R}^n, \qquad H(x) := \begin{pmatrix} F(x) \\ V^T (x - x^0) \end{pmatrix} \tag{48} \]


is a local diffeomorphism from an open neighborhood of 
x^0 ∈ M onto an open neighborhood U of the origin of R^d 
and the pair (U, φ) defined with 

\[ \varphi : U \subset \mathbb{R}^d \to M, \qquad \varphi(y) := H^{-1} \begin{pmatrix} 0 \\ y \end{pmatrix}, \quad y \in U \]


is a local C^r parameterization of M near x^0. 


Thus, the evaluation of x = φ(y) for y ∈ U requires the 
solution of a nonlinear system of equations. By (46), the 
derivative DH(x^0) is invertible and experience has shown 
that a chord Newton method with this derivative works well 
in practice. The process can be applied in the form 


\[ x^{(k+1)} = x^{(k)} - DH(x^0)^{-1} \begin{pmatrix} F(x^{(k)}) \\ 0 \end{pmatrix}, \qquad x^{(0)} = x^0 + V y \tag{49} \]

where the y-dependence occurs only at the initial point. 
(The iteration index is written in parentheses here to 
distinguish the iterates from the base point x^0.) 
A possible implementation is given by the following 
algorithm: 


input {x^0, y, V, DF(x^0), tolerances} 
x := x^0 + Vy; 
compute the LU factorization of DH(x^0); 
while {‘iterates do not meet tolerances’} 
   evaluate F(x); 
   solve DH(x^0) w = (F(x), 0)^T; 
   x := x − w; 
endwhile; 
return {φ(y) := x}                                    (50) 
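
For a concrete picture of this algorithm, the following sketch evaluates a local parameterization of the unit circle. It rests on assumptions made only for this example: the manifold is M = {x ∈ R^2 : x_1^2 + x_2^2 − 1 = 0}, the base point is (0, 1), the tangential coordinate space is spanned by V = (1, 0)^T, and the augmented matrix is taken as DH(x^0) with the rows DF(x^0) and V^T. The names `solve2` and `phi_circle` are hypothetical.

```python
import math


def solve2(B, r):
    """Solve the 2x2 linear system B w = r by Cramer's rule (illustrative helper)."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(r[0] * B[1][1] - B[0][1] * r[1]) / det,
            (B[0][0] * r[1] - r[0] * B[1][0]) / det]


def phi_circle(y, tol=1e-14, max_iter=100):
    """Evaluate the local parameterization of the unit circle near (0, 1)
    by the chord Newton process with the fixed matrix DH(x0)."""
    x0 = [0.0, 1.0]                          # base point on M
    V = [1.0, 0.0]                           # orthonormal basis of the tangent space
    DH = [[2.0 * x0[0], 2.0 * x0[1]],        # first row: DF(x0)
          [V[0], V[1]]]                      # second row: V^T
    x = [x0[0] + V[0] * y, x0[1] + V[1] * y] # starting point x0 + V y
    for _ in range(max_iter):
        Fx = x[0] ** 2 + x[1] ** 2 - 1.0
        w = solve2(DH, [Fx, 0.0])            # DH(x0) w = (F(x), 0)^T
        x = [x[0] - w[0], x[1] - w[1]]
        if max(abs(w[0]), abs(w[1])) <= tol:
            return x
    raise RuntimeError("iterates do not meet the tolerance")


point = phi_circle(0.3)
```

The first coordinate of the result stays at the local coordinate y, while the second one converges to the corresponding point of the circle; the matrix DH(x^0) is set up once and reused in every step.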


In order to meet the condition (46) for the coordinate space 
T, it is useful to begin with a complementary subspace 
S ⊂ R^n of the tangent space T_xM and to set T = S^⊥. 
If z^1, ..., z^m is a basis of S, then T is the nullspace of 
the m × n matrix Z of rank m with these vectors as its 
rows. A well-known approach for a nullspace computation 
is based on the LQ-factorization (with row pivoting) Z = 
P^T (L 0) Q^T. Here, P is an m × m permutation matrix, L 
an m × m nonsingular lower-triangular matrix, and Q = 
(Q_1 Q_2) an n × n orthogonal matrix partitioned into an 
n × m matrix Q_1 and an n × d matrix Q_2. Then the d 
columns of Q_2 form the desired orthonormal basis of T. 
This justifies the following algorithm: 


input {Z} 
compute the LQ factorization of Z using row pivoting; 
for j = 1, 2, ..., d do u^j := Q e^{m+j}; 
return {U := (u^1, ..., u^d)}                          (51) 


Other algorithms for the computation of nullspace bases of 
m × n matrices have been discussed in the literature; see, 
for example, Rheinboldt (1998) for references. 

Instead of choosing the subspace S, we can construct the 
basis matrix Z directly from the derivative DF(x^0). For 
example, if the coordinate space T is to contain, say, the ith 
canonical basis vector e^i of R^n, then Z is formed by zeroing 
the ith column of DF(x^0) provided, of course, the resulting 
matrix still has rank m so that (46) holds. Obviously, when 
the tangential coordinate system is used at x^0, then (51) can 
be applied directly to DF(x^0) as the matrix Z. 

These algorithms represent a small part of the 
MANPACK package of FORTRAN programs developed 
by Rheinboldt (1996), which is available on netlib.org. 


4.3 Continuation by differentiation 


Continuation methods concern problems of the form (45) 
with d = 1, that is, systems 


$$F(x) = 0, \quad F: E \subset \mathbb{R}^n \to \mathbb{R}^m, \quad n = m + 1 \qquad (52)$$


defined by a $C^r$, $r \ge 1$, mapping $F$ on an (open) set $E$. If
$F$ is a submersion on its solution set $M := F^{-1}(0)$, then
$M$ is a one-dimensional submanifold of $\mathbb{R}^n$. This manifold
may have several connected components, each of which is
known to be diffeomorphic either to the unit circle in $\mathbb{R}^2$
or to some interval on the real line, that is, to a connected
subset of $\mathbb{R}$ that is not a point.

Thus, continuation methods can be viewed as meth- 
ods for approximating connected components of one- 
dimensional manifolds defined by a nonlinear system (52). 
There are several different approaches for designing such 
methods as is reflected by some of the alternate names that 
have been used including, for instance, embedding meth- 
ods, homotopy methods, parameter variation methods, and 
incremental methods. In essence, there are three principal 
classes of methods, namely, (i) piecewise linear algorithms, 
(ii) continuation by differentiation, and (iii) methods using 
local parameterizations. 

Piecewise linear continuation algorithms have been deve- 
loped for computing a simplicial complex of dimension 
n that encloses the manifold M. We refer to the survey 
by Allgower and Georg (1997) for details. Since the sim- 
plices of the complex have the dimension n of the ambient 
space, the methods are best suited for problems in relatively 
low-dimensional spaces and hence have found only limited 
application in computational mechanics. 

The methods in class (iii) include, in particular, the 
continuation methods used extensively in computational 
mechanics and will be the topic of Subsection 4.4. 

For the methods in class (ii), consider first the case in
which the solution set of (52) can be parameterized in terms
of $\lambda$. In other words, assume that there is a $C^1$ mapping
$\eta: J \to E$ on an interval $J$ such that

$$F(\eta(\lambda), \lambda) = 0, \quad \forall \lambda \in J \qquad (53)$$

Let $y^0 = \eta(\lambda^0)$, $\lambda^0 \in J$, be a given point. Then $y = \eta(\lambda)$
is a solution on $J$ of the initial value problem

$$D_yF(y, \lambda)\, y' + D_\lambda F(y, \lambda) = 0, \quad y(\lambda^0) = y^0 \qquad (54)$$

Conversely, for any solution $y = \eta(\lambda)$ of (54) on an interval
$J$ such that $\lambda^0 \in J$ and $F(y^0, \lambda^0) = 0$, the integral mean
value theorem implies (53).

In a lengthy series of papers during a decade starting 
about 1952, D. Davidenko showed that a variety of nonlin- 
ear problems can be solved by embedding them in a suitable 
parameterized family of problems and then integrating the 
corresponding ODE (54) numerically. He applied these 
‘embedding methods’ not only to nonlinear equations but 
also to integral equations, matrix inversions, determinant 
evaluations, and matrix eigenvalue problems; see Ortega 
and Rheinboldt (2000) for some references. In view of this 
work, the ODE (54) has occasionally been called the ‘Davi- 
denko equation’. 

If $D_yF(y, \lambda)$ is nonsingular for all $(y, \lambda) \in E$, then
classical ODE theory ensures the existence of solutions of
(54) for every $(y^0, \lambda^0) \in E$. But if $D_yF(y, \lambda)$ is singular
at certain points of the domain, then (54) is an implicit
ODE near such points and the standard theory no longer
applies. This was never addressed by Davidenko. But the
difficulty can be circumvented by dropping the assumption
that the solution set can be parameterized in terms of $\lambda$.
More specifically, the following result holds:


Theorem 11. Suppose that the $C^2$ mapping $F: E \subset \mathbb{R}^n \to \mathbb{R}^{n-1}$
is a submersion on $E$. Then for each $x \in E$ there exists
a unique vector $u_x \in \mathbb{R}^n$ such that


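$$DF(x)u_x = 0, \quad \|u_x\|_2 = 1, \quad \det\begin{pmatrix} DF(x) \\ u_x^T \end{pmatrix} > 0 \qquad (55)$$


Moreover, the mapping $\Psi: E \subset \mathbb{R}^n \to \mathbb{R}^n$ defined by
$x \in E \mapsto \Psi(x) := u_x$ is locally Lipschitz continuous on $E$.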


Note that F is required to be a submersion on its domain 
E and not merely on its zero set. This is not a severe 
restriction and can be ensured by shrinking E if needed. 
By the local Lipschitz continuity of the mapping $\Psi$, ODE
theory guarantees that for every $x^0 \in E$ the autonomous
initial value problem

$$\frac{dx}{d\tau} = \Psi(x), \quad x(0) = x^0 \in E \qquad (56)$$

has a unique solution $x: J \to E$ on an open interval $J$
with $0 \in J$. This solution can be extended to an interval
$J$ that is maximal with respect to set inclusion. Then, at
a finite boundary value $s \in \partial J$ we have either $x(\tau) \to \partial E$
or $\|x(\tau)\|_2 \to \infty$ as $\tau \to s$, $\tau \in J$. A solution $x = x(\tau)$
of (56) satisfies $DF(x(\tau))\dot{x}(\tau) = DF(x(\tau))\Psi(x(\tau)) = 0$,
whence $F(x(\tau)) = F(x^0)$ for $\tau \in J$ and therefore
$x(\tau) \in F^{-1}(0)$ provided $x^0$ was chosen such that $F(x^0) = 0$.

These results show that standard ODE solvers can be
applied to (56) for computing connected segments of
$F^{-1}(0)$. For this, a routine is needed for evaluating $\Psi(x)$
at a given point $x \in E$. A typical approach is to calculate
a nonzero vector $u \in \ker DF(x)$, which is then normalized
to Euclidean length 1 and oriented by multiplication with a
suitable $\sigma = \pm 1$. The direct implementation of the determinant
condition in (55) can be avoided by choosing $\sigma$ such
that $\sigma u^T \Psi(\hat{x}) > 0$ where $\Psi(\hat{x})$ is the computed vector at
some ‘earlier’ point $\hat{x}$.
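To make the ODE approach concrete, here is a minimal Python sketch that integrates (56) along the unit circle. It is only an illustration under stated assumptions: the test mapping is $F(x) = x_1^2 + x_2^2 - 1$, the function names `psi` and `rk4_path` are hypothetical, and for $n = 2$ the nullvector satisfying the determinant condition of (55) can be written down explicitly instead of being computed from a factorization.

```python
import math

def DF(x):
    # Assumed test problem: F(x) = x1^2 + x2^2 - 1 (unit circle), n = 2.
    return (2.0 * x[0], 2.0 * x[1])

def psi(x):
    """Tangent vector u_x of (55) for n = 2: DF(x) u = 0, |u| = 1, det > 0."""
    a, b = DF(x)
    norm = math.hypot(a, b)
    # u = (-b, a)/norm gives det [[a, b], [u1, u2]] = (a^2 + b^2)/norm > 0.
    return (-b / norm, a / norm)

def rk4_path(x0, tau_end, steps):
    """Integrate dx/dtau = psi(x), x(0) = x0, with the classical RK4 scheme."""
    h = tau_end / steps
    x = tuple(x0)
    for _ in range(steps):
        k1 = psi(x)
        k2 = psi((x[0] + 0.5 * h * k1[0], x[1] + 0.5 * h * k1[1]))
        k3 = psi((x[0] + 0.5 * h * k2[0], x[1] + 0.5 * h * k2[1]))
        k4 = psi((x[0] + h * k3[0], x[1] + h * k3[1]))
        x = (x[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
             x[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)
    return x
```

Since $\|\Psi(x)\|_2 = 1$, the variable $\tau$ is the arclength, and starting from $(1, 0)$ the call `rk4_path((1.0, 0.0), math.pi / 2, 100)` should end close to $(0, 1)$. The condition $F(x) = 0$ is not enforced anywhere in the loop, so the computed points lie only approximately on the circle.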

During the numerical integration of (56) starting from
$x^0 \in F^{-1}(0)$, the condition $F(x) = 0$ is not explicitly
enforced and the computed points may drift away from
$F^{-1}(0)$. Thus, the ODE approach requires further corrections
if the aim is to generate a good approximation of the
manifold. However, for the homotopy methods introduced
in Subsection 4.1, the drift is entirely acceptable. In fact,
in the setting of homotopies, interest centers on computing
a solution $y^*$ of the terminal system $F_1(y) = 0$ and not on
approximating a solution path in the zero set of $H$. Assume
that Theorem 11 applies to $H$. If for given $y^0 \in F_0^{-1}(0)$ the
numerical solution of the initial value problem (56) for $H$
reaches the plane $L := \{(y, \lambda): \lambda = 1\} \subset \mathbb{R}^m \times \mathbb{R}$ at a point
$(\hat{y}, 1) \in L$, then $\hat{y}$ is expected to be near $y^*$. Therefore,
a standard iterative process for solving $F_1(y) = 0$ should
converge to $y^*$ if started from $\hat{y}$.
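The following Python sketch illustrates why the drift is acceptable in the homotopy setting. Everything specific in it is an assumption chosen for illustration: the terminal problem $F_1(y) = y^3 + y - 3$, the convex homotopy $H(y, \lambda) = \lambda F_1(y) + (1 - \lambda)(y - y^0)$, and the use of the explicit Euler method on the Davidenko equation (54) (which is admissible here because $D_yH > 0$ along the path, so that the path can be parameterized by $\lambda$). The crude endpoint $\hat{y}$ is then improved by Newton's method.

```python
def F1(y):
    # Assumed terminal problem: y^3 + y - 3 = 0.
    return y ** 3 + y - 3.0

def dF1(y):
    return 3.0 * y ** 2 + 1.0

def homotopy_solve(y0=0.0, steps=20, tol=1e-12):
    """Follow H(y, lam) = lam*F1(y) + (1 - lam)*(y - y0) from lam = 0 to lam = 1."""
    h = 1.0 / steps
    y, lam = y0, 0.0
    for _ in range(steps):
        dH_dy = lam * dF1(y) + (1.0 - lam)     # D_y H(y, lam)
        dH_dlam = F1(y) - (y - y0)             # D_lambda H(y, lam)
        y += h * (-dH_dlam / dH_dy)            # explicit Euler step for (54)
        lam += h
    y_hat = y                                  # approximate point on the plane lam = 1
    for _ in range(50):                        # Newton's method on F1(y) = 0
        r = F1(y)
        if abs(r) <= tol:
            break
        y -= r / dF1(y)
    return y_hat, y
```

With only 20 Euler steps, `y_hat` is a rough approximation, but it is close enough for the Newton iteration to converge to the solution $y^* \approx 1.2134$ of the terminal system.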

The condition of reaching $L$ can be deduced from the
fact that a maximally extended solution of the initial value
problem must leave any compact subset of the domain
$E_H$ of $H$. Let $F_0$ be chosen such that the solution $y^0$
of $F_0(y) = 0$ is unique. In addition, assume that the ODE
solution starting at $y^0$ remains bounded and is contained,
say, in a cylinder $C := \{(y, \lambda): \|y - y^0\| \le \delta, \ 0 \le \lambda \le 1\}$.
Then, for $C \subset E_H$, it follows that the path has to reach the
end $C \cap L$ of the cylinder. For a large enough domain $E_H$,
the boundedness of the path is related to the boundedness of
the zero set of $H$. Thus, a central question in the homotopy
approach sketched in Subsection 4.1 is the selection of a
homotopy that satisfies Theorem 11.

It was shown by Chow, Mallet-Paret and Yorke (1978)
that there are homotopy methods for finding zeros of a
nonlinear mapping, which are constructive with probability 1.
The theory is based on a parameterized Sard theorem.
For this, a family $\hat{H}: \mathbb{R}^q \times \mathbb{R}^m \times \mathbb{R} \to \mathbb{R}^m$ of homotopies is
considered that depends on a parameter $\mu \in \mathbb{R}^q$. (Typically,
the additional parameter is the initial point.) Let $\hat{H}$ be
sufficiently smooth and a submersion on its zero set. Then
the theorem states that for almost all $\mu^0$, the homotopy
$H_{\mu^0}(y, \lambda) := \hat{H}(\mu^0, y, \lambda)$ is a submersion on its zero set.
As indicated above, it can then be concluded that for almost
all $\mu$, and hence with probability 1, the solution path of the
homotopy $H_\mu$ is expected to reach the plane $L$.

This approach has been the topic of a growing literature 
on what are now called probability-one homotopy methods 
for solving nonlinear systems. Algorithms for many appli- 
cations have been proposed, including, especially, for the 
solution of specific classes of equations, such as polyno- 
mial systems and unconstrained minimization problems. A 
widely used implementation is the software package HOMPACK;
see Watson et al. (1997) for the latest FORTRAN 90 version
and further references.


4.4 Local parameterization continuation 


Consider an equation of the form (52) defined by a $C^r$
mapping ($r \ge 1$) that is a submersion on its zero set
$M := F^{-1}(0)$. We now discuss continuation methods that utilize
local parameterizations of the one-dimensional submanifold
$M$ of $\mathbb{R}^n$ to produce a sequence of points $x^k$, $k = 0, 1, \ldots$,
on or near $M$ starting with a given $x^0 \in M$. Typically,
for the step from $x^k$ to $x^{k+1}$, a local parameterization
of $M$ near $x^k$ is constructed and a predicted point is
determined from which an iterative process, such as (50),
converges to an acceptable next point $x^{k+1} \in M$. A local
parameterization can be retained over several steps. Of 
course, this requires a decision algorithm, which may be 
based, for example, on the rate of convergence of the 
iteration at earlier points. 

The literature on the design, implementation, and appli- 
cation of these continuation methods is huge and cannot be 
covered here. A survey from a numerical analysis viewpoint 
with references until about 1997 was given by Allgower 
and Georg (1997). Equally extensive, and largely independent,
is the literature on continuation methods in engineering.
For this, see in particular Chapter 4, Volume 2 by E.
Riks in this encyclopedia, where path-following methods
and load control for engineering applications are treated in
detail.

In line with our overall presentation, we address here only 
some general mathematical ideas and approaches underly- 
ing this class of continuation methods. 

For the construction of a local parameterization at $x^k$ we
require a nonzero vector $t^k \in \mathbb{R}^n$ such that (46) holds for
$T := \operatorname{span} t^k$. This requires that $t^k \notin \operatorname{rge} DF(x^k)^T$, that is,
that $t^k$ is not orthogonal to the tangent space $T_{x^k}M$. Two
frequent choices are


(a) $t^k := \pm u^k$, $DF(x^k)u^k = 0$, $\|u^k\|_2 = 1$ (tangent vector)

(b) $t^k := \pm e^i$, $e^i \in \mathbb{R}^n$ (basis vector of $\mathbb{R}^n$)    (57)


Processes involving (a) are often called pseudo-arclength
continuation methods. For the computation of a tangent
vector, the algorithm (51) can be used with $Z = DF(x^k)$.
Alternatively, with a vector $w \in \mathbb{R}^n$ that is not orthogonal
to the nullspace of $DF(x^k)$, an (unnormalized) tangent vector
$\tilde{u}$ is obtained by solving the augmented system

$$\begin{pmatrix} DF(x^k) \\ w^T \end{pmatrix} \tilde{u} = e^n \qquad (58)$$

The vector $w$ should be reasonably parallel to the nullspace
of $DF(x^k)$. Often, a canonical basis vector $e^i$ of $\mathbb{R}^n$ is
chosen such that the matrix obtained by deleting the $i$th
column of $DF(x^k)$ is nonsingular.

For any choice of the basis vector $t^k$ of the local
parameterization, the orientation has to be determined
appropriately. A simple and effective approach is to orient $t^k$ for
$k > 0$ by a comparison with the direction of the vector $t^{k-1}$
at the previous point $x^{k-1}$. This presumes that the direction
of $t^0$ is given. In the case of (57(a)) it is also possible to
apply the orientation definition of (55), which, because of

$$\det\begin{pmatrix} DF(x^k) \\ w^T \end{pmatrix} = [(u^k)^T w] \det\begin{pmatrix} DF(x^k) \\ (u^k)^T \end{pmatrix}$$

can be replaced by a condition on the determinant of the
matrix of the system (58).
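For $n = 2$ the augmented system (58) and the determinant-based orientation can be written out in a few lines of Python. This is a sketch under stated assumptions: the function name `tangent_from_augmented` is hypothetical, the $2 \times 2$ system is solved by Cramer's rule, and the orientation uses the fact that the solution $\tilde{u}$ of (58) satisfies $w^T\tilde{u} = 1$, so that the determinant in (55) formed with $\tilde{u}$ has the same sign as the determinant of the matrix of (58).

```python
import math

def tangent_from_augmented(dF, w):
    """Solve (58) for n = 2 and orient the normalized result according to (55).

    dF: the row DF(x^k) as a pair; w: a vector not orthogonal to ker DF(x^k).
    """
    det_A = dF[0] * w[1] - dF[1] * w[0]      # determinant of [[DF], [w^T]]
    if det_A == 0.0:
        raise ValueError("w is orthogonal to the nullspace of DF")
    # Cramer's rule for [[DF], [w^T]] u = e^2 = (0, 1)^T.
    u = (-dF[1] / det_A, dF[0] / det_A)
    norm = math.hypot(u[0], u[1])
    sigma = 1.0 if det_A > 0.0 else -1.0     # sign of det [[DF], [u^T]]
    return (sigma * u[0] / norm, sigma * u[1] / norm)
```

For the unit circle at $x^k = (0.6, 0.8)$, where $DF(x^k) = (1.2, 1.6)$, both choices $w = e^1$ and $w = e^2$ lead to the same oriented unit tangent vector $(-0.8, 0.6)$.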

Once $t^k$ has been chosen, the local parameterization
algorithm (50) requires the solution of the augmented nonlinear
system

$$\begin{pmatrix} F(x) \\ (t^k)^T(x - x^k) \end{pmatrix} = \begin{pmatrix} 0 \\ y \end{pmatrix} \qquad (59)$$

for a given local coordinate $y \in \mathbb{R}$. In principle, any one
of the iterative solvers for n x n nonlinear systems can be 
applied. Besides the chord Newton method utilized in (50), 
most common are Newton’s method, discretized Newton 
methods, and the Gauss–Newton method. In each case,
some form of damping may also be introduced. The selec- 
tion of the iterative process is of critical importance since 
it constitutes a major part of the computational cost, espe- 
cially for large sparse problems. In that case, combined pro- 
cesses for solving the augmented system are often applied, 
such as inexact Newton methods with a linear solver of 
Krylov type or some other fast algorithm appropriate for 


the problem. Certain linear solvers, such as direct factor- 
ization methods, allow for an inexpensive computation of 
the determinant of the matrix of (58), which, as noted, can 
be used for orienting the tangent vector. This is not the case 
for other solvers, such as the Krylov-type methods. 

In (50), the iteration starts from a ‘predicted point’
$x^k + h_kt^k$. In many continuation codes, other linear predictions
$x^k + h_kv^k$ with suitable vectors $v^k$ are used. In particular,
$v^k$ is often taken to be the (normalized and oriented)
tangent vector at $x^k$, even if $t^k$ is not the tangent vector.
This compares with the Euler method for approximating
solutions of ODEs, and hence has been called the Euler
predictor. For the selection of the step $h_k > 0$ in these linear
predictions, a number of algorithms have been proposed.
Some of them are modeled on techniques used in ODE
solvers, but there are also other approaches. For instance,
step selections have been based on information collected in
connection with the use of some sufficient decrease criteria
in damped Newton methods (see Subsection 3.4). Other
algorithms involve the estimation of curvature properties
of $M$ near $x^k$ to gain information about the prediction
errors; see, for example, Burkardt and Rheinboldt (1983).
Besides linear predictors, extrapolatory predictor algorithms 
have also been considered. Of course, this requires suitable 
startup techniques for use at the points where too few earlier 
points are known. 
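A complete, if deliberately simple, predictor-corrector loop of this kind can be sketched in Python as follows. The assumptions are: the unit circle as test manifold, a fixed step $h$ (no step-size control), the tangent vector as $t^k$ (choice (57(a))), orientation by comparison with $t^{k-1}$, and the chord Newton corrector of (50). The names `trace_path` and `solve2` are hypothetical.

```python
import math

def F(x):
    # Assumed test problem: unit circle, F(x) = x1^2 + x2^2 - 1.
    return x[0] ** 2 + x[1] ** 2 - 1.0

def DF(x):
    return (2.0 * x[0], 2.0 * x[1])

def solve2(A, b):
    # Cramer's rule for a 2x2 system.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return ((b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det)

def trace_path(x0, t0, h, steps, tol=1e-12):
    """Euler predictor with fixed step h, chord Newton corrector as in (50)."""
    x, t_prev, points = tuple(x0), tuple(t0), [tuple(x0)]
    for _ in range(steps):
        a = DF(x)
        norm = math.hypot(a[0], a[1])
        t = (-a[1] / norm, a[0] / norm)              # unit tangent vector at x^k
        if t[0] * t_prev[0] + t[1] * t_prev[1] < 0.0:
            t = (-t[0], -t[1])                       # orient by comparison with t^{k-1}
        A = (a, t)                                   # matrix of the augmented system at x^k
        z = [x[0] + h * t[0], x[1] + h * t[1]]       # Euler predictor
        for _ in range(50):
            r = F(z)
            if abs(r) <= tol:
                break
            w = solve2(A, (r, 0.0))
            z[0] -= w[0]
            z[1] -= w[1]
        else:
            raise RuntimeError("corrector did not converge; reduce h")
        x, t_prev = tuple(z), t
        points.append(x)
    return points
```

Starting at $(1, 0)$ with $t^0 = (0, 1)$ and $h = 0.1$, each step advances the polar angle by $\arcsin h$, so that the path passes the point $(0, 1)$, a foldpoint with respect to the last coordinate, without any special treatment.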

By their definition, all these continuation methods are
intended for the approximation of a connected component
of the solution manifold $M$. But $M$ may well have several
components, and near certain singularities two components
can be close to each other. In such cases, the continuation
process may jump unnoticed from one component to the
other. Frequently (but not always), two such components
may have opposite orientations, and then the specific
algorithm for orienting the basis vector $t^k$ of the local
parameterization becomes critical. For instance, if the
determinant condition of (55) is used, then the jump between the
components shows up as a reversal of the direction of the
computed path. This is not the case, for example, if the vector
$t^k$ is oriented by a comparison with the direction of $t^{k-1}$.
Such different behavior of two orientation algorithms can
be an advantageous tool for the detection of certain types of
singular points of parameterized mappings (see Section 5).


4.5 Approximation of higher-dimensional 
manifolds 


Suppose that the mapping $F$ satisfies the conditions of
Theorem 9 with $d \ge 2$. Then, continuation methods can
be applied for the computation of different paths on the
$d$-dimensional manifold $M = F^{-1}(0)$. But it is not easy
to develop a good picture of a multidimensional manifold
solely from information along some paths on it. In recent
years, this has led to the development of methods for a
direct approximation of implicitly defined manifolds of
dimension exceeding 1. For this, a natural approach is
the computation of a simplicial approximation of specified
subsets $M_0 \subset M$, that is, the construction of a simplicial
complex of dimension $d$ in $\mathbb{R}^n$ with vertices on $M$ such
that the points of the carrier approximate $M_0$.

The computation of such triangulations is also required
in computer-aided geometric design (CAGD) and related
applications. But there the task differs considerably from
that encountered here. In fact, in CAGD, the manifolds are 
typically defined as the image sets of explicitly specified 
parameterizations. On the basis of such parameterizations, 
various ‘triangulation’ methods have been proposed in the 
literature analogous to those for triangulating domains in 
linear spaces, for example, in finite element methods. 

For manifolds that are defined implicitly as the solution 
set of nonlinear equations, triangulations have been devel- 
oped only fairly recently. The earliest work (see Allgower 
and Georg, 1990) involves the use of piecewise linear methods
for the construction of an $n$-dimensional simplicial
complex in the ambient space $\mathbb{R}^n$ that encloses the
$d$-dimensional manifold. The barycenters of appropriate faces
of the enclosing simplices are then chosen to define a piece- 
wise linear approximation of the manifold itself. But, in 
general, the resulting vertices do not lie on the manifold, 
and, as with other piecewise linear methods, the approach is 
limited to problems in relatively low-dimensional ambient 
spaces. 

A method for the direct computation of a d-dimensional 
simplicial complex that approximates an implicitly defined 
manifold M in a neighborhood of a point was first given 
by Rheinboldt (1988). It is based on algorithms that were 
later incorporated in the MANPACK package of Rheinboldt 
(1996). By means of smoothly varying local parameter- 
izations (moving frames), standardized patches of trian- 
gulations of the tangent spaces of M are projected onto 
the manifold. The method is applicable to manifolds of 
dimension $d \ge 2$ but was used mainly for $d = 2$.
Various extensions and modifications of the approach have
been proposed. In particular, implementations for d = 2, 3 
have been developed for computing d-dimensional simpli- 
cial complexes that approximate specified domains of the 
manifold; see Rheinboldt (1998) for references. 

A different method was introduced in 1995 by R. 
Melville and S. Mackey. It does not involve the construction 
of a simplicial complex on an implicitly defined two- 
dimensional manifold; instead the manifold is tessellated by 
a complex of nonoverlapping cells with piecewise curved 
boundaries that are constructed by tracing fish-scale pat- 
terns of one-dimensional paths on the manifold. The method 


appears to be intrinsically restricted to two-dimensional 
manifolds. Recently, Henderson (2002) developed another
approach, which represents a manifold of dimension $d \ge 2$
as a set of overlapping d-dimensional balls and expresses 
the boundary of the union of the balls in terms of a set of 
finite, convex polyhedra. 


4.6 Sensitivity 


For parameter-dependent problems of the form (44), there
is not only interest in computing parts of the solution set
but also in determining the sensitivity of the solutions to
changes of the parameters. Generally, in the literature, the
sensitivity of (44) is defined only near a solution
$(y^0, \lambda^0) \in F^{-1}(0)$ where the state $y$ depends smoothly on the
parameter vector $\lambda$. More specifically, assume that
$D_yF(y^0, \lambda^0) \in GL(m)$, and therefore, that the implicit function theorem
applies. Then there exists a $C^r$ mapping $\eta: U \to E$ on a
neighborhood $U \subset \mathbb{R}^d$ of $\lambda^0$ such that each point
$(y, \lambda) \in F^{-1}(0)$ in a certain neighborhood of $(y^0, \lambda^0)$ is uniquely
specified by $(\eta(\lambda), \lambda)$ for some $\lambda \in U$. With this, the
sensitivity of $F$ at $(y^0, \lambda^0)$ is defined as the derivative $D\eta(\lambda^0)$
and hence is the unique solution of the linear system

$$D_yF(y^0, \lambda^0)\, D\eta(\lambda^0) = -D_\lambda F(y^0, \lambda^0) \qquad (60)$$


This corresponds, of course, to the equation (54) in the
setting of Davidenko’s embedding methods. If, in practice, the
partial derivatives $D_yF$ and $D_\lambda F$ are not accessible, finite
difference approximations of these derivatives can be
introduced instead. But this calls for estimates of the influence
of the approximation errors on the desired solution of (60).
Alternatively, in order to determine the sensitivity $D\eta(\lambda^0)$
in the direction of a parameter vector $\mu$, approximations
of $D\eta(\lambda^0)\mu$ can be computed by numerical differentiation
based on values of $\eta(\lambda^0 + \tau\mu)$ for several values of the
scalar $\tau$ near $\tau = 0$. Evidently, this sensitivity definition
does not reflect any of the underlying geometric aspects
of the problem and the indicated approximations utilize
little information about the prior computation of $(y^0, \lambda^0)$.
Accordingly, a sensitivity analysis is often considered to be
a ‘postprocessing’ technique that is applied independently
of the way the solution $(y^0, \lambda^0)$ was found.
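As a small worked instance of (60), the Python sketch below computes the classical sensitivity for an assumed test system with $m = 2$ states and $d = 1$ parameter, $F(y, \lambda) = (y_1^2 + y_2^2 - \lambda, \ y_1 - y_2)$, whose solution branch $\eta(\lambda) = (\sqrt{\lambda/2}, \sqrt{\lambda/2})$ is known in closed form. The function names are hypothetical, and the closed-form branch is used only to check the result against numerical differentiation.

```python
import math

def solve2(A, b):
    # Cramer's rule for a 2x2 system.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return ((b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det)

def eta(lam):
    # Closed-form solution branch of the assumed test system.
    s = math.sqrt(lam / 2.0)
    return (s, s)

def sensitivity(y, lam):
    """Solve D_y F(y, lam) * s = -D_lambda F(y, lam), that is, equation (60)."""
    DyF = ((2.0 * y[0], 2.0 * y[1]), (1.0, -1.0))
    DlamF = (-1.0, 0.0)
    return solve2(DyF, (-DlamF[0], -DlamF[1]))
```

At $\lambda^0 = 2$, $y^0 = (1, 1)$, the linear system gives $D\eta(\lambda^0) = (0.25, 0.25)$, which agrees with the derivative of $\sqrt{\lambda/2}$ at $\lambda = 2$.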

For the case in which the solution set $M := F^{-1}(0)$ is a
$d$-dimensional submanifold of $\mathbb{R}^n$, a geometric interpretation
of the sensitivity concept was introduced by Rheinboldt
(1993). In particular, it was shown that a sensitivity anal- 
ysis can be incorporated effectively in the solution process 
without an undue increase of the computational cost. 

Since the sensitivity concept is local in nature, it is once
again no restriction to consider only the zero set of a single
submersion and hence to assume that $F: E \subset \mathbb{R}^n \to \mathbb{R}^m$,
$n - m = d > 0$, satisfies the conditions of Theorem 9.
Let $T \subset \mathbb{R}^n$ be a coordinate space at $x \in M$ and $V$ an
$n \times d$ matrix for which the columns form an orthonormal
basis of $T$. In addition, suppose that $W$ is an $n \times m$
matrix with orthonormal columns that span the orthogonal
complement $T^\perp$ of $T$. In other words, $(V, W)$ is an $n \times n$
orthogonal matrix to be called a local basis matrix of the
parameterization.

Let $(U, \varphi)$ be a local parameterization of $M$ induced
by the coordinate space $T$. In terms of the splitting
$\mathbb{R}^n = T \oplus T^\perp$, the mapping $\varphi$ can be written as

$$y \in U \mapsto \varphi(y) = x + Vy + W\psi(y), \quad \psi(y) := W^T(\varphi(y) - x) \qquad (61)$$

The derivative $D\varphi(0)z$ represents the ‘change’ of the
solution $x$ in the local coordinate direction $z \in \mathbb{R}^d$. Accordingly,
it is natural to define the derivative of the ‘corrector
function’ $\psi: U \to \mathbb{R}^m$, that is, the linear mapping

$$\Sigma \in L(\mathbb{R}^d, \mathbb{R}^m), \quad \Sigma := D\psi(0) \qquad (62)$$


as the sensitivity mapping at $x \in M$ with respect to the
local basis matrix $(V, W)$. Hence, the image vector
$\Sigma z \in \mathbb{R}^m$ is the sensitivity of the solution $x$ in the direction $z$. By
definition of the local parameterization, it follows that

$$(DF(x)W)\Sigma + DF(x)V = 0 \qquad (63)$$

where, because of (46), the $m \times m$ matrix $DF(x)W$ is
invertible.

In parameterized problems (44), there is a natural splitting
$\mathbb{R}^n = Y \oplus \Lambda$ of $\mathbb{R}^n$ into the $m$-dimensional state space
$Y$ and the $d$-dimensional parameter space $\Lambda$. Typically,
in terms of the canonical basis of $\mathbb{R}^n$, we have
$Y = \operatorname{span}(e^1, \ldots, e^m)$ and $\Lambda = \operatorname{span}(e^{m+1}, \ldots, e^n)$. Hence, at
points where $\Lambda$ can be used as coordinate space, (63) is
exactly (60). In other words, the definition (63) of $\Sigma$
represents a generalization of the classical sensitivity definition.

At each $x \in M$, the tangent space can be used as local
coordinate space. Let the columns of the $n \times d$ matrix $U$
form an orthonormal basis of $T_xM$. Then, together with
(63) it follows that

$$(V^TU)^T \Sigma^T = (W^TU)^T \qquad (64)$$

and hence that $\Sigma$ can be expressed as the solution of a
$d$-dimensional linear system. Since, in practice, $m$ is large
while $d$ is relatively small, the computational cost of solving
(63) generally exceeds that of (64). In particular, if the
natural parameter space $\Lambda = \operatorname{span}(e^{m+1}, \ldots, e^n)$ is used
as coordinate space, then (64) involves only the matrices
formed with the first $m$ and last $d$ rows of $U$. In this
connection, it is worthwhile to observe that (64) does not
depend on the orthogonality property $U^TU = I$ but only
on the fact that the columns of $U$ span the tangent space
$T_xM$.

The distance between any two equidimensional linear
subspaces $S_1$ and $S_2$ of $\mathbb{R}^n$ is defined by
$\operatorname{dist}(S_1, S_2) = \|P_1 - P_2\|_2$, where $P_i$ is the orthogonal projection onto
$S_i$, $i = 1, 2$. It is well known that $\operatorname{dist}(S_1, S_2) = \sin\theta_{\max}$
where $\theta_{\max}$ is the largest principal angle between the two
subspaces; see, for example, Golub and van Loan (1989;
p. 24). Hence, $\operatorname{dist}(S_1, S_2) \in [0, 1]$ and the bounds 0 and
1 occur when the subspaces are parallel and orthogonal,
respectively. This distance concept provides for an interesting
geometrical interpretation of the generalized sensitivity
definition. In fact, it turns out that the Euclidean norm of
$\Sigma$ at $x \in M$ depends only on the distance between the
local coordinate space $T$ and the tangent space $T_xM$ and
is given by

$$\|\Sigma\|_2 = \frac{\delta}{\sqrt{1 - \delta^2}} = \tan\theta_{\max}, \quad \delta = \operatorname{dist}(T, T_xM) \qquad (65)$$

For the computation, note that the cosine of the largest
principal angle $\theta_{\max}$ between $T$ and $T_xM$ equals the smallest
singular value of the $d \times d$ matrix $V^TU$ in (64); see
again Rheinboldt (1993). From (65), it follows that
$\|\Sigma\|_2 = 0$ whenever the tangent space $T_xM$ is used as the
local coordinate space at $x$. The norm of $\Sigma$ tends to infinity
the closer $T$ comes to being orthogonal to the tangent space.
Definition 4 implies that $\|\Sigma\|_2 = \infty$ occurs precisely at the
foldpoints $x$ of $M$ with respect to $T$. Thus, large values of
the norm of $\Sigma$ indicate the vicinity of foldpoints.
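The relation (65) can be checked numerically in the simplest possible setting. The Python sketch below assumes the unit circle $F(x) = x_1^2 + x_2^2 - 1$ with the coordinate space $T = \operatorname{span} e^2$, so that $V = e^2$ and $W = e^1$; the function name `sigma_norm_circle` is hypothetical. It computes $|\Sigma|$ once from (63) and once from the subspace distance $\delta$.

```python
import math

def sigma_norm_circle(angle):
    """|Sigma| at x = (cos a, sin a) on the unit circle for T = span e^2."""
    x = (math.cos(angle), math.sin(angle))
    dF = (2.0 * x[0], 2.0 * x[1])                # DF(x) for F(x) = x1^2 + x2^2 - 1
    V, W = (0.0, 1.0), (1.0, 0.0)                # local basis matrix (V, W)
    dF_W = dF[0] * W[0] + dF[1] * W[1]           # DF(x) W  (a 1 x 1 matrix)
    dF_V = dF[0] * V[0] + dF[1] * V[1]           # DF(x) V  (a 1 x 1 matrix)
    sigma = -dF_V / dF_W                         # solution of (63)
    U = (-x[1], x[0])                            # unit tangent vector at x
    cos_theta = abs(V[0] * U[0] + V[1] * U[1])   # singular value of V^T U
    delta = math.sqrt(1.0 - cos_theta ** 2)      # dist(T, T_x M) = sin(theta)
    return abs(sigma), delta / math.sqrt(1.0 - delta ** 2)
```

Both returned values should equal $\tan a$ for $0 < a < \pi/2$. As $a$ approaches $\pi/2$, that is, as $x$ approaches the foldpoint $(0, 1)$ with respect to $T$, the norm grows without bound.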

As an example, consider the two-point boundary value
problem

$$-u'' + \nu u' = \lambda(1 - u)\exp\left(\alpha - \frac{\alpha}{1 + u}\right), \quad u(0) = u(1) = 0, \quad \alpha = 12\log 10$$

modeling an exothermic chemical reaction with convective
transport and involving two parameters $\nu$, $\lambda$. For a uniform
mesh with step $h = 0.1$, the upwind discretization

$$-(1 + \nu h)x_{i-1} + (2 + \nu h)x_i - x_{i+1} = h^2\lambda(1 - x_i)\exp\left(\alpha - \frac{\alpha}{1 + x_i}\right), \quad i = 1, \ldots, k, \quad x_0 = x_{k+1} = 0$$

defines a two-dimensional submanifold of $\mathbb{R}^9 \times \mathbb{R}^2$. By
means of the algorithm of Rheinboldt (1988), a simplicial
approximation of this manifold was computed in a
neighborhood of the point $x = (y, \lambda, \nu)$ defined by

$y_1 = 4.6067(-3)$, $y_2 = 9.7851(-3)$, $y_3 = 1.5694(-2)$,
$y_4 = 2.2570(-2)$, $y_5 = 3.0790(-2)$, $y_6 = 4.1012(-2)$,
$y_7 = 5.4582(-2)$, $y_8 = 7.5449(-2)$, $y_9 = 1.3374(-1)$,
$\lambda = 23.907$, $\nu = 999.978$


Figure 2 shows a schematic representation of a part of
the triangulation patch with computed values of $\|\Sigma\|_2$ at
the nodes. The larger values along one of the diagonals
indicate a nearby line of foldpoints with respect to the 
parameter space. This was confirmed by means of one of the 
algorithms for the local computation of foldpoints discussed 
in Section 5. 

This illustrates that the sensitivity norms represent an 
excellent indicator for the existence of nearby foldpoints, 
which can be monitored effectively during the computation 
of points on the manifold. 


5 BIFURCATION 
5.1 Characterization 


Bifurcation theory concerns, in essence, the study of equa- 
tions with multiple solutions. It is closely related to sin- 
gularity theory, that is, to the study of the local structure 
of nonlinear mappings from $\mathbb{R}^n$ into itself. The literature
on bifurcation theory is extensive and only a very short 
introduction can be given here (see Chapter 4, Volume 2). 

As in Section 4, consider a parameterized nonlinear system
(44) with $d \ge 1$ parameters defined by a $C^r$ mapping
$F: E \subset \mathbb{R}^n := \mathbb{R}^m \times \mathbb{R}^d \to \mathbb{R}^m$, $n = m + d$, $r \ge 1$. Then,
conceptually, a ‘bifurcation point’ is a point of $F^{-1}(0)$
where the number of solutions changes. More specifically,
we consider two classes of ‘singular points’.

Figure 2. $\|\Sigma\|_2$ values for exothermic reaction.

First, let $F$ be a submersion on $E$. Then $M = F^{-1}(0)$
is a $d$-dimensional submanifold of $\mathbb{R}^n$, and by Definition 4,
a point $x \in M$ is a foldpoint of $M$ with respect to the
parameter subspace $\Lambda = \{0\} \times \mathbb{R}^d \subset \mathbb{R}^n$ if

$$s_1 := \dim N_0 > 0, \quad N_0 := N_xM \cap \Lambda \qquad (66)$$


The integer $s_1$ is the first singularity index of the foldpoint.
For example, the mapping $(y, \lambda) \in \mathbb{R}^2 \mapsto \lambda - y^2 \in \mathbb{R}$ is a
submersion on all of $\mathbb{R}^2$ and its solution manifold $M$ is a
parabola. The origin of $\mathbb{R}^2$ is a foldpoint with respect to the
parameter $\lambda$ with first singularity index $s_1 = 1$. Henceforth,
foldpoints with respect to the parameter space will simply
be called foldpoints.

For the definition of the second class of singular points, 
we drop the assumption that F is a submersion at all points 
of $F^{-1}(0)$ and call $x^* \in F^{-1}(0)$ a bifurcation point if $F$
fails to be a submersion at $x^*$, that is, if $\operatorname{rank} DF(x^*) < m$.
For a closer analysis, this definition is too general and 
further conditions are needed that cannot be elaborated here. 

A classical bifurcation example is the ‘pitchfork’
mapping

$$F: \mathbb{R} \times \mathbb{R} \to \mathbb{R}, \quad F(y, \lambda) := \lambda y - y^3, \quad (y, \lambda) \in \mathbb{R} \times \mathbb{R} \qquad (67)$$

which is a submersion at all points of the punctured plane
$\mathbb{R}^2 \setminus \{0\}$. The origin is a bifurcation point where four
solution branches meet, namely, (i) $\{(0, s): s < 0\}$, (ii)
$\{(0, s): s > 0\}$, (iii) $\{(s, s^2): s > 0\}$, and (iv) $\{(s, s^2): s < 0\}$.
Thus, for $\lambda < 0$ there is one value of $y$ such that $(y, \lambda)$ is a
solution and for $\lambda > 0$ there are three; in other words, the
solution count does indeed change at the bifurcation point.
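The change of the solution count and the rank deficiency at the origin are easy to verify directly. The following Python lines are a minimal sketch for the pitchfork mapping (67); the function names are hypothetical.

```python
import math

def pitchfork_solutions(lam):
    """Real solutions y of lam*y - y**3 = 0 for a given parameter value lam."""
    if lam > 0.0:
        r = math.sqrt(lam)
        return [-r, 0.0, r]
    return [0.0]

def DF(y, lam):
    # Derivative (D_y F, D_lambda F) of F(y, lam) = lam*y - y**3.
    return (lam - 3.0 * y ** 2, y)
```

For example, `pitchfork_solutions(-1.0)` contains one value and `pitchfork_solutions(4.0)` contains three, while `DF(0.0, 0.0)` is the zero row, so that $\operatorname{rank} DF(0) = 0 < m = 1$ at the bifurcation point.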

A central technique of bifurcation theory is to reduce
bifurcation points to foldpoints by adding auxiliary
parameters. An unfolding of $F$ is a mapping
$\hat{F}: \hat{E} \subset \mathbb{R}^m \times \mathbb{R}^d \times \mathbb{R}^k \to \mathbb{R}^m$
involving an additional parameter vector $\mu \in \mathbb{R}^k$
such that $\hat{F}(y, \lambda, 0) = F(y, \lambda)$. (The mapping $\hat{F}$ needs to
be defined only for $\mu$ in some small neighborhood of the
origin of $\mathbb{R}^k$.) For example, an unfolding of (67) is given by
$\hat{F}(y, \lambda, \mu) = \lambda y - y^3 - \mu$, which has a two-dimensional
saddle surface as solution manifold with the origin as
foldpoint.

For the characterization of the solution behavior near a
bifurcation point, a more selective choice of unfolding is
needed. In particular, it is desirable that any perturbation of
$F$ of the form $F(\cdot, \cdot) + \epsilon Q(\cdot, \cdot, \epsilon)$ is equivalent, in a
certain sense, to $\hat{F}(\cdot, \cdot, \mu(\epsilon))$ for sufficiently small $\epsilon$. With a
specified equivalence definition, unfoldings of this type are
called universal unfoldings. Most, but not all, bifurcation
problems possess such universal unfoldings. A universal
unfolding of the pitchfork example (67) is the mapping
$(y, \lambda, \mu_1, \mu_2) \in \mathbb{R}^4 \mapsto \lambda y - y^3 - \mu_1 - \mu_2 y^2$, where
for given $\mu \in \mathbb{R}^2$ the corresponding solutions in the
$(y, \lambda)$-plane consist of paths that represent generic perturbations of
the original pitchfork. For details of the theory of universal
unfoldings, we refer to Golubitsky and Schaeffer (1985).

In mechanical problems, it is usually difficult to relate 
specific physical parameters of the problem to the aug- 
menting parameters of a universal unfolding. Accordingly, 
there is a tendency to work with general unfoldings that are 
defined by intrinsic physical parameters. For computational 
purposes, these unfoldings usually suffice. 


5.2 Scalar parameter problems 


Let (44) be a parameterized nonlinear system with a
scalar parameter ($d = 1$) defined by a $C^r$ mapping $F$ with
sufficiently large $r \ge 1$. Suppose that there exists an open,
connected subset $M_0 \subset F^{-1}(0)$ of the zero set of $F$ with
the property that $F$ is a submersion on $M_0$. Hence, $M_0$
is a one-dimensional manifold and $x^* := (y^*, \lambda^*) \in M_0$
is a foldpoint of $M_0$ (with respect to the parameter) if
$D_yF(y^*, \lambda^*)$ is singular, or equivalently, if $(e^n)^Tu = 0$ for
$u \in T_{x^*}M_0$.

A local parameterization of $M_0$ consists of an open interval
$J$ of $\mathbb{R}$ and a homeomorphism $\varphi$ from $J$ onto
$x(J) := \{(y(s), \lambda(s)): s \in J\} \subset M_0$. For any $x := x(s)$, $s \in J$, the
tangent space $T_xM_0$ of $M_0$ is spanned by $x'(s)$. Thus,
a foldpoint $x^* := x(s^*)$, $s^* \in J$, is a zero of the scalar
function

$$\eta: J \to \mathbb{R}, \quad \eta(s) := (e^n)^Tx'(s) \qquad (68)$$


A simple zero $s^*$ of $\eta$ (in the sense that $\eta'(s^*) \ne 0$)
is called a simple foldpoint. In many applications, such
simple foldpoints are of special interest. For example, in
certain problems of structural mechanics, $\lambda$ represents a
load intensity and such points may signify the onset of 
buckling. Numerous methods have been proposed for the 
computation of simple foldpoints of mappings involving a 
scalar parameter, and we summarize here only the principal 
ideas underlying some of them. A numerical comparison 
of several different methods was given by Melhem and 
Rheinboldt (1982). 
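The characterization of a simple foldpoint as a simple zero of the function $\eta$ in (68) suggests locating it by any scalar root finder. The Python sketch below is an illustration only: the test curve is the zero set of the assumed mapping $F(y, \lambda) = \lambda - y + y^3/3$, parameterized by $s = y$, and bisection is used as the simplest possible root finder; it is not one of the methods compared in the cited reference.

```python
import math

def x_of_s(s):
    # Assumed test curve: zero set of F(y, lam) = lam - y + y**3/3, with s = y.
    return (s, s - s ** 3 / 3.0)

def eta(s):
    """eta(s) = (e^n)^T x'(s): the lam-component of x'(s) = (1, 1 - s**2)."""
    return 1.0 - s ** 2

def locate_fold(a, b, tol=1e-12):
    """Bisection for a sign change of eta on the interval [a, b]."""
    fa, fb = eta(a), eta(b)
    if fa * fb > 0.0:
        raise ValueError("eta does not change sign on [a, b]")
    while b - a > tol:
        c = 0.5 * (a + b)
        fc = eta(c)
        if fa * fc <= 0.0:
            b, fb = c, fc
        else:
            a, fa = c, fc
    return 0.5 * (a + b)
```

On the interval $[0.5, 2]$ the sign change of $\eta$ brackets the foldpoint $x^* = (1, 2/3)$, where $D_yF = y^2 - 1$ vanishes and $\eta'(s^*) = -2 \ne 0$, so that the foldpoint is simple.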

A first class of methods assumes that an approximation near a simple
foldpoint $x^* \in M_0$ is available and involves the construction of an
augmented system of equations that is known to have the
desired point as a simple zero. A possible approach is to
use a system of the form

$$\begin{pmatrix} F(x) \\ g(x) \end{pmatrix} = 0 \qquad (69)$$

with a suitable function $g: U \subset E \to \mathbb{R}$ that is defined
on some neighborhood $U$ of $x^*$. Possible choices of $g$
include $g(x) := \det D_yF(x)$ and $g(x) := \nu(x)$ where $\nu(x)$
denotes the smallest eigenvalue (in modulus) of $D_yF(x)$.
Computationally more efficient is the definition

$$g(x) := (e^n)^TA(x)^{-1}e^n, \quad A(x) := \begin{pmatrix} DF(x) \\ (e^i)^T \end{pmatrix} \qquad (70)$$

where the index $i$, $1 \le i \le n$, is chosen such that $A(x)$ is
invertible. Then a modified Newton method for solving (69)
involves the following step algorithm:


G: input: {k, x*, M,) 
determine i, 1 <i <n such that A(x*) is invertible; 
solve A(x*)u! = (F(x*),0)7 and A(x*)u? = (0, 1); 
xttl se yk yl yp KnT at oe u?)/(e") u?] u?; 
update the memory set; 
return $\{k + 1, x^{k+1}, M_{k+1}\}$; (71)


Initially, a suitable index $i$ has to be supplied and usually $i = n$ is used. At subsequent steps, $i$ is selected such that $|(e^i)^T u^2|$ is maximal for the last computed $u^2$.
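
As a small illustration of the augmented-system idea (69), the following sketch applies plain Newton iteration to the pair $(F, g)$ with the choice $g := \det D_yF$. The model equation $F(y, \lambda) = y^3 - y - \lambda$, the starting point, and the function name `newton_fold` are assumptions made for this example only; it does not reproduce the modified Newton step (71).

```python
def newton_fold(y, lam, steps=20):
    """Newton's method for the augmented system (F(y, lam), g(y, lam)) = 0."""
    for _ in range(steps):
        F = y**3 - y - lam          # hypothetical model equation F(y, lam) = 0
        g = 3.0 * y * y - 1.0       # g = det D_y F (a 1x1 determinant here)
        # Jacobian of (F, g) with respect to (y, lam):
        #   [a b]   [dF/dy  dF/dlam]
        #   [c d] = [dg/dy  dg/dlam]
        a, b = 3.0 * y * y - 1.0, -1.0
        c, d = 6.0 * y, 0.0
        det = a * d - b * c
        if det == 0.0:
            raise ZeroDivisionError("augmented Jacobian is singular")
        # Solve the 2x2 Newton system J * (dy, dlam) = -(F, g) by Cramer's rule.
        dy = (-F * d + b * g) / det
        dlam = (-a * g + c * F) / det
        y, lam = y + dy, lam + dlam
    return y, lam


# Start near the fold at y = 1/sqrt(3), lam = -2/(3*sqrt(3)).
y_fold, lam_fold = newton_fold(0.5, -0.3)
print(y_fold, lam_fold)
```

Although $D_yF$ is singular at the fold, the Jacobian of the augmented map is not (its determinant is $6y$ in this example), which is why the fold is a simple zero of the augmented system and Newton's method converges quadratically.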

Instead of an $n$-dimensional system (69), a larger system of the form


$$\begin{pmatrix} F(x) \\ D_yF(x)w \\ b^T w \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \qquad x \in E, \; w \in \mathbb{R}^m \tag{72}$$


can also be used where $b \in \mathbb{R}^m$ is such that the matrix in the last two rows is invertible. Usually, some canonical basis vector of $\mathbb{R}^m$ is chosen here. If (72) is to be solved by a form of Newton's method, then second derivative terms $D_{yy}F$ and $D_{y\lambda}F$ are needed. This can be avoided by replacing these terms by finite difference approximations. For example, Moore and Spence (1980) suggested the approximation


$$D_{yy}F(y, \lambda)(h^1, h^2) \approx \frac{1}{\delta}\left[ D_yF(y + \delta h^1, \lambda)h^2 - D_yF(y, \lambda)h^2 \right]$$


with some small $\delta \ne 0$ and showed that the $(n + m)$-dimensional linear systems arising at each iteration step can be partitioned such that only four linear systems of dimension $m$ with the same matrix have to be solved.

A second class of methods for the computation of simple foldpoints assumes that a specific continuation process is used to approximate a connected, one-dimensional solution manifold $M_0$. Since simple foldpoints are simple zeros of the scalar mapping (68), a foldpoint is expected to have been passed during the continuation process when


$$\operatorname{sign}\,(e^n)^T u^k \ne \operatorname{sign}\,(e^n)^T u^{k+1}, \qquad T_{x^j}M_0 = \operatorname{span}(u^j), \quad j = k, k+1 \tag{73}$$


at two succeeding points $x^k$, $x^{k+1}$ computed in the process. Hence, in terms of the local parameterization, we want to compute $s^*$ as a zero of the scalar function (68). Here, any
one of the standard algorithms for solving scalar equations 
can be applied. A frequently used algorithm is the well- 
known Brent method, which combines root bracketing, 
bisection, and inverse quadratic interpolation. 
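
The detection and refinement steps can be sketched as follows. This is a minimal illustration under stated assumptions: the curve is the zero set of the hypothetical model equation $y^3 - y - \lambda = 0$, it is parameterized by $s = y$ instead of by a continuation process, and plain bisection stands in for the Brent method.

```python
import math


def tangent_lambda_component(y):
    """lambda-component of the unit tangent to {(y, lam): y**3 - y - lam = 0}."""
    # DF = [3y^2 - 1, -1]; its nullspace is spanned by (1, 3y^2 - 1).
    dlam = 3.0 * y * y - 1.0
    return dlam / math.hypot(1.0, dlam)


def locate_fold(s_left, s_right, tol=1e-12):
    """Bisection on the scalar function eta(s) once a sign change is bracketed."""
    eta_left = tangent_lambda_component(s_left)
    eta_right = tangent_lambda_component(s_right)
    if eta_left * eta_right > 0.0:
        raise ValueError("no sign change: no simple fold bracketed")
    while s_right - s_left > tol:
        mid = 0.5 * (s_left + s_right)
        eta_mid = tangent_lambda_component(mid)
        if eta_left * eta_mid <= 0.0:
            s_right = mid
        else:
            s_left, eta_left = mid, eta_mid
    return 0.5 * (s_left + s_right)


# "Continuation points" on the curve; the first sign change of the
# lambda-component of the tangent signals that a fold has been passed.
points = [0.1 * k for k in range(11)]
fold = None
for a, b in zip(points, points[1:]):
    if tangent_lambda_component(a) * tangent_lambda_component(b) < 0.0:
        fold = locate_fold(a, b)
        break
print(fold)
```

For this model curve the fold lies at $y = 1/\sqrt{3}$, between the sample points 0.5 and 0.6.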

At a simple foldpoint, the condition $(e^n)^T x'(s^*) = 0$ implies that the scalar function $s \in J \mapsto \phi(s) := (e^n)^T x(s)$ has an extremum at $s = s^*$. Hence, $s^*$ can also be determined by an algorithm for maximizing or minimizing $\phi$. For this, various interpolatory methods turn out to be very effective; see again Melhem and Rheinboldt (1982) for details and references.

So far, $F$ was assumed to be a submersion at all points of an open, connected subset of the zero set $F^{-1}(0)$. We now allow this submersion condition to fail at certain points. More specifically, consider a path in $F^{-1}(0)$ defined by a $C^r$ mapping, $r \ge 1$,


$$\pi : J \subset \mathbb{R} \to \mathbb{R}^n, \qquad F(\pi(s)) = 0, \quad \pi'(s) \ne 0 \quad \forall s \in J,$$


on an open interval $J$. Let $x^* := \pi(s^*)$, $s^* \in J$, be a bifurcation point of $F$ where $\operatorname{rank} DF(x^*) = m - 1$. Moreover, assume that $x^*$ is an isolated singularity and, for simplicity, assume that $\operatorname{rank} DF(\pi(s)) = m$ for $s \in J$, $s \ne s^*$. Then the orientation function


$$\delta : J \to \mathbb{R}, \qquad \delta(s) = \det \begin{pmatrix} DF(\pi(s)) \\ \pi'(s)^T \end{pmatrix} \quad \forall s \in J,$$


suggested by (55), has the unique zero $s = s^*$ in $J$. The bifurcation point $x^*$ is said to be simple if $\delta$ changes sign at $s^*$.

When a continuation method is applied to approximate the path $\pi$, the process typically jumps over the bifurcation point. Accordingly, by monitoring the sign of $\delta$, the bifurcation point can be detected. The computation of $\delta$ in the framework of the continuation method was sketched in Subsection 4.4. Of course, after passing a simple bifurcation point, it has to be taken into account that the two segments of the path before and after the point have opposite orientation. A scalar-equation solver, such as the Brent algorithm, can be applied for the explicit computation of the simple zero $s^*$ of $\delta$. There are also other approaches involving augmented nonlinear systems of equations that will not be addressed here.

By the definition of $\pi$, it follows from $\operatorname{rank} DF(x^*) = m - 1$ that the nullspace of $DF(x^*)$ is two-dimensional and contains the (nonzero) tangent vector $\pi'(s^*)$ of $\pi$ at $s^*$. As the example (67) shows, once some unfolding $\hat F$ of $F$ has been chosen, there is a vector $u \in \ker DF(x^*)$ that is linearly independent of $\pi'(s^*)$ and represents the direction of
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a solution path of F branching-off from x*. But, of course, 
without the unfolding no such information is available. 


5.3 Higher-dimensional parameter spaces 


Consider now a parameterized system (44) with a parameter space $\Lambda := \{0\} \times \mathbb{R}^d$ of dimension $d > 1$ defined by a $C^r$ mapping $F$ that is a submersion on its domain $E$. Hence, $M := F^{-1}(0)$ is a $d$-dimensional manifold.

For a given nonzero vector $b \in \{0\} \times \Lambda$, we introduce the augmented system


$U \in L(\mathbb{R}^d, \mathbb{R}^n)$ (74)


Any solution $(x^*, U^*)$ of (74) satisfies $x^* \in M$, $U^* \in T_{x^*}M$, and $b \perp T_{x^*}M$. The last of these relations is equivalent to (66) and hence $x^*$ is a foldpoint of $M$ (with respect to $\Lambda$). In other words, systems of the form (74) can be used to compute foldpoints of $M$. Of course, the choice of $b$ controls the foldpoints that can be obtained. In practice, attention is often focused on foldpoints with respect to a specific component of $\Lambda$ in which case $b$ is chosen as the canonical basis vector of $\mathbb{R}^n$ corresponding to that component.

Since the dimension of the system (74) may become large and unwieldy, interest has focused on ‘minimal augmentations’ of the form


$$G(x) := \begin{pmatrix} F(x) \\ b^T u_1(x) \\ \vdots \\ b^T u_d(x) \end{pmatrix} = 0, \qquad x \in E_0 \subset E \tag{75}$$


defined on some open set $E_0$; see, for example, Griewank and Reddien (1984). Here, $\{u_1(x), \ldots, u_d(x)\}$ is for $x \in E_0 \cap M$ an orthonormal basis of $T_xM$, and, as before, $b \in \{0\} \times \Lambda$ is a suitably chosen vector. For computation, the basis vectors have to be sufficiently smooth functions of $x$ and hence have to form a moving frame; see, for example, Rheinboldt (1988).

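A different minimal augmentation for the computation of a simple foldpoint $x^*$ of $M$ with first singularity index 1 was developed by Dai and Rheinboldt (1990). For vectors $c^* \notin \operatorname{rge} D_yF(x^*)$ and $v^* \notin \operatorname{rge} D_yF(x^*)^T$, the linear system


$$\begin{pmatrix} D_yF(x)^T & v^* \\ (c^*)^T & 0 \end{pmatrix} \begin{pmatrix} z \\ \gamma \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0$$
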
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is nonsingular for all x € M in some neighborhood U C R” 
of $x^*$. Then Dai and Rheinboldt (1990) showed that all foldpoints of $M$ in $U \cap M$ are precisely the solutions of the augmented system


$$G(x) := \begin{pmatrix} F(x) \\ \gamma(x) \end{pmatrix} = 0$$


where $\gamma(x)$ is the last solution component of the preceding linear system, and that these foldpoints have first singularity index 1. Moreover, under certain nondegeneracy conditions on $x^*$ and after shrinking $U$ if needed, the mapping $G$ is a submersion on $U$ and hence the foldpoints of $M$ in that neighborhood form a $(d - 1)$-dimensional manifold. This allowed for the development of a locally convergent iterative process for the computation of a simplicial approximation of the foldpoint manifold. This was illustrated by an example concerning the roll-stability of airplanes.

The mapping $F$ considered here may represent an unfolding of some other mapping $F_0$ involving fewer parameters, and therefore, the foldpoints of $F$ may represent bifurcation points of $F_0$. Then it is of interest to characterize
and compute the corresponding bifurcation directions. For 
this, we follow here an approach developed by Rabier and 
Rheinboldt (1990) involving the second fundamental tensor 
of M. 

For the definition of this tensor, we refer to the textbook literature. In brief, $V$ is a symmetric, vector-valued, 2-covariant tensor that defines, for $x \in M$, a mapping


$$(u^1, u^2) \in T_xM \times T_xM \mapsto V_x(u^1, u^2) \in N_xM$$


Let $Q(x)$ be the orthogonal projection of $\mathbb{R}^n$ onto $N_xM$. Then the diagonal terms of the tensor satisfy


$$V_x(u, u) = -Q(x)\,[DF(x)|_{N_xM}]^{-1} D^2F(x)(u, u), \qquad u \in T_xM \tag{76}$$


For computational purposes, this suffices since by the bilinearity


$$V_x(u^1, u^2) = \tfrac{1}{2}\left[ V_x(u^1 + u^2, u^1 + u^2) - V_x(u^1, u^1) - V_x(u^2, u^2) \right], \qquad u^1, u^2 \in T_xM$$


A simple implementation of (76) by means of the QR-factorization of $DF(x)^T$ is included in the MANPACK package of Rheinboldt (1996). An algorithm for approximating $V_x(u, u)$ without requiring the second derivative of $F$ was given by Rabier and Rheinboldt (1990).

Let x* be a foldpoint of M (with respect to A) with first 
singularity index s; > 0 and assume that {z!,..., 2} is an 
orthonormal basis of Ng. Then, subject to certain conditions 


on $x^*$, Rabier and Rheinboldt (1990) showed that the bifurcation directions at $x^*$ are the nonzero solutions $u \in T_{x^*}M$ of the system of $s_1$ quadratic equations


$$(z^\ell)^T V_{x^*}(u, u) = 0, \qquad \ell = 1, \ldots, s_1 \tag{77}$$


Because of the obvious scale invariance, the solutions are here assumed to have norm 1; see Rabier and Rheinboldt (1990) for several examples.

In bifurcation theory, the bifurcation directions are usually defined by means of a mapping generated in the Lyapunov-Schmidt reduction of the equation $F(x) = 0$. While both characterizations give equivalent results, the approach via the second fundamental tensor allows for all computations to be performed in the framework of the smooth manifold $F^{-1}(0)$ where no singularities hinder the work.


5.4 Further topics 


As noted, the literature on bifurcation problems and their 
computational aspects is extensive and there are numerous 
major topics that cannot be considered here. We mention 
only a few important areas without entering into any detail. 

Frequently, in computational mechanics, a nonlinear system of equations $F(y, \lambda) = 0$ arises as an equilibrium problem of some dynamical process, such as


$$\frac{dy}{dt} = F(y, \lambda) \tag{78}$$


In many such problems, the stability of the equilibria is highly important and hence it is of interest to obtain information about the stability behavior of (78). Now, bifurcation theory concerns the study of points where the qualitative structure of the solutions of (78) changes as the parameter is varied. A change in the flow structure is often indicated by changes in the stability of the equilibria. For example, for (78) with $F(y, \lambda) := \lambda - y^2$, the parabolic path of equilibria is stable on one side of the foldpoint and unstable on the other. At a generic bifurcation point, there is typically an ‘exchange of stability’ between the equilibrium branches. For instance, in the case of (78) with the pitchfork function (67), the equilibrium branch along the negative $\lambda$-axis gives up its stability to the two parabolic branches.

There are various other ways in which the qualitative structure of the flow of (78) can change. Important examples are the Hopf bifurcations. For instance, in the problem $\dot r = \lambda r - r^3$, $\dot\theta = -1$ (in polar coordinates), all solutions for $\lambda < 0$ spiral clockwise to the origin with increasing time. For $\lambda > 0$, the origin has become unstable and all solutions (except the unstable equilibrium at the origin) spiral into a periodic orbit of radius $\sqrt{\lambda}$.
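
The behavior of the radial equation can be checked numerically. The sketch below integrates $\dot r = \lambda r - r^3$ with the classical fourth-order Runge-Kutta method; the angular equation decouples and is omitted. The step size, final time, initial radius, and function names are assumptions chosen for illustration.

```python
def radial_rhs(r, lam):
    """Right-hand side of the radial equation r' = lam*r - r**3."""
    return lam * r - r**3


def integrate_radius(r0, lam, t_end=40.0, dt=0.01):
    """Classical RK4 integration of the radial equation up to t_end."""
    r = r0
    for _ in range(int(round(t_end / dt))):
        k1 = radial_rhs(r, lam)
        k2 = radial_rhs(r + 0.5 * dt * k1, lam)
        k3 = radial_rhs(r + 0.5 * dt * k2, lam)
        k4 = radial_rhs(r + dt * k3, lam)
        r += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return r


r_pos = integrate_radius(0.1, 0.25)    # lam > 0: approaches sqrt(lam) = 0.5
r_neg = integrate_radius(0.1, -0.25)   # lam < 0: decays toward the origin
print(r_pos, r_neg)
```

For $\lambda = 0.25$ the radius settles on the periodic orbit of radius $\sqrt{\lambda} = 0.5$, while for $\lambda = -0.25$ it decays toward zero.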


For an introduction to dynamics and bifurcation, we refer to Hale and Koçak (1991). Clearly, (78) represents only
the simplest dynamical system that has the zeros of F as 
its equilibria. In fact, in many mechanical problems, the 
dynamics arise in the form of a partial differential equation. 

Symmetry is a natural phenomenon in many applications and often reflects some invariance properties of the problem. Frequently, a parameterized mapping $F : E \subset Y \times \Lambda \to \mathbb{R}^m$ turns out to be ‘covariant’ with respect to a transformation group $\Gamma$ in the sense that


$$F(T_y(\gamma)y, T_\lambda(\gamma)\lambda) = S(\gamma)F(y, \lambda) \qquad \forall \gamma \in \Gamma, \; (y, \lambda) \in E \tag{79}$$


where $T_y$, $T_\lambda$, and $S$ are group representations of $\Gamma$. For example, the unfolded pitchfork problem $F(y, \lambda, \mu) := \lambda y - y^3 - \mu$ is covariant under the $Z_2$-symmetry $F(-y, \lambda, -\mu) = -F(y, \lambda, \mu)$. There is a large literature
on symmetry behavior, and in particular, on symmetry 
breaking at bifurcation points of F. Group theoretical meth- 
ods can be used effectively in numerical computations, for 
example, for detecting and evaluating bifurcation points. 
Symmetries also allow for the system to be restricted to cer- 
tain subspaces defined by the groups under consideration. 
In addition, symmetries help in designing easy orderings 
of the computed results. The computational aspects of this 
topic were surveyed by Dellnitz and Werner (1989). 

In practice, parameterized equations often arise in the form of boundary value problems for partial differential equations. In other words, the system $F(y, \lambda) = 0$ under consideration involves an operator $F : X := Y \times \mathbb{R}^d \to Z$ between infinite-dimensional spaces $X$, $Z$, usually assumed to be Banach spaces. For the computation, this system has to be discretized, that is, a family $F_h : X_h \to Z_h$ of finite-dimensional approximations of $F$ has to be constructed. Then the problem is to establish approximation theorems that guarantee the convergence of $F_h$ to $F$ as the discretization parameter $h \in \mathbb{R}$ tends to zero and that also provide error estimates. In addition, it is important to compare the solution sets of $F$ and $F_h$ and their structural properties, and to determine if singularities of $F$ are approximated by singularities of $F_h$ of the corresponding type. Results along this line were surveyed by Caloz and Rappaz (1997).
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The simpler a hypothesis is, the better it is. (Leibniz) 


1 WHAT IS A PARABOLIC PROBLEM? 


A common classification of partial differential equations 
uses the terms elliptic, parabolic, and hyperbolic, with 
the stationary Poisson equation being a prototype exam- 
ple of an elliptic problem, the time-dependent heat equa- 
tion that of a parabolic problem, and the time-dependent 
wave equation being a hyperbolic problem. More generally, 
parabolic problems are often described, vaguely speaking, 
as ‘diffusion-dominated’, while hyperbolic problems are 
described as ‘convection-dominated’ in a setting of sys- 
tems of convection—diffusion equations. Alternatively, the 
term ‘stiff problems’ is used to describe parabolic problems, 
with the term stiff referring to the characteristic presence 
of a range of time scales, varying from slow to fast with 
increasing damping. 

In the context of computational methods for a general 
class of systems of time-dependent convection—diffusion- 
reaction equations, the notion of ‘parabolicity’ or ‘stiffness’ 
may be given a precise quantitative definition, which will be the focal point of this presentation. We will define a system of convection-diffusion-reaction equations to be parabolic if computational solution is possible over a long time without error accumulation, or alternatively, if a certain strong stability factor $S_c(T)$, measuring error accumulation, is of
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unit size independent of the length T in time of the 
simulation. More precisely, the error accumulation concerns the Galerkin discretization error in a discontinuous Galerkin method dG(q) using piecewise polynomials of degree $q$ with a resulting order of $2q + 1$. (The total discretization error may also contain a quadrature error, which
typically accumulates at a linear rate in time for a parabolic 
problem.) This gives parabolicity a precise quantitative 
meaning with a direct connection to computational meth- 
ods. A parabolic problem thus exhibits a feature of ‘loss 
of memory’ for Galerkin errors satisfying an orthogonal- 
ity condition, which allows long-time integration without 
error accumulation. As shall be made explicit below, our 
definition of parabolicity through a certain stability factor 
is closely related to the definition of an analytic semigroup. 

For a typical hyperbolic problem, the corresponding 
strong stability factor will grow linearly in time, while 
for more general initial value problems the growth may 
be polynomial or even exponential in time. 

The solutions of parabolic systems, in general, vary 
considerably in space-time and from one component to 
the other with occasional transients, where derivatives 
are large. Efficient computational methods for parabolic 
problems thus require adaptive control of the mesh size in 
both space and time, or more general multiadaptive control 
with possibly different resolution in time for different 
components (see Chapter 4, Chapter 6, Chapter 7 of 
Volume 3). 


2 OUTLINE 


We first consider in Section 4 time-stepping methods for 
Initial Value Problems (IVPs) for systems of ordinary dif- 
ferential equations. We present an a posteriori error analysis 
exhibiting the characteristic feature of a parabolic prob- 
lem of nonaccumulation of Galerkin errors in the setting 
of the backward Euler method (the discontinuous Galerkin 
method dG(0)), with piecewise constant (polynomial of 
order 0) approximation in time. The a posteriori error esti- 
mate involves the residual of the computed solution and 
stability factors/weights obtained by solving an associated 
dual linearized problem expressing in quantitative form the 
stability features of the IVP being solved. The a posteri- 
ori error estimate forms the basis of an adaptive method 
for time step control with the objective of controlling the 
Euclidean norm of the error uniformly in time or at selected 
time levels, or some other output quantity. The form of the 
a posteriori error estimate expresses the characteristic fea- 
ture of a parabolic problem that the time step control is 
independent of the length in time of the simulation. 

In Section 5, we compute stability factors for a couple of IVPs modeling chemical reactions and find that the strong stability factor $S_c(T)$ remains of unit size over a long time.


In Section 6, we contrast this with an IVP with exponentially growing stability factors: the Lorenz system.

The backward Euler method, or more generally the 
dG(q) method, is implicit and requires the solution of 
a nonlinear system of equations at each time step. In 
Section 7, we study iterative fixed point-type solution 
strategies resembling explicit time-stepping methods. How- 
ever, since explicit time-stepping for stiff problems is 
unstable unless the time step is smaller than the fastest 
time scale, which may be unnecessarily restrictive out- 
side fast transients, we include a stabilization technique 
based on adaptively stabilizing the stiff system by taking 
a couple of small time steps when needed. We show effi- 
ciency gain factors compared to traditional explicit meth- 
ods with the time step restriction indicated, of the order 
10 to 100 or more depending on the problem. The need 
for explicit-type methods for parabolic problems avoid- 
ing forming Jacobians and solving associated linear sys- 
tems of equations, is very apparent for the large systems 
of convection—diffusion-reaction equations arising in the 
modeling of chemical reactors with many reactants and 
reactions involved. The need for explicit time-stepping 
also arises in the setting of multiadaptive time-stepping 
with the time step varying in both space and for different 
reactants, since here the discrete equations may be cou- 
pled over several time steps for some of the subdomains 
(or reactants), leading to very large systems of algebraic 
equations. 

In Section 8, we prove the basic strong stability estimates 
for an abstract parabolic model problem and connect to the 
definition of an analytic semigroup. 

In Sections 9 to 16, we present adaptive space-time 
Galerkin finite element methods for a model parabolic IVP, 
the heat equation, including a priori and a posteriori error 
estimates. The space-time Galerkin discretization method 
cG(p)dG(q) is based on the continuous Galerkin method 
cG(p) with piecewise polynomials of degree p in space, 
and the discontinuous Galerkin method dG(q) with piece- 
wise polynomials of degree q in time (for q = 0,1). In 
Section 17, we discuss briefly the extension to convec- 
tion—diffusion-reaction systems, and present computational 
results in Section 18. 


3 REFERENCES TO THE LITERATURE 


We have decided to give a coherent, concise presentation 
using duality-based, space-time Galerkin methods (with 
some key references), rather than to try to give a survey 
of the work in the entire field of parabolic or stiff prob- 
lems (with a massive number of references). However, we 
believe that our presentation may be viewed as a summary 


Adaptive Computational Methods for Parabolic Problems 677 


and condensation of much work prior to ours to which spe- 
cific references can be found in the references given below. 
This work includes the pioneering work in the early 1970s 
initiated in Douglas and Dupont (1970) on Galerkin meth- 
ods for parabolic problems and by Thomée during the 80s 
and 90s summarized in Thomée (2002) including many ref- 
erences to related work by Luskin, Rannacher, Wheeler, and 
many others. 

It is of course also highly relevant to point to the world of stiff ordinary differential equations (stiff ODEs) discovered by Dahlquist in the 50s, and explored by Deuflhard, Gear, Hairer, Lubich, Petzold, Wanner, and many others. As general references into this world, we give the books by Hairer and Wanner (1996) and Deuflhard and Bornemann (2002). In the concluding Section 19, we give our view of the relation between our methods for adaptive control of the global error and the methods for local error control commonly presented in the ODE world. Concerning Galerkin methods for time discretization we also refer to the early work on ODEs by Delfour, Hager and Trochu (1981).


4 INTRODUCTION TO ADAPTIVE 
METHODS FOR IVPs 


We now give a brief introduction to the general topic of 
adaptive error control for numerical time-stepping meth- 
ods for initial value problems, with special reference to 
parabolic or stiff problems. In an adaptive method, the time 
steps are chosen automatically with the purpose of control- 
ling the numerical error to stay within a given tolerance 
level. The adaptive method is based on an a posteriori error 
estimate involving the residual of the computed solution 
and results of auxiliary computations of stability factors, or 
more generally stability weights. 
We consider an IVP of the form


$$\dot u(t) = f(u(t)) \quad \text{for } 0 < t \le T, \qquad u(0) = u^0 \tag{1}$$


where $f : \mathbb{R}^d \to \mathbb{R}^d$ is a given differentiable function, $u^0 \in \mathbb{R}^d$ a given initial value, and $T > 0$ a given final time. For the computational solution of (1), we let $0 = t_0 < t_1 < \cdots < t_{n-1} < t_n < \cdots < t_N = T$ be an increasing sequence of discrete time levels with corresponding time intervals $I_n = (t_{n-1}, t_n]$ and time steps $k_n = t_n - t_{n-1}$, and consider the backward Euler method: Find $U(t_n)$ successively for $n = 1, \ldots, N$, according to the formula


$$U(t_n) = U(t_{n-1}) + k_n f(U(t_n)) \tag{2}$$


with $U(0) = u^0$. The backward Euler method is implicit in the sense that to compute the value $U(t_n)$ with $U(t_{n-1})$


already computed, we need to solve a system of equations. 
We will return to this aspect below. 
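
To make the implicit step concrete, here is a minimal sketch of the backward Euler method (2) for a scalar IVP, with the nonlinear equation in each step solved by Newton's method. The test problem $f(u) = -100u - u^3$, the step sizes, and all function names are assumptions for illustration; a system of equations would need a linear solve in place of the scalar division.

```python
def f(u):
    """Hypothetical stiff right-hand side: fast linear decay plus a cubic term."""
    return -100.0 * u - u**3


def df(u):
    """Derivative of f, needed by Newton's method."""
    return -100.0 - 3.0 * u * u


def backward_euler(f, df, u0, time_steps, newton_tol=1e-12, max_iter=50):
    """Return the nodal values U(t_0), ..., U(t_N) of the backward Euler method."""
    U = [u0]
    for k in time_steps:
        prev = U[-1]
        x = prev                              # initial guess for U(t_n)
        for _ in range(max_iter):
            residual = x - prev - k * f(x)    # zero when (2) is satisfied
            if abs(residual) < newton_tol:
                break
            x -= residual / (1.0 - k * df(x))
        U.append(x)
    return U


k = 0.1
U = backward_euler(f, df, 1.0, [k] * 10)
print(U)
```

The time step $k = 0.1$ is far larger than the fast time scale $1/100$ of this problem, yet the computed values decay monotonically, which is the practical payoff of the implicit formulation.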

We associate a function $U(t)$ defined on $[0, T]$ to the nodal values $U(t_n)$, $n = 0, 1, \ldots, N$, as follows:


$$U(t) = U(t_n) \quad \text{for } t \in (t_{n-1}, t_n]$$


In other words, $U(t)$ is left-continuous piecewise constant on $(0, T]$ and takes the value $U(t_n)$ on $I_n$, and thus takes a jump from the limit from the left $U(t_{n-1}^-) = U(t_{n-1})$ to the limit from the right $U(t_{n-1}^+) = U(t_n)$ at the time level $t = t_{n-1}$. We can now write the backward Euler method in the form


$$U(t_n) = U(t_{n-1}) + \int_{t_{n-1}}^{t_n} f(U(t))\,dt$$


or equivalently

$$U(t_n) \cdot v = U(t_{n-1}) \cdot v + \int_{t_{n-1}}^{t_n} f(U(t)) \cdot v\,dt \tag{3}$$



for all $v \in \mathbb{R}^d$ with the dot signifying the scalar product in $\mathbb{R}^d$. This method is also referred to as dG(0), the discontinuous Galerkin method of order zero, corresponding to approximating the exact solution $u(t)$ by a piecewise constant function $U(t)$ satisfying the Galerkin orthogonality condition (3).

The general dG(q) method takes the form (3), with the restrictions to each time interval $I_n$ of the solution $U(t)$ and of the test function $v$ being polynomials of degree $q$. The dG(q) method also comes in a multiadaptive form with each component and the corresponding test function being piecewise polynomial with possibly different sequences of time steps for different components.

We shall now derive an a posteriori error estimate aiming at control of the scalar product of the error $e(T) = (u - U)(T)$ at final time $T$ with a given vector $\psi$, where we assume that $\psi$ is normalized so that $\|\psi\| = 1$. Here $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^d$. We introduce the following linearized dual problem running backward in time:


$$-\dot\varphi(t) = A^T(t)\varphi(t) \quad \text{for } 0 \le t < T, \qquad \varphi(T) = \psi \tag{4}$$

with

$$A(t) = \int_0^1 f'(su(t) + (1 - s)U(t))\,ds$$


where $u(t)$ is the exact solution and $U(t)$ the approximate solution, $f'$ is the Jacobian of $f$, and the superscript $T$ denotes transpose.



We note that $f(u(t)) - f(U(t)) = A(t)(u(t) - U(t))$. We now start from the identity


$$e(T) \cdot \psi = e(T) \cdot \psi + \sum_{n=1}^{N} \int_{t_{n-1}}^{t_n} e \cdot (-\dot\varphi - A^T\varphi)\,dt$$


and integrate by parts on each subinterval $(t_{n-1}, t_n)$ to get the error representation:


$$e(T) \cdot \psi = \sum_{n=1}^{N} \int_{t_{n-1}}^{t_n} (\dot e - Ae) \cdot \varphi\,dt - \sum_{n=1}^{N} (U(t_n) - U(t_{n-1})) \cdot \varphi(t_{n-1})$$


where the last term results from the jumps of $U(t)$ at the nodes $t = t_{n-1}$. Since now $u$ solves the differential equation $\dot u - f(u) = 0$, and $\dot U = 0$ on each time interval $(t_{n-1}, t_n)$, we have

$$\dot e - Ae = \dot u - f(u) - \dot U + f(U) = -\dot U + f(U) = f(U) \quad \text{on } (t_{n-1}, t_n)$$

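It follows that


$$e(T) \cdot \psi = -\sum_{n=1}^{N} (U(t_n) - U(t_{n-1})) \cdot \varphi(t_{n-1}) + \int_0^T f(U) \cdot \varphi\,dt$$


Using the Galerkin orthogonality (3) with $v = \bar\varphi_n$, the mean value of $\varphi$ over $I_n$, we get


$$e(T) \cdot \psi = -\sum_{n=1}^{N} (U(t_n) - U(t_{n-1})) \cdot (\varphi(t_{n-1}) - \bar\varphi_n) + \sum_{n=1}^{N} \int_{t_{n-1}}^{t_n} f(U) \cdot (\varphi - \bar\varphi_n)\,dt$$

Since now

$$\int_{t_{n-1}}^{t_n} f(U) \cdot (\varphi - \bar\varphi_n)\,dt = 0$$


because $f(U(t))$ is constant on $(t_{n-1}, t_n]$, the error representation takes the form


$$e(T) \cdot \psi = -\sum_{n=1}^{N} (U(t_n) - U(t_{n-1})) \cdot (\varphi(t_{n-1}) - \bar\varphi_n)$$
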

Finally, from the estimate


$$\|\varphi(t_{n-1}) - \bar\varphi_n\| \le \int_{t_{n-1}}^{t_n} \|\dot\varphi(t)\|\,dt$$


we obtain the following a posteriori error estimate for the backward Euler or dG(0) method:


$$|e(T) \cdot \psi| \le S_c(T, \psi) \max_{1 \le n \le N} \|U(t_n) - U(t_{n-1})\| \tag{5}$$


where the stability factor $S_c(T, \psi)$, recalling (4), is defined by


$$S_c(T, \psi) = \int_0^T \|\dot\varphi(t)\|\,dt \tag{6}$$


Maximizing over $\psi$ with $\|\psi\| = 1$, we obtain a posteriori control of the Euclidean norm of $e(T)$:


$$\|e(T)\| \le S_c(T) \max_{1 \le n \le N} \|U(t_n) - U(t_{n-1})\| \tag{7}$$

with corresponding stability factor

$$S_c(T) = \max_{\|\psi\| = 1} S_c(T, \psi) \tag{8}$$

Equivalently, we can write this estimate as

$$\|e(T)\| \le S_c(T) \max_{0 \le t \le T} \|k(t)R(U(t))\| \tag{9}$$


where $k(t) = k_n = t_n - t_{n-1}$ for $t \in (t_{n-1}, t_n]$, and $R(U(t)) = (U(t_n) - U(t_{n-1}))/k_n = f(U(t_n))$ corresponds to the residual obtained by inserting the discrete solution into the differential equation (noting that $\dot U(t) = 0$ on each time interval).

We can, thus, also express the a posteriori error estimate (5) in the form


$$|e(T) \cdot \psi| \le \int_0^T \|k(t)R(U(t))\|\,\|\dot\varphi(t)\|\,dt \tag{10}$$


where now the dual solution enters as a weight in a time integral involving the residual $R(U(t))$. Maximizing over $\|k(t)R(U(t))\|$ and integrating $\|\dot\varphi(t)\|$, we obtain the original estimate (9).

We now define the IVP (1) to be parabolic if (up to possibly logarithmic factors) the stability factor $S_c(T)$ is of unit size for all $T$. We shall see that another typical feature of a parabolic problem is that the stability factor $S_c(T, \psi)$ varies little with the specific choice of normalized initial data $\psi$, which means that to compute $S_c(T) = \max_{\|\psi\| = 1} S_c(T, \psi)$, we may drastically restrict the variation of $\psi$ and solve the dual problem with only a few different initial data.
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If we perturb f to f in the discretization with dG(q), 
for instance by approximating $f(U(t))$ by a polynomial in connection with quadrature in computing $\int_{I_n} f(U(t))\,dt$, we obtain, writing $\hat f$ for the perturbed function, an additional contribution to the a posteriori error estimates of the form


$$S_q(T, \psi) \max_{0 \le t \le T} \|f(U(t)) - \hat f(U(t))\|$$


or $S_q(T) \max_{0 \le t \le T} \|f(U(t)) - \hat f(U(t))\|$, with corresponding stability factors defined by


$$S_q(T, \psi) = \int_0^T \|\varphi(t)\|\,dt$$


where $\varphi$ solves the backward dual problem with $\varphi(T) = \psi$, and $S_q(T) = \max_{\|\psi\| = 1} S_q(T, \psi)$. In a parabolic problem, we may have $S_q(T) \sim T$, although $S_c(T) \sim 1$ for all $T > 0$. We note that $S_c(T)$ involves the time derivative $\dot\varphi$, while $S_q(T)$ involves the dual $\varphi$ itself.

Note that in dG(0) there is no need for quadrature in the present case of an autonomous IVP since then $f(U(t))$ is piecewise constant. However, in a corresponding nonautonomous problem of the form $\dot u = f(u(t), t)$ with $f$ depending explicitly on $t$, quadrature may be needed also for dG(0).

The basic parabolic or stiff problem is a linear constant coefficient IVP of the form $\dot u(t) = f(u(t), t) = -Au(t) + f(t)$ for $0 < t \le T$, $u(0) = u^0$, with $A$ a constant positive semidefinite symmetric matrix with eigenvalues ranging from small to large positive. In this case, $f'(u) = -A$, where $A$ has eigenvalues $\lambda \ge 0$, with corresponding solution components varying on time scales $1/\lambda$ ranging from very long (slow variation/decay if $\lambda$ is small positive) to very short (fast variation/decay if $\lambda$ is large positive). A solution to a typical stiff problem thus has a range of time scales varying from slow to fast. In this case, the dual problem takes the form $-\dot\varphi(t) = -A\varphi(t)$ for $0 \le t < T$, and the strong stability estimate states that, independent of the distribution of the eigenvalues $\lambda \ge 0$ of $A$, we have

$$\int_0^T (T - t)\|\dot\varphi(t)\|^2\,dt \le \frac{1}{4}$$

where we assume that $\|\varphi(T)\| = 1$. From this we may derive that for $0 < \epsilon < T$,


$$\int_0^{T - \epsilon} \|\dot\varphi(t)\|\,dt \le \frac{1}{2}\left(\log\frac{T}{\epsilon}\right)^{1/2}$$


which up to a logarithmic factor states that $S_c(T) \sim 1$ for all $T > 0$. Further, the corresponding (weak) stability estimate states that $\|\varphi(t)\| \le \|\psi\|$, from which it directly follows that $S_q(T) \le T$, as indicated. The (simple) proofs of the stability estimates are given below.
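
The strong stability estimate can be checked numerically in the simplest setting. The sketch below assumes that $A$ is diagonal (so that each component of the dual solution is an explicit exponential in $s = T - t$) and compares a midpoint-rule approximation of the weighted integral with the closed form obtained by integrating each component exactly. The eigenvalues, the data $\psi$, and the function names are assumptions for illustration.

```python
import math


def strong_stability_integral(eigenvalues, psi, T, n=100000):
    """Midpoint-rule value of the integral of (T - t)*|phi'(t)|^2 over (0, T)."""
    ds = T / n
    total = 0.0
    for j in range(n):
        s = (j + 0.5) * ds                      # s = T - t
        dphi_sq = sum((lam * p * math.exp(-lam * s)) ** 2
                      for lam, p in zip(eigenvalues, psi))
        total += s * dphi_sq * ds
    return total


def closed_form(eigenvalues, psi, T):
    """Exact value: sum of psi_i^2 * (1 - exp(-2*lam_i*T)*(1 + 2*lam_i*T)) / 4."""
    return sum(p * p * (1.0 - math.exp(-2.0 * lam * T) * (1.0 + 2.0 * lam * T)) / 4.0
               for lam, p in zip(eigenvalues, psi))


eigs = [0.0, 1.0, 10.0, 100.0]   # hypothetical spectrum, from slow to fast
psi = [0.5, 0.5, 0.5, 0.5]       # normalized: Euclidean norm equals 1
numeric = strong_stability_integral(eigs, psi, T=10.0)
exact = closed_form(eigs, psi, T=10.0)
print(numeric, exact)
```

Each term of the closed form is at most $\psi_i^2/4$, so the sum is bounded by $1/4$ regardless of how the eigenvalues are distributed, in agreement with the estimate.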

The stability factors $S_c(T, \psi)$ and $S_q(T, \psi)$ may be approximately computed a posteriori by replacing $A(t)$ in (4) with $f'(U(t))$, assuming $U(t)$ is sufficiently close to $u(t)$ for all $t$, and solving the corresponding backward dual problem numerically (e.g. using the dG(0) method). We may similarly compute approximations of $S_c(T)$ and $S_q(T)$ by varying $\psi$. By computing the stability factors, we get concrete evidence of the parabolicity of the underlying problem, which may be difficult (or impossible) to assess analytically a priori. Of course, there is also a gradual degeneracy of the parabolicity as the stability factor $S_c(T)$ increases.
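
A minimal sketch of such a computation follows, under simplifying assumptions: the linearized operator is a constant diagonal matrix, the dual problem is solved with backward Euler (dG(0)) in the reversed time variable $s = T - t$, the integral of the norm of the time derivative is replaced by the sum of the jump norms of the piecewise constant dual solution, and the second factor by a Riemann sum. The function name and all data are hypothetical.

```python
import math


def dual_stability_factors(eigenvalues, psi, T, k=0.01):
    """Approximate (S_c(T, psi), S_q(T, psi)) for a diagonal dual problem."""
    phi = list(psi)
    s_c = 0.0   # total variation of phi: discrete analog of int |phi'| dt
    s_q = 0.0   # Riemann sum: discrete analog of int |phi| dt
    for _ in range(int(round(T / k))):
        new_phi = [p / (1.0 + k * lam) for p, lam in zip(phi, eigenvalues)]
        s_c += math.sqrt(sum((a - b) ** 2 for a, b in zip(new_phi, phi)))
        s_q += k * math.sqrt(sum(a * a for a in new_phi))
        phi = new_phi
    return s_c, s_q


eigs = [0.0, 1.0, 10.0, 100.0]
psi = [0.5, 0.5, 0.5, 0.5]
results = {T: dual_stability_factors(eigs, psi, T) for T in (10.0, 100.0, 1000.0)}
for T, (s_c, s_q) in results.items():
    print(T, s_c, s_q)
```

In this example the first factor is practically unchanged between $T = 10$ and $T = 1000$, whereas the second grows in proportion to $T$ because of the zero eigenvalue.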

The a posteriori error estimate (7) can be used as the basis for an adaptive time-stepping algorithm, controlling the size of the Galerkin discretization error, of the form: For $n = 1, 2, \ldots, N$, choose $k_n$ so that


$$\|U(t_n) - U(t_{n-1})\| \approx \frac{\mathrm{TOL}}{S_c(T)}$$


for some tolerance $\mathrm{TOL} > 0$. Recalling that the characteristic feature of a parabolic problem is that $S_c(T) \sim 1$ for all $T > 0$, this means that the time step control related to the Galerkin discretization error will be independent of the length of the time interval of the simulation. This means that long-time integration without error accumulation is possible, which may be interpreted as some kind of ‘parabolic loss of memory’. We note again that this concerns the Galerkin error only, which has this special feature as a consequence of the Galerkin orthogonality. However, the quadrature error may accumulate in time, typically at a linear rate, and so a long-time simulation may require more accurate quadrature than a simulation over a shorter interval.
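
One possible realization of this step-size rule is sketched below for the scalar test problem $\dot u = -\lambda u$, where the backward Euler step is available in closed form. The halving and doubling strategy, the cap `k_max`, the default value `s_c=1.0`, and all names are assumptions of this sketch and are not taken from the text.

```python
import math


def adaptive_backward_euler(lam, u0, T, tol, s_c=1.0, k0=1e-3, k_max=1.0):
    """Backward Euler for u' = -lam*u with jumps kept below tol / s_c."""
    target = tol / s_c
    t, u, k = 0.0, u0, k0
    times, values = [t], [u]
    while t < T - 1e-12:
        k = min(k, k_max, T - t)
        u_new = u / (1.0 + k * lam)      # implicit step solved exactly
        jump = abs(u_new - u)
        if jump > target and k > 1e-12:
            k *= 0.5                     # reject the step and retry
            continue
        t += k
        u = u_new
        times.append(t)
        values.append(u)
        if jump < 0.5 * target:
            k *= 2.0                     # solution varies slowly: enlarge the step
    return times, values


lam, T, tol = 100.0, 10.0, 0.01
times, values = adaptive_backward_euler(lam, 1.0, T, tol)
steps = [b - a for a, b in zip(times, times[1:])]
error = abs(values[-1] - math.exp(-lam * T))
print(len(steps), min(steps), max(steps), error)
```

Small steps are taken only during the initial transient; afterwards the step grows to `k_max`, and the final error stays below the tolerance, as the estimate (7) predicts for this problem, whose strong stability factor is bounded by 1.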


Remark 1. We now present a very simple parabolic model problem, where we can directly see the basic feature of long-time integration without error accumulation for dG(0), which we just proved for a general parabolic problem. The model problem is simply $\dot u(t) = f(t)$, where $f$ does not depend on $u(t)$. For this problem, which is certainly parabolic with our definition, dG(0) takes the form


$$U(t_n) = U(t_{n-1}) + \int_{t_{n-1}}^{t_n} f(t)\,dt \tag{11}$$



It follows that dG(0) (with exact quadrature) coincides with 
the exact solution at the discrete time levels t,, and thus 
there is no error accumulation at all over a long time. The 
reason is clearly the mean value property (11) of dG(0) 
expressing Galerkin orthogonality. On the other hand, if 
we use quadrature to compute the integral in (11), then we
may expect the quadrature error in general to accumulate 
(at a linear rate). 
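
The contrast in Remark 1 can be checked on a small example. The sketch below takes $f(t) = t$ (an arbitrary choice made here, with exact solution $u(t) = t^2/2$ for $u(0) = 0$) and compares the dG(0) update (11) using exact integrals with the same update using the left-endpoint rule as quadrature; the function names are hypothetical.

```python
def dg0_exact(f_int, t_grid, u0):
    """dG(0) for u'(t) = f(t): U_n = U_{n-1} + exact integral of f over I_n."""
    U = u0
    for a, b in zip(t_grid[:-1], t_grid[1:]):
        U += f_int(a, b)
    return U

def dg0_left_rule(f, t_grid, u0):
    """Same update, with the integral replaced by the left-endpoint rule."""
    U = u0
    for a, b in zip(t_grid[:-1], t_grid[1:]):
        U += (b - a) * f(a)
    return U

def f(t):
    return t

def f_int(a, b):
    # Exact integral of f(t) = t over (a, b).
    return 0.5 * (b * b - a * a)

if __name__ == "__main__":
    k = 0.1
    for T in (10.0, 20.0):
        n = round(T / k)
        grid = [i * k for i in range(n + 1)]
        exact = 0.5 * T * T
        print(T,
              abs(dg0_exact(f_int, grid, 0.0) - exact),
              abs(dg0_left_rule(f, grid, 0.0) - exact))
```

With exact integrals the error at the final time stays at round-off level, while the quadrature error equals $kT/2$ for this choice of $f$ and thus grows linearly with $T$.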


5 EXAMPLES OF STIFF IVPs 


We have stated above that a parabolic or stiff initial value 
problem $\dot u(t) = f(u(t))$ for $0 < t \le T$, $u(0) = u^0$, may be
characterized by the fact that the stability factor $S_c(T)$ is
of moderate (unit) size independent of T > 0, while the 
norm of the linearized operator f’(u(t)) may be large, 
corresponding to the presence of large negative eigenvalues. 
Such initial value problems are common in models of 
chemical reactions, with reactions on a range of time scales 
varying from slow to fast. Typical solutions include so- 
called transients where the fast reactions make the solution 
change quickly over a short (initial) time interval, after 
which the fast reactions are ‘burned out’ and the slow 
reactions make the solution change on a longer time scale. 
We now consider a set of test problems that we solve by 
the adaptive dG(0) method, including computation of the 
strong stability factor $S_c(T)$.


5.1 Model problem: $\dot u(t) + Au(t) = f(t)$ with $A$
positive symmetric semidefinite


As indicated, the basic example of a parabolic IVP takes the
form $\dot u(t) + Au(t) = f(t)$ for $0 < t \le T$, $u(0) = u^0$, where $A$
is a positive semidefinite square matrix. We consider here
the case

\[
A = \begin{pmatrix}
-4.94 & 2.60 & 0.11 & 0.10 & 0.06 \\
2.60 & -4.83 & 2.69 & 0.17 & 0.10 \\
0.11 & 2.69 & -4.78 & 2.69 & 0.11 \\
0.10 & 0.17 & 2.69 & -4.83 & 2.60 \\
0.06 & 0.10 & 0.11 & 2.60 & -4.94
\end{pmatrix}
\]

with eigenvalues $(0, -2.5, -5, -7.5, -9.33)$. In Figure 1,
we plot the solution, the dual solution and the stability
factor $S_c(T, \psi)$ as a function of $T$ for a collection of
different initial values $\varphi(T) = \psi$. We note that the variation
with $\psi$ is rather small: about a factor 4. We also note the
initial transient, both for the solution itself and for the dual
problem, which runs backwards in time.


5.2 The Akzo—Nobel system of 
chemical reactions 


We consider next the so-called Akzo-Nobel problem,
which is a test problem for solvers of stiff ODEs modeling
chemical reactions: Find the concentrations
$u(t) = (u_1(t), \ldots, u_6(t))$ such that for $0 < t \le T$,

\[
\begin{aligned}
\dot u_1 &= -2r_1 + r_2 - r_3 - r_4 \\
\dot u_2 &= -0.5r_1 - r_4 - 0.5r_5 + F \\
\dot u_3 &= r_1 - r_2 + r_3 \\
\dot u_4 &= -r_2 + r_3 - 2r_4 \\
\dot u_5 &= r_2 - r_3 + r_5 \\
\dot u_6 &= -r_5
\end{aligned}
\tag{12}
\]

where $F = 3.3 \cdot (0.9/737 - u_2)$ and the reaction rates
are given by $r_1 = 18.7 \cdot u_1^4 \sqrt{u_2}$, $r_2 = 0.58 \cdot u_3 u_4$,
$r_3 = 0.58/34.4 \cdot u_1 u_5$, $r_4 = 0.09 \cdot u_1 u_4^2$ and
$r_5 = 0.42 \cdot u_6^2 \sqrt{u_2}$,
with the initial condition $u^0 = (0.437, 0.00123, 0, 0, 0,
0.367)$. In Figure 2, we plot the solution, the dual solution
and the stability factor $S_c(T)$ as a function of $T$. We note
the initial transients in the concentrations and their long-time,
very slow variation after the active phase of reaction.
We also note that $S_c(T)$ initially grows to about 3.5 and
then falls back to a value around 2. This is typical
behavior for reactive systems, where momentarily during
the active phase of reaction the perturbation growth may
be considerable, while over a long time the memory of that
phase fades. On the other hand, $S_q(T)$ grows consistently,
which shows that fading memory requires some mean value
to be zero (Galerkin orthogonality). We present below more
examples of this nature exhibiting features of parabolicity.


6 A NONSTIFF IVP:
THE LORENZ SYSTEM


The Lorenz system, presented in 1963 by meteorologist
Edward Lorenz,

\[
\begin{aligned}
\dot u_1 &= -10u_1 + 10u_2 \\
\dot u_2 &= 28u_1 - u_2 - u_1 u_3 \\
\dot u_3 &= -\tfrac{8}{3}u_3 + u_1 u_2 \\
u(0) &= u^0
\end{aligned}
\tag{13}
\]


is an example of an IVP with exponentially growing sta- 
bility factors reflecting a strong sensitivity to perturbations. 
Lorenz chose the model to illustrate perturbation sensitiv- 
ity in meteorological models, making forecasts of daily 
weather virtually impossible over a period of more than 
a week. For the Lorenz system, accurate numerical solu- 
tion using double precision beyond 50 units of time seems 
impossible. Evidently, the Lorenz system is not parabolic. 

The system (13) has three equilibrium points $\bar u$ with
$f(\bar u) = 0$: $\bar u = (0, 0, 0)$ and $\bar u = (\pm 6\sqrt{2}, \pm 6\sqrt{2}, 27)$.
The equilibrium point $\bar u = (0, 0, 0)$ is unstable with the
corresponding Jacobian $f'(\bar u)$ having one positive (unstable)
eigenvalue and two negative (stable) eigenvalues. The
equilibrium points $(\pm 6\sqrt{2}, \pm 6\sqrt{2}, 27)$ are slightly unstable
with the corresponding Jacobians having one negative
(stable) eigenvalue and two eigenvalues with very small
positive real part (slightly unstable) and also an imaginary
part. More precisely, the eigenvalues at the two nonzero
equilibrium points are $\lambda_1 \approx -13.9$ and
$\lambda_{2,3} \approx 0.0939 \pm 10.1i$.

Figure 1. Symmetric IVP: solution, dual solution, and stability
factors $S_c(T, \psi)$. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm
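
The eigenvalues quoted above can be reproduced numerically. The following sketch evaluates the Jacobian of the right-hand side of (13) at the three equilibrium points; the function names and the use of NumPy are choices made for this illustration.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0   # coefficients appearing in (13)

def lorenz_jacobian(u):
    """Jacobian f'(u) of the right-hand side of the Lorenz system (13)."""
    u1, u2, u3 = u
    return np.array([
        [-SIGMA, SIGMA, 0.0],
        [RHO - u3, -1.0, -u1],
        [u2, u1, -BETA],
    ])

def equilibria():
    """The three equilibrium points: the origin and (+-6*sqrt(2), +-6*sqrt(2), 27)."""
    r = np.sqrt(BETA * (RHO - 1.0))        # equals 6*sqrt(2)
    return [np.zeros(3),
            np.array([r, r, RHO - 1.0]),
            np.array([-r, -r, RHO - 1.0])]

if __name__ == "__main__":
    for ubar in equilibria():
        print(ubar, np.linalg.eigvals(lorenz_jacobian(ubar)))
```

At the origin this gives one positive and two negative real eigenvalues; at each nonzero equilibrium it gives one negative real eigenvalue and a complex pair with small positive real part.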
Figure 2. The Akzo-Nobel problem: solution, dual solution, stability factor $S_c(T, \psi)$, and stability factor $S_q(T, \psi)$. A color version
of this image is available at http://www.mrw.interscience.wiley.com/ecm
Figure 3. Two views of a numerical trajectory of the Lorenz system over the time interval [0, 30].
Figure 4. The growth of the stability factor $S_q(T)$ for the Lorenz problem. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


In Figure 3, we present two views of a solution u(t) that 
starts at u(0) = (1, 0, 0) computed to time 30 with an error 
tolerance of TOL = 0.5 using an adaptive IVP solver of the 
form presented above. The plotted trajectory is typical: it is 
kicked away from the unstable point (0, 0,0) and moves 
towards one of the nonzero equilibrium points. It then 
slowly orbits away from that point and at some time decides 
to cross over towards the other nonzero equilibrium point, 
again slowly orbiting away from that point and coming back 
again, orbiting out, crossing over, and so on. This pattern of 
some orbits around one nonzero equilibrium point followed 
by a transition to the other nonzero equilibrium point is 
repeated with a seemingly random number of revolutions 
around each nonzero equilibrium point. 

In Figure 4, we plot the size of the stability factor $S_q(T)$
connected to quadrature errors as a function of final time $T$.
We notice that the stability factor takes an exponential leap
every time the trajectory flips, while the growth is slower
when the trajectory orbits one of the nonzero equilibrium
points. The stability factor grows on average as $10^{T/3}$,
which sets the effective time limit of accurate computation
to $T \approx 50$ when computing in double precision with, say, 15
accurate digits.


7 EXPLICIT TIME-STEPPING FOR 
STIFF IVPs 


The dG(0) method for the IVP $\dot u = f(u)$ takes the form

\[ U(t_n) - k_n f(U(t_n)) = U(t_{n-1}) \]


684 Adaptive Computational Methods for Parabolic Problems 


At each time step we have to solve an equation of the form
$v - k_n f(v) = U(t_{n-1})$ with $U(t_{n-1})$ given. To this end, we
may try a damped fixed-point iteration of the form

\[ v^{(m)} = (I - \alpha)\,v^{(m-1)} + \alpha\,\bigl(U(t_{n-1}) + k_n f(v^{(m-1)})\bigr) \]

with some suitable matrix $\alpha$ (or constant in the simplest
case). Choosing $\alpha = I$ with only one iteration corresponds
to the explicit Euler method. Convergence of the fixed-point
iteration requires that

\[ \|I - \alpha + k_n \alpha f'(v)\| < 1 \]

for relevant values of $v$, which could force $\alpha$ to be small
(e.g. in the stiff case with $f'(v)$ having large negative
eigenvalues) and result in slow convergence. A simple
choice is to take $\alpha$ to be a diagonal matrix with
$\alpha_{ii} = 1/(1 - k_n f'_{ii}(v))$, corresponding to a diagonal
approximation of Newton's method, with the hope that the number
of iterations will be small.
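
A minimal sketch of the damped fixed-point iteration is given below for a diagonal linear test problem $f(v) = -\lambda v$ (componentwise), chosen here only for illustration. The function name, the stopping rule and the representation of the diagonal matrix $\alpha$ as a vector are assumptions of this sketch.

```python
import numpy as np

def damped_fixed_point(f, u_prev, k, alpha, iterations=50, tol=1e-12):
    """Solve v - k f(v) = u_prev by v <- (I - alpha) v + alpha (u_prev + k f(v)).

    alpha is a diagonal matrix stored as a vector (or a scalar).
    Returns the last iterate and the number of iterations performed.
    """
    u_prev = np.asarray(u_prev, dtype=float)
    v = u_prev.copy()
    for m in range(iterations):
        v_new = (1.0 - alpha) * v + alpha * (u_prev + k * f(v))
        if np.linalg.norm(v_new - v) <= tol:
            return v_new, m + 1
        v = v_new
    return v, iterations

if __name__ == "__main__":
    lam = np.array([1.0, 1000.0])          # one slow and one fast component
    f = lambda v: -lam * v
    k, u_prev = 0.1, np.array([1.0, 1.0])
    # Diagonal Newton choice: alpha_ii = 1 / (1 - k f'_ii) = 1 / (1 + k lam_i)
    print(damped_fixed_point(f, u_prev, k, 1.0 / (1.0 + k * lam)))
    # Undamped choice alpha = I: the stiff component is amplified by -k*lam
    print(damped_fixed_point(f, u_prev, k, 1.0))
```

For this linear diagonal problem the diagonal choice of $\alpha$ is exactly Newton's method, so the iteration reaches the dG(0) solution $U(t_{n-1})/(1 + k_n\lambda)$ in the first update, while $\alpha = I$ diverges for $k_n\lambda > 1$.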

We just learned that explicit time-stepping for stiff problems
requires small time steps outside transients and thus
may be inefficient. We shall now indicate a way to get
around this limitation through a process of stabilization,
where a large time step is accompanied by a couple of small
time steps. The resulting method has similarities with the
control system of a modern (unstable) jet fighter like the
Swedish JAS Gripen, the flight of which is controlled by
quick small flaps of a pair of small extra wings ahead of
the main wings, or with balancing a stick vertically on the
fingertips if we want a more domestic application.

We shall now explain the basic (simple) idea of stabilization
and present some examples as illustrations of
fundamental aspects of adaptive IVP-solvers and stiff problems.
Thus to start with, suppose we apply the explicit Euler
method to the scalar problem

\[ \dot u(t) + \lambda u(t) = 0 \quad \text{for } 0 < t \le T, \qquad u(0) = u^0 \tag{14} \]

with $\lambda > 0$, taking first a large time step $K$ satisfying
$K\lambda > 2$ and then $m$ small time steps $k$ satisfying $k\lambda < 2$,
to get the method

\[ U(t_n) = (1 - k\lambda)^m (1 - K\lambda)\,U(t_{n-1}) \tag{15} \]

altogether corresponding to a time step of size $k_n = K + mk$.
Here $K$ gives a large unstable time step with
$|1 - K\lambda| > 1$ and $k$ is a small time step with $|1 - k\lambda| < 1$.
Defining the polynomial function $p(x) = (1 - \theta x)^m (1 - x)$,
where $\theta = k/K$, we can write the method (15) in the
form

\[ U(t_n) = p(K\lambda)\,U(t_{n-1}) \]

For stability, we need $|p(K\lambda)| \le 1$, that is,
$|1 - k\lambda|^m (K\lambda - 1) \le 1$, or

\[ m \ge \frac{\log(K\lambda - 1)}{-\log|1 - c|} \approx 2\log(K\lambda) \tag{16} \]

with $c = k\lambda \approx 1/2$ for definiteness.

We conclude that $m$ may be quite small even if $K\lambda$ is
large, since the logarithm grows so slowly, and then only a
small fraction of the total time (a small fraction of the time
interval $[0, T]$) will be spent on stabilizing time-stepping
with the small time steps $k$.

To measure the efficiency gain, we introduce

\[ \alpha = \frac{1 + m}{K + km} \in \left(\frac{1}{K}, \frac{1}{k}\right) \]

which is the number of time steps per unit time interval
with the stabilized explicit Euler method. By (16) we have

\[ \alpha \approx \frac{1 + 2\log(K\lambda)}{K + \log(K\lambda)/\lambda} \approx 2\lambda\,\frac{\log(K\lambda)}{K\lambda} \tag{17} \]

for $K\lambda \gg 1$. On the other hand, the number of time steps
per unit time interval for the standard explicit Euler method
is

\[ \alpha_0 = \frac{\lambda}{2} \tag{18} \]

with the maximum stable time step being $k_n = 2/\lambda$.


The cost reduction factor using the stabilized explicit
Euler method would thus be

\[ \frac{\alpha}{\alpha_0} \approx \frac{4\log(K\lambda)}{K\lambda} \]

which can be quite significant for large values of $K\lambda$. For
typical parabolic problems, $\dot u + Au(t) = 0$, the eigenvalues
of $A$ are distributed on the interval $[0, \lambda_{\max}]$, and for the
damping to be efficient we need a slightly modified time
step sequence. This is described in more detail in Eriksson,
Johnson and Logg (2002).
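
The stability condition (16) and the resulting cost reduction can be checked on the scalar problem (14). The sketch below uses the exact logarithms in (16) rather than the rounded value $2\log(K\lambda)$; the function names and the choice $c = 1/2$ are assumptions of this illustration.

```python
import math

def damping_steps(K, lam, c=0.5):
    """Smallest m with |1 - c|^m (K*lam - 1) <= 1, c = k*lam, cf. (16).

    Requires K*lam > 2 and 0 < c < 2, c != 1.
    """
    return max(0, math.ceil(math.log(K * lam - 1.0) / -math.log(abs(1.0 - c))))

def stabilized_euler_cycle(u, K, lam, c=0.5):
    """One large explicit Euler step K followed by m small steps k = c/lam, cf. (15)."""
    k = c / lam
    m = damping_steps(K, lam, c)
    u = (1.0 - K * lam) * u          # unstable large step
    for _ in range(m):
        u = (1.0 - k * lam) * u      # stabilizing small steps
    return u, m, k

def cost_reduction(K, lam, c=0.5):
    """alpha / alpha_0: steps per unit time relative to standard explicit Euler."""
    m = damping_steps(K, lam, c)
    alpha = (1 + m) / (K + m * c / lam)
    alpha0 = lam / 2.0
    return alpha / alpha0

if __name__ == "__main__":
    print(stabilized_euler_cycle(1.0, K=1.0, lam=1000.0))
    print(cost_reduction(K=1.0, lam=1000.0))
```

For $\lambda = 1000$ and $K = 1$ this gives $m = 10$ damping steps, after which the amplification over the whole cycle is below one in modulus, at roughly one fiftieth of the number of steps of standard explicit Euler.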

We now present some examples using an adaptive cG(1) 
IVP-solver, where explicit fixed-point iteration (using only 
a couple of iterations) on each time interval is combined 
with stabilizing small time steps, as described for the 
explicit Euler method. In all problems, we note the initial 
transient, where the solution components change quickly, 
and the oscillating nature of the time step sequence outside 
the transient, with large time steps followed by some small 
stabilizing time steps. 
Figure 5. Solution and time step sequence for equation (14), $\alpha/\alpha_0 \approx 1/310$. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


Example We apply the indicated method to the scalar
problem (14) with $u^0 = 1$ and $\lambda = 1000$ and display the
result in Figure 5. The cost reduction factor in comparison
to a standard explicit method is large: $\alpha/\alpha_0 \approx 1/310$.


Example We now consider the 2 x 2 diagonal system

\[ \dot u(t) + \begin{pmatrix} 100 & 0 \\ 0 & 1000 \end{pmatrix} u(t) = 0 \quad \text{for } 0 < t \le T, \qquad u(0) = u^0 \tag{19} \]

with $u^0 = (1, 1)$. There are now two eigenmodes with large
eigenvalues that need to be stabilized. The cost reduction
factor is $\alpha/\alpha_0 \approx 1/104$ (see Figure 6).


Example We consider next the so-called HIRES problem
('High Irradiance RESponse') from plant physiology, which
consists of the following eight equations:

\[
\begin{aligned}
\dot u_1 &= -1.71u_1 + 0.43u_2 + 8.32u_3 + 0.0007 \\
\dot u_2 &= 1.71u_1 - 8.75u_2 \\
\dot u_3 &= -10.03u_3 + 0.43u_4 + 0.035u_5 \\
\dot u_4 &= 8.32u_2 + 1.71u_3 - 1.12u_4 \\
\dot u_5 &= -1.745u_5 + 0.43u_6 + 0.43u_7 \\
\dot u_6 &= -280.0u_6u_8 + 0.69u_4 + 1.71u_5 - 0.43u_6 + 0.69u_7 \\
\dot u_7 &= 280.0u_6u_8 - 1.81u_7 \\
\dot u_8 &= -280.0u_6u_8 + 1.81u_7
\end{aligned}
\tag{20}
\]

together with the initial condition $u^0 = (1.0, 0, 0, 0, 0, 0, 0,
0.0057)$. We present the solution and the time step sequence
in Figure 7. The cost is now $\alpha \approx 8$ and the cost reduction
factor is $\alpha/\alpha_0 \approx 1/33$.


Example We consider again the Akzo-Nobel problem
from above, integrating over the interval [0, 180]. We
plot the solution and the time step sequence in Figure 8.
Allowing a maximum time step of $k_{\max} = 1$ (chosen arbitrarily),
the cost is $\alpha \approx 2$ and the cost reduction factor is
$\alpha/\alpha_0 \approx 1/9$. The actual gain in a specific situation is determined
by the quotient between the large time steps and the
small damping time steps, as well as the number of small 
damping steps that are needed. In this case, the number of 
small damping steps is small, but the large time steps are 
not very large compared to the small damping steps. The 
gain is thus determined both by the stiff nature of the prob- 
lem and the tolerance (or the size of the maximum allowed 
time step). 


Example We consider now Van der Pol's equation:

\[ \ddot u + \mu(u^2 - 1)\dot u + u = 0 \]

which we write as

\[
\begin{aligned}
\dot u_1 &= u_2 \\
\dot u_2 &= -\mu(u_1^2 - 1)u_2 - u_1
\end{aligned}
\tag{21}
\]

We take $\mu = 1000$ and solve on the interval [0, 10] with
initial condition $u^0 = (2, 0)$. The cost is now $\alpha \approx 140$ and
the cost reduction factor is $\alpha/\alpha_0 \approx 1/75$ (see Figure 9).
Figure 6. Solution and time step sequence for equation (19), $\alpha/\alpha_0 \approx 1/104$. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm
Figure 7. Solution and time step sequence for equation (20), $\alpha/\alpha_0 \approx 1/33$. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


Figure 8. Solution and time step sequence for the Akzo-Nobel problem, $\alpha/\alpha_0 \approx 1/9$. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm
Figure 9. Solution and time step sequence for equation (21), $\alpha/\alpha_0 \approx 1/75$. A color version of this image is available at
http://www.mrw.interscience.wiley.com/ecm


8 STRONG STABILITY ESTIMATES FOR
AN ABSTRACT PARABOLIC MODEL
PROBLEM


We consider an abstract parabolic model problem of the
form: Find $w(t) \in H$ such that

\[ \dot w(t) + Aw(t) = 0 \quad \text{for } 0 < t \le T, \qquad w(0) = w^0 \tag{22} \]

where $H$ is a vector space with inner product $(\cdot, \cdot)$ and
norm $\|\cdot\|$, $A$ is a positive semidefinite symmetric linear
operator defined on a subspace of $H$, that is, $A$ is
a linear transformation satisfying $(Aw, v) = (w, Av)$ and
$(Av, v) \ge 0$ for all $v, w$ in the domain of definition of
$A$, and $w^0$ is the initial data. In the model problem of
Section 4, $H = \mathbb{R}^d$ and $A$ is a positive semidefinite symmetric
$d \times d$ matrix. In the case of the heat equation,
considered in the next section, $H = L_2(\Omega)$ and $-A = \Delta$
(the Laplacian) with homogeneous Dirichlet boundary
conditions.

It is important to notice that we do not require A to be 
positive definite, only positive semidefinite. In particular, 
A may have very small positive or zero eigenvalues, and 
our analysis does not depend on exponential decay of all 
eigenmodes in (22). 

We now state and prove the basic strong stability esti- 
mates for the parabolic model problem (22), noting that the 
constants on the right-hand sides of the estimates are inde- 
pendent of the positive semidefinite symmetric operator A. 
It should be noted that the dual backward problem of (22),
$-\dot\varphi + A\varphi = 0$, takes the form (22) with $w(t) = \varphi(T - t)$.


Lemma 1. The solution $w$ of (22) satisfies for $T > 0$,

\[ \|w(T)\|^2 + 2\int_0^T (Aw(t), w(t))\,dt = \|w^0\|^2 \tag{23} \]

\[ \int_0^T t\,\|Aw(t)\|^2\,dt \le \frac{1}{4}\|w^0\|^2 \tag{24} \]

\[ \|Aw(T)\| \le \frac{1}{\sqrt{2}\,T}\|w^0\| \tag{25} \]

Proof. Taking the inner product of $\dot w(t) + Aw(t) = 0$ with
$w(t)$, we obtain

\[ \frac{1}{2}\frac{d}{dt}\|w(t)\|^2 + (Aw(t), w(t)) = 0 \]

from which (23) follows.

Next, taking the inner product of $\dot w(t) + Aw(t) = 0$ with
$tAw(t)$ and using the fact that

\[ (\dot w(t), tAw(t)) = \frac{1}{2}\frac{d}{dt}\bigl(t\,(Aw(t), w(t))\bigr) - \frac{1}{2}(Aw(t), w(t)) \]

since $A$ is symmetric, we find after integration that

\[ \frac{1}{2}T\,(Aw(T), w(T)) + \int_0^T t\,\|Aw(t)\|^2\,dt = \frac{1}{2}\int_0^T (Aw(t), w(t))\,dt \]

from which (24) follows using (23) and the fact that
$(Aw, w) \ge 0$.
Finally, taking the inner product of $\dot w(t) + Aw(t) = 0$ with
$t^2A^2w(t)$, we obtain

\[ \frac{1}{2}\frac{d}{dt}\bigl(t^2\|Aw(t)\|^2\bigr) + t^2(A^2w(t), Aw(t)) = t\,\|Aw(t)\|^2 \]

from which (25) follows after integration and using (24).
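
The estimates (23) to (25) can be checked numerically for a small matrix. The sketch below uses a symmetric positive semidefinite matrix with a zero eigenvalue (a discrete Laplacian with free ends, chosen here only as an example), expands the solution in eigenvectors and approximates the time integrals with the trapezoidal rule; all names are assumptions of this illustration.

```python
import numpy as np

def check_lemma1(A, w0, T, n=20001):
    """Return ||w0||^2 and the left-hand sides of (23), (24), (25)
    for w' + A w = 0, w(0) = w0, with A symmetric positive semidefinite."""
    lam, X = np.linalg.eigh(A)
    c = X.T @ w0                               # eigen-coefficients of w0
    t = np.linspace(0.0, T, n)
    W = np.exp(-np.outer(t, lam)) * c          # row i: coefficients of w(t_i)
    AW = W * lam                               # row i: coefficients of A w(t_i)

    def trapz(y):
        # Composite trapezoidal rule on the uniform grid t.
        return (t[1] - t[0]) * (np.sum(y) - 0.5 * (y[0] + y[-1]))

    norm0_sq = float(np.sum(c**2))
    lhs23 = float(np.sum(W[-1]**2) + 2.0 * trapz(np.sum(AW * W, axis=1)))
    lhs24 = float(trapz(t * np.sum(AW**2, axis=1)))
    lhs25 = float(np.sqrt(np.sum(AW[-1]**2)))
    return norm0_sq, lhs23, lhs24, lhs25

if __name__ == "__main__":
    A = (np.diag([1.0, 2.0, 2.0, 2.0, 1.0])
         - np.diag(np.ones(4), 1) - np.diag(np.ones(4), -1))
    w0 = np.array([1.0, 2.0, 0.0, -1.0, 3.0])
    T = 10.0
    n0, l23, l24, l25 = check_lemma1(A, w0, T)
    print(n0, l23)                             # (23): both sides agree
    print(l24, 0.25 * n0)                      # (24): left <= right
    print(l25, np.sqrt(n0) / (np.sqrt(2.0) * T))   # (25): left <= right
```

Because the norms are evaluated in an orthonormal eigenbasis, they coincide with the Euclidean norms of $w(t)$ and $Aw(t)$; the zero eigenvalue does not affect the bounds, in line with the remark that $A$ need only be semidefinite.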


The estimates (23) to (25) express in somewhat different
ways ‘parabolic smoothing’; in particular, (25) expresses
that the norm of the time derivative $\dot w(t)$, or equivalently
$Aw(t)$, decreases (increases) like $1/t$ as $t$ increases
(decreases), which means that the solution becomes
smoother with increasing time. We note a close relation
between the two integrals

\[ J_1 = \int_0^T \|\dot w(t)\|\,dt = \int_0^T \|Aw(t)\|\,dt \]

and

\[ J_2 = \left(\int_0^T t\,\|\dot w(t)\|^2\,dt\right)^{1/2} = \left(\int_0^T t\,\|Aw(t)\|^2\,dt\right)^{1/2} \]

both measuring strong stability of (22), with $J_1$ through
Cauchy's inequality being bounded by $J_2$ up to a logarithm:

\[ \int_\epsilon^T \|Aw(t)\|\,dt \le \left(\int_\epsilon^T \frac{dt}{t}\right)^{1/2} \left(\int_\epsilon^T t\,\|Aw(t)\|^2\,dt\right)^{1/2} \le \left(\log\frac{T}{\epsilon}\right)^{1/2} J_2 \]


Remark 2. We now give an argument indicating that
for the parabolic model problem (22), the stability factor
$S_c(T, \psi)$ varies little with the specific choice of data $\psi$.
We do this by noting that the quantity $S(w^0)$ defined by

\[ S(w^0) = \left(\int_0^T t\,\|Aw(t)\|^2\,dt\right)^{1/2} \]

where $w(t)$ solves (22), varies little with the choice of
initial data $w^0$. To see this, we let $\{\chi_j\}$ be an orthonormal
basis for $H$ consisting of eigenfunctions of $A$ with
corresponding eigenvalues $\{\lambda_j\}$, which allows us to express
the solution $w(t)$ in the form $\sum_j \exp(-\lambda_j t)\,w_j^0 \chi_j$ with
$w_j^0 = (w^0, \chi_j)$. We may then write

\[ (S(w^0))^2 = \int_0^T t \sum_j \lambda_j^2 (w_j^0)^2 \exp(-2\lambda_j t)\,dt = \sum_j (w_j^0)^2 \int_0^T t\lambda_j^2 \exp(-2\lambda_j t)\,dt \]

Now, the factor

\[ \int_0^T t\lambda_j^2 \exp(-2\lambda_j t)\,dt = \int_0^{T\lambda_j} s\exp(-2s)\,ds \]

takes on almost the same value $\int_0^\infty s\exp(-2s)\,ds = 1/4$
for all $j$ as soon as $T\lambda_j > 1$, that is, when $\lambda_j$ is not very
small (since $T$ is typically large). If we randomly choose
the initial data $w^0$, the chance of hitting an eigenfunction
corresponding to a very small eigenvalue must be very
small. We conclude that $S(w^0)$ varies little with $w^0$. As
just indicated, $S(w^0)$ is related to the integral $\int_0^T \|\dot w\|\,dt$,
which is the analog of the stability factor $S_c(T, \psi)$ for the
dual problem. The bottom line is that $S_c(T, \psi)$ varies little
with the choice of $\psi$.


Remark 3. The solution operator $\{E(t)\}_{t \ge 0}$ of (22), given
by $E(t)w^0 = w(t)$, is said to define a uniformly bounded
and analytic semigroup if there is a constant $S$ such that
the following estimates hold:

\[ \|w(t)\| \le S\|w^0\|, \qquad \|Aw(t)\| \le \frac{S}{t}\|w^0\| \tag{26} \]

for $t > 0$. We see that this definition directly couples to the
stability estimates of Lemma 1, in which case the constant
$S$ is of unit size.


9 ADAPTIVE SPACE-TIME GALERKIN 
METHODS FOR THE HEAT 
EQUATION 


We now move on to space-time Galerkin finite element
methods for the model parabolic partial differential equation
in the form of the heat equation: Find $u : \Omega \times I \to \mathbb{R}$ such
that

\[
\begin{aligned}
\dot u - \Delta u &= f && \text{in } \Omega \times I \\
u &= 0 && \text{on } \Gamma \times I \\
u(\cdot, 0) &= u^0 && \text{in } \Omega
\end{aligned}
\tag{27}
\]

where $\Omega$ is a bounded domain in $\mathbb{R}^d$ with boundary $\Gamma$,
on which we have posed homogeneous Dirichlet boundary
conditions, $u^0$ is a given initial temperature, $f$ a heat source,
and $I = (0, T]$ a given time interval.

For the discretization of the heat equation in space and
time, we use the cG(p)dG(q) method based on a tensor
product space-time discretization with continuous piecewise
polynomial approximation of degree $p \ge 1$ in space
and discontinuous piecewise polynomial approximation of
degree $q \ge 0$ in time, giving a method which is accurate
of order $p + 1$ in space and of order $2q + 1$ in time. The
discontinuous Galerkin dG(q) method used for the time
discretization reduces to the subdiagonal Padé method for
homogeneous constant coefficient problems and in general,
together with quadrature for the evaluation of the integral in
time, corresponds to an implicit Runge-Kutta method. For
the discretization in space, we use the standard conforming
continuous Galerkin cG(p) method. The cG(p)dG(q)
method has maximal flexibility and allows the space and
time steps to vary in both space and time. We design
and analyze reliable and efficient adaptive algorithms for
global error control in $L_\infty(L_2(\Omega))$ (maximum in time and
$L_2$ in space), with possible extensions to $L_r(L_s(\Omega))$ with
$1 \le r, s \le \infty$.

The cG(p)dG(q) method is based on a partition in
time $0 = t_0 < t_1 < \cdots < t_N = T$ of the interval $(0, T]$ into
time intervals $I_n = (t_{n-1}, t_n]$ of length $k_n = t_n - t_{n-1}$ with
associated finite element spaces $S_n \subset H_0^1(\Omega)$ consisting
of piecewise polynomials of degree $p$ on a triangulation
$\mathcal{T}_n = \{K\}$ of $\Omega$ into elements $K$ with local mesh size given
by a function $h_n(x)$. We define

\[ V_n = \Bigl\{ v : v = \sum_{j=0}^{q} t^j \varphi_j, \; \varphi_j \in S_n \Bigr\} \]

and

\[ V = \{ v : v|_{I_n} \in V_n, \; n = 1, \ldots, N \} \]

We thus define $V$ to be the set of functions $v : \Omega \times I \to \mathbb{R}$
such that the restriction of $v(x, t)$ to each time interval $I_n$
is polynomial in $t$ with coefficients in $S_n$. The cG(p)dG(q)
method for (27) now reads: Find $U \in V$ such that for
$n = 1, 2, \ldots, N$,

\[ \int_{I_n} \bigl\{ (\dot U, v) + (\nabla U, \nabla v) \bigr\}\,dt + ([U]_{n-1}, v_{n-1}^+) = \int_{I_n} (f, v)\,dt \quad \forall v \in V_n \tag{28} \]

where $[w]_n = w_n^+ - w_n^-$, $w_n^{+(-)} = \lim_{s \to 0^{+(-)}} w(t_n + s)$,
$U_0^- = u^0$, and $(\cdot, \cdot)$ denotes the $L_2(\Omega)$ or $[L_2(\Omega)]^d$ inner
product. Note that we allow the space discretizations to
change with time from one space-time slab $\Omega \times I_n$ to the
next.

For $q = 0$, the scheme (28) reduces to the following
variant of the backward Euler scheme:

\[ U_n - k_n \Delta_n U_n = P_n U_{n-1} + \int_{I_n} P_n f\,dt \tag{29} \]

where $U_n = U|_{I_n}$, $\Delta_n : S_n \to S_n$ is the discrete Laplacian
on $S_n$ defined by $(-\Delta_n v, w) = (\nabla v, \nabla w)$ for all $w \in S_n$,
and $P_n$ is the $L_2$-projection onto $S_n$ defined by $(P_n v, w) =
(v, w)$ for all $w \in S_n$.

Alternatively, (29) may be written (with $f = 0$) in matrix
form as

\[ M_n \xi_n + k_n A_n \xi_n = M_n \hat\xi_{n-1} \]

where $M_n$ and $A_n$ are mass and stiffness matrices related to
a nodal basis for $S_n$, $\xi_n$ is the corresponding vector of nodal
values for $U_n$, and $\hat\xi_{n-1}$ is the vector of nodal values for
$P_n U_{n-1}$. Evidently, we have to solve a system of equations
with system matrix $M_n + k_n A_n$ to compute $\xi_n$.
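
As an illustration of the matrix form, the following sketch applies cG(1)dG(0) to the heat equation on the unit interval with $f = 0$, a fixed uniform mesh (so that $P_n U_{n-1} = U_{n-1}$) and constant time step. The one-dimensional setting, the use of the nodal interpolant of $u^0$ as initial data, the dense linear algebra and the function names are simplifying assumptions made here.

```python
import numpy as np

def cg1_matrices(n_elements):
    """Mass and stiffness matrices for cG(1) on a uniform mesh of (0, 1)
    with homogeneous Dirichlet conditions (interior nodes only)."""
    h = 1.0 / n_elements
    n = n_elements - 1
    main, off = np.ones(n), np.ones(n - 1)
    M = h / 6.0 * (4.0 * np.diag(main) + np.diag(off, 1) + np.diag(off, -1))
    A = 1.0 / h * (2.0 * np.diag(main) - np.diag(off, 1) - np.diag(off, -1))
    return M, A

def backward_euler_heat(xi0, M, A, k, n_steps):
    """Repeatedly solve (M + k A) xi_n = M xi_{n-1} (f = 0, fixed mesh)."""
    xi = np.array(xi0, dtype=float)
    S = M + k * A                      # system matrix M_n + k_n A_n
    for _ in range(n_steps):
        xi = np.linalg.solve(S, M @ xi)
    return xi

if __name__ == "__main__":
    n_el = 50
    x = np.linspace(0.0, 1.0, n_el + 1)[1:-1]   # interior nodes
    M, A = cg1_matrices(n_el)
    xi = backward_euler_heat(np.sin(np.pi * x), M, A, k=1e-3, n_steps=100)
    # Compare with the exact solution exp(-pi^2 t) sin(pi x) at x = 0.5, t = 0.1.
    print(xi[n_el // 2 - 1], np.exp(-np.pi**2 * 0.1))
```

On a mesh that changes between time steps, the right-hand side would instead use the nodal values of the projection $P_n U_{n-1}$ onto the new space.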


Remark 4. Note that in the discretization (28), the space
and time steps may vary in time and that the space
discretization may be variable also in space, whereas the time
steps $k_n$ are kept constant in space. Clearly, optimal mesh
design requires the time steps to be variable also in space.
Now, it is easy to extend the method (28) to admit time
steps that are variable in space simply by defining

\[ V_n = \Bigl\{ v : v|_{I_n} = \sum_j c_j(t)\,v_j, \; v_j \in S_n \Bigr\} \]

where now the coefficients $c_j(t)$ are piecewise polynomial
of degree $q$ in $t$ without continuity requirements
on partitions of $I_n$, which may vary with $j$. The discrete
functions may now be discontinuous in time also
inside the space-time slab $\Omega \times I_n$, and the degree $q$ may
vary over components and subintervals. The cG(p)dG(q)
method again takes the form (28), with the term
$([U]_{n-1}, v_{n-1}^+)$ replaced by a sum over all jumps in time of $U$
in $\Omega \times [t_{n-1}, t_n)$. Adaptive methods in this generality,
so-called multiadaptive methods, are proposed and analyzed
in detail for systems of ordinary differential equations in
Logg (2001a,b).


10 A PRIORI AND A POSTERIORI 
ERROR ESTIMATES FOR THE HEAT 
EQUATION 


In this section, we state a priori and a posteriori error
estimates for the cG(p)dG(q) method (28) in the case $p = 1$
and $q = 0, 1$ and give the proofs below. A couple of technical
assumptions on the space-mesh function $h_n(x)$ and
time steps $k_n$ are needed: We assume that each triangulation
$\mathcal{T}_n$ with associated mesh size $h_n$ satisfies, with $h_K$ equal
to the diameter and $m_K$ the volume of $K \in \mathcal{T}_n$,

\[ c_1 h_K^d \le m_K \quad \forall K \in \mathcal{T}_n \tag{30} \]
\[ c_2 h_K \le h_n(x) \le h_K \quad \forall x \in K, \; \forall K \in \mathcal{T}_n \tag{31} \]
\[ |\nabla h_n(x)| \le \mu \quad \forall x \in \Omega \tag{32} \]

for some positive constants $c_1$, $c_2$, and $\mu$. The constant $\mu$
will be assumed to be small enough in the a priori error
estimates (but not in the a posteriori error estimates). We
further assume that there are positive constants $c_3$, $c_4$, and
$\gamma$ such that for all $n$ we have

\[ k_n \le c_3 k_{n+1} \tag{33} \]
\[ c_4 h_n(x) \le h_{n+1}(x) \le \frac{1}{c_4} h_n(x) \quad \forall x \in \Omega \tag{34} \]
\[ \bar h_n^2 \le \gamma k_n \quad \text{or} \quad S_n \subset S_{n-1} \tag{35} \]

where $\bar h_n = \max_{x \in \bar\Omega} h_n(x)$. Furthermore, we assume for
simplicity that $\Omega$ is convex, so that the following elliptic
regularity estimate holds: $\|D^2 v\| \le \|\Delta v\|$ for all functions
$v$ vanishing on $\Gamma$. Here $(D^2 v)^2 = \sum_{ij} v_{ij}^2$, where $v_{ij}$ is the
second partial derivative of $v$ with respect to $x_i$ and $x_j$, and
$\|\cdot\|$ denotes the $L_2(\Omega)$-norm. With these assumptions, we
have the following a priori error estimates:


Theorem 1. If $\mu$ and $\gamma$ are sufficiently small, then there
is a constant $C$ depending only on the constants $c_i$, $i =
1, 2, 3, 4$, such that for $u$ the solution of (27) and $U$ that of
(28), we have for $p = 1$ and $q = 0, 1$,

\[ \|u - U\|_{I_n} \le C L_n \max_{1 \le m \le n} E_{qm}(u), \quad n = 1, \ldots, N \tag{36} \]

and for $q = 1$,

\[ \|u(t_n) - U(t_n)\| \le C L_n \max_{1 \le m \le n} E_{2m}(u), \quad n = 1, \ldots, N \tag{37} \]

where $L_n = (\log(t_n/k_n) + 1)^{1/2}$,
$E_{qm}(u) = \min_{j \le q+1} k_m^j \|u_t^{(j)}\|_{I_m} + \|h_m^2 D^2 u\|_{I_m}$, $q = 0, 1, 2$,
with $u_t^{(1)} = \dot u$, $u_t^{(2)} = \ddot u$, $u_t^{(3)} = \Delta\ddot u$,
and $\|w\|_{I_m} = \max_{t \in I_m} \|w(t)\|$.

These estimates state that the discontinuous Galerkin
method (28) is of order $q + 1$ globally in time and of
order $2q + 1$ at the discrete time levels $t_n$ for $q = 0, 1$,
and is second order in space. In particular, the estimate
(36) is optimal compared to interpolation with piecewise
polynomials of order $q = 0, 1$ in time and piecewise linears
in space, up to the logarithmic factor $L_n$. The third order
accuracy in time at the discrete time levels for $q = 1$ reflects
a superconvergence feature of the dG(q) method.

The a posteriori error estimates for (28) take the form: 


Theorem 2. If $u$ is the solution of (27) and $U$ that of (28)
with $p = 1$, then we have for $q = 0$,

\[ \|u(t_n) - U(t_n)\| \le \max_{1 \le m \le n} E_{0m}(U), \quad n = 1, \ldots, N \tag{38} \]

and for $q = 1$,

\[ \|u(t_n) - U(t_n)\| \le \max_{1 \le m \le n} E_{2m}(U), \quad n = 1, \ldots, N \tag{39} \]

where

\[ E_{0m}(U) = \gamma_1 \|h_m^2 R(U)\|_{I_m} + \gamma_2 \left\| \frac{h_m^2 [U]_{m-1}}{k_m} \right\|^* + \gamma_3 \|k_m R_{0k}(U)\|_{I_m} \]

\[ E_{2m}(U) = \gamma_1 \|h_m^2 R(U)\|_{I_m} + \gamma_2 \left\| \frac{h_m^2 [U]_{m-1}}{k_m} \right\|^* + \gamma_3 \|k_m^3 R_{1k}(U)\|_{I_m} \]

and

\[ R(U) = |f| + D_2 U \]
\[ R_{0k}(U) = |f| + \frac{|[U]_{m-1}|}{k_m} \]
\[ R_{1k}(U) = |f_{tt}| + \frac{|\Delta_m P_m [U]_{m-1}|}{k_m^2} \]

on $\Omega \times I_m$. A star indicates that the corresponding term is
present only if $S_{m-1}$ is not a subset of $S_m$. Further, $\gamma_i =
L_N C_i$, where the $C_i$ are constants related to approximation
by piecewise constant or linear functions. Finally, $D_2 U$ on
a space element $K \in \mathcal{T}_m$ is the modulus of the maximal jump
in normal derivative of $U$ across an edge of $K$ divided by
the diameter of $K$.


Remark 5. The term |f| in R(U) may be replaced by
|km D? f|. Similarly, the term |f| in Ry, may be replaced 
with k| f| and |f| in Ry,(U) by k| A f|. 


The a posteriori error estimates are sharp in the sense that 
the quantities on the right-hand sides can be bounded by 
the corresponding right-hand sides in the (optimal) a priori 
error estimates. Therefore, the a posteriori error estimates 
may be used as a basis for efficient adaptive algorithms, as 
we indicate below. 


11 ADAPTIVE METHODS/ALGORITHMS 


An adaptive method for the heat equation addresses the
following problem: For a given tolerance TOL > 0, find a
discretization in space and time $S_{hk} = \{(\mathcal{T}_n, k_n)\}_{n \ge 1}$ such
that

  (1) $\|u(t_n) - U(t_n)\| \le \mathrm{TOL}$ for $n = 1, 2, \ldots$
  (2) $S_{hk}$ is optimal, in the sense that the number of
      degrees of freedom is minimal                          (40)


We approach this problem using the a posteriori estimates
(38) and (39) in an adaptive method of the form: Find $S_{hk}$
such that for $n = 1, 2, \ldots$,

  $E_{0n}(U) \le \mathrm{TOL}$, if $q = 0$
  $E_{2n}(U) \le \mathrm{TOL}$, if $q = 1$
  the number of degrees of freedom of
  $S_{hk}$ is minimal                                        (41)


To solve this problem, we use an adaptive algorithm for
choosing $S_{hk}$ based on equidistribution of the form: For
each $n = 1, 2, \ldots$, with $\mathcal{T}_{n0}$ a given initial space mesh and
$k_{n0}$ an initial time step, determine triangulations $\mathcal{T}_{nj}$ with $N_{nj}$
elements of size $h_{nj}(x)$, time steps $k_{nj}$, and corresponding
approximate solutions $U_{nj}$ defined on $I_{nj} = (t_{n-1}, t_{n-1} +
k_{nj})$ such that for $j = 0, 1, \ldots, \hat n - 1$,

* 


Aiti (U},-1,; 


Yı max WA iat ROU a llr) + Y2 
nj 


g L(K) 
TOL 
= VKeE Taj 
2 [Nj 
TOL. 
ka j1 Yall Roe Un ptr, = E ifg=0 
TOL , 
kr tValRiOn iy e ifa=1 (42) 


that is, we determine iteratively each new time step $k_n =
k_{n\hat n}$ and triangulation $\mathcal{T}_n = \mathcal{T}_{n\hat n}$. The number of trials $\hat n$ is
the smallest integer $j$ such that (41) holds with $U$ replaced
by $U_{nj}$, and the parameter $\theta \sim 1$ is chosen so that $\hat n$ is small.


12 RELIABILITY AND EFFICIENCY 


By the a posteriori estimates (38) and (39), it follows that 
the adaptive method (41) is reliable in the sense that if 
(41) holds, then the error control (40) is guaranteed. The 
efficiency of (41) follows from the fact that the right-hand 
sides of the a posteriori error estimates may be bounded by 
the corresponding right-hand sides in the (optimal) a priori 
error estimates. 


13 STRONG STABILITY ESTIMATES 
FOR THE HEAT EQUATION 


We now state the fundamental strong stability results for the
continuous and discrete problems to be used in the proofs
of the a priori and a posteriori error estimates. Analogous
to Section 8, we consider the problem $\dot w - \Delta w = 0$, where
$w(t) = \varphi(T - t)$ is the backward dual solution with time
reversed.


Lemma 2. Let $w$ be the solution of (27) with $f = 0$. Then
for $T > 0$,

\[ \|w(T)\|^2 + 2\int_0^T \|\nabla w(t)\|^2\,dt = \|w^0\|^2 \tag{43} \]

\[ \int_0^T t\,\bigl(\|\dot w(t)\|^2 + \|\Delta w(t)\|^2\bigr)\,dt \le \frac{1}{2}\|w^0\|^2 \tag{44} \]

\[ \|\Delta w(T)\| \le \frac{1}{\sqrt{2}\,T}\|w^0\| \tag{45} \]


Lemma 3. There is a constant $C$ such that if $S_{n-1} \subset S_n$
for $n = 1, 2, \ldots, N$, and $W$ is the solution of (28) with
$f = 0$, then for $T = t_N > 0$,

\[ \|W_N^-\|^2 + 2\int_0^T \|\nabla W\|^2\,dt + \sum_{n=1}^{N} \|[W]_{n-1}\|^2 = \|w^0\|^2 \tag{46} \]

\[ \sum_{n=1}^{N} t_n \int_{I_n} \bigl\{ \|\dot W\|^2 + \|\Delta_n W\|^2 \bigr\}\,dt + \sum_{n=1}^{N} t_n \frac{\|[W]_{n-1}\|^2}{k_n} \le C\|w^0\|^2 \tag{47} \]

and

\[ \sum_{n=1}^{N} \int_{I_n} \bigl\{ \|\dot W\| + \|\Delta_n W\| \bigr\}\,dt + \sum_{n=1}^{N} \|[W]_{n-1}\| \le C\left(\log\frac{t_N}{k_1} + 1\right)^{1/2} \|w^0\| \tag{48} \]


The proof of Lemma 2 is similar to that of Lemma 1,
multiplying by $w(t)$, $-t\Delta w(t)$, and $t^2\Delta^2 w(t)$. The proof
of Lemma 3 is also analogous: For $q = 0$, we multiply (29)
by $W_n$ and $t_n \Delta_n W_n$, noting that if $S_{n-1} \subset S_n$ (corresponding
to coarsening in the time direction of the primal problem
$\dot u - \Delta u = f$), then $P_n W_{n-1} = W_{n-1}$,
$(\Delta_n W_n, W_{n-1}) = (W_n, \Delta_n W_{n-1})$ and
$(\Delta_n W_{n-1}, W_{n-1}) = (\Delta_{n-1} W_{n-1}, W_{n-1})$.
The proof for $q = 1$ is similar.

14 A PRIORI ERROR ESTIMATES FOR
THE L2- AND ELLIPTIC
PROJECTIONS


We shall use the following a priori error estimate for the
$L_2$-projection $P_n : L_2(\Omega) \to S_n$ defined by $(w - P_n w, v) = 0$
for all $v \in S_n$. This estimate follows from the fact that $P_n$
is very close to the nodal interpolation operator $J_n$ into $S_n$,
defined by $J_n w = w$ at the nodes of $\mathcal{T}_n$ if $w$ is smooth (and
$J_n w = J_n \tilde w$ if $w \in H^1(\Omega)$, where $\tilde w$ is a locally regularized
approximation of $w$).


Lemma 4. If $\mu$ in (32) is sufficiently small, then there
is a positive constant $C$ such that for all $w \in H_0^1(\Omega) \cap
H^2(\Omega)$,

\[ |(f, w - P_n w) - (\nabla U, \nabla(w - P_n w))| \le C\|h_n^2 R(U)\|\,\|D^2 w\| \tag{49} \]

where $R(U) = |f| + D_2 U$.


We shall also need the following a priori error estimate 
for the elliptic projection T, : H} (Q) — S, defined by 


(V(w — m,w), Vv) =0 Ve S, (50) 


Lemma 5. Jf pu in (32) is sufficiently small, then there 
is a positive constant C such that for all w € H UN) NA 
HA (2), 


lw — x, wl] < Cih D’ wl (51) 


Proof. We shall first prove that with $e = w - \pi_n w$, we
have $\|e\| \le C\|h_n \nabla e\|$. For this purpose, we let $\varphi$ be the
solution of the continuous dual problem $-\Delta\varphi = e$ in $\Omega$
with $\varphi = 0$ on $\Gamma$, and note that by integration by parts,
the Galerkin orthogonality (50), a standard estimate for the
interpolation error $\varphi - J_n\varphi$, together with elliptic regularity,
we have

$$\begin{aligned}
\|e\|^2 = (e, -\Delta\varphi) = (\nabla e, \nabla\varphi)
&= (\nabla e, \nabla(\varphi - J_n\varphi)) \\
&\le \|h_n \nabla e\|\,\|h_n^{-1}\nabla(\varphi - J_n\varphi)\| \\
&\le C\|h_n \nabla e\|\,\|D^2\varphi\| \le C\|h_n \nabla e\|\,\|e\|
\end{aligned}$$

which proves the desired estimate. Next, to prove that
$\|h_n \nabla e\| \le C\|h_n^2 D^2 w\|$, we note that since
$\pi_n J_n w = J_n w$, we have

$$\begin{aligned}
\|h_n \nabla e\| &\le \|h_n \nabla(w - J_n w) + h_n \nabla\pi_n(J_n w - w)\| \\
&\le C\|h_n \nabla(w - J_n w)\| \le C\|h_n^2 D^2 w\|
\end{aligned}$$

where we used stability of the elliptic projection $\pi_n$ in the
form

$$\|h_n \nabla\pi_n v\| \le C\|h_n \nabla v\| \quad \forall v \in H_0^1(\Omega)$$

which is a weighted analog of the basic property of the
elliptic projection $\|\nabla\pi_n v\| \le \|\nabla v\|$ for all
$v \in H_0^1(\Omega)$. For the proof of the weighted analog, we need the
mesh size not to vary too quickly, expressed in the assumption that
$\mu$ is small. □
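The second-order rate in (51) can be observed numerically in a very simple setting. The sketch below is not taken from the text: it assumes one space dimension, a uniform mesh on (0, 1), and piecewise linear elements, in which case the elliptic projection of a function vanishing at the endpoints coincides with its nodal interpolant. The function name `l2_error_of_p1_interpolant` and the test function sin(πx) are illustrative choices.

```python
import math

def l2_error_of_p1_interpolant(w, n, quad_pts=8):
    """L2(0,1) norm of w minus its piecewise linear nodal interpolant on n equal cells."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        a, b = i * h, (i + 1) * h
        wa, wb = w(a), w(b)
        # composite midpoint rule on each cell
        for j in range(quad_pts):
            x = a + (j + 0.5) * h / quad_pts
            interp = wa + (wb - wa) * (x - a) / h
            total += (w(x) - interp) ** 2 * h / quad_pts
    return math.sqrt(total)

w = lambda x: math.sin(math.pi * x)   # smooth function with w(0) = w(1) = 0
errors = [l2_error_of_p1_interpolant(w, n) for n in (8, 16, 32, 64)]
# observed convergence rates when h is halved; (51) suggests values close to 2
rates = [math.log(errors[i] / errors[i + 1], 2) for i in range(len(errors) - 1)]
print(errors, rates)
```

Halving the mesh size should reduce the error by roughly a factor of four, in line with the factor $h_n^2$ in (51).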


15 PROOF OF THE A PRIORI 
ERROR ESTIMATES 


In this section, we give the proof of the a priori estimates,
including (36) and (37). For simplicity, we shall assume
that $S_n \subset S_{n-1}$, corresponding to a situation where the
solution gets smoother with increasing time. The proof is
naturally divided into the following steps, indicating the
overall structure of the argument:


an error representation formula using duality; 
strong stability of the discrete dual problem; 
choice of interpolant and proof of (36); and 
choice of interpolant and proof of (37). 




15.1 An error representation formula 
using duality 


Given a discrete time level $t_N > 0$, we write the discrete set
of equations (28) determining the discrete solution $U \in V$
up to time $t_N$ in compact form as

$$A(U, v) = (u^0, v_0^+) + (f, v)_I \quad \forall v \in V \tag{52}$$

where

$$A(w, v) = \sum_{n=1}^{N} \{(\dot w, v)_n + (\nabla w, \nabla v)_n\}
+ \sum_{n=2}^{N} ([w]_{n-1}, v_{n-1}^+) + (w_0^+, v_0^+)$$

$(v, w)_n = \int_{I_n} (v, w)\,dt$ and $I = (0, T]$. The error $e = u - U$
satisfies the Galerkin orthogonality


$$A(e, v) = 0 \quad \forall v \in V \tag{53}$$

which follows from the fact that (52) is satisfied also by
the exact solution $u$ of (27). Let the discrete dual solution
$\Phi \in V$ now be defined by

$$A(v, \Phi) = (v_N^-, e_N) \quad \forall v \in V \tag{54}$$

where $e_N = u(t_N) - U(t_N)$ is the error at final time $t_N$,
corresponding to control of the $L_2(\Omega)$-norm of $e_N$. We note
that $\Phi$ is a discrete cG(p)dG(q)-solution of the continuous
dual problem

$$\begin{aligned}
-\dot\varphi - \Delta\varphi &= 0 && \text{in } \Omega \times [0, T) \\
\varphi &= 0 && \text{on } \Gamma \times [0, T)
\end{aligned} \tag{55}$$

with initial data $\varphi(T) = e_N$. This follows from the fact that


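the bilinear form $A(\cdot, \cdot)$, after time integration by parts, can
also be written as

$$A(w, v) = \sum_{n=1}^{N} \{(w, -\dot v)_n + (\nabla w, \nabla v)_n\}
- \sum_{n=1}^{N-1} (w_n^-, [v]_n) + (w_N^-, v_N^-) \tag{56}$$

In view of (54) and (53), we have for any $v \in V$,

$$\begin{aligned}
\|e_N\|^2 &= (u_N - v_N^-, e_N) + (v_N^- - U_N^-, e_N) \\
&= (u_N - v_N^-, e_N) + A(v - U, \Phi) \\
&= (u_N - v_N^-, e_N) + A(v - u, \Phi)
\end{aligned} \tag{57}$$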


Taking $v \in V$ to be a suitable interpolant of $u$ here, we
thus obtain a representation of the error $e_N$ in terms of
an interpolation error $u - v$ and the discrete solution $\Phi$ of
the associated dual problem, combined through the bilinear
form $A(\cdot, \cdot)$. To obtain the a priori error estimates, we
estimate below the interpolation error $u - v$ in $L_\infty(L_2(\Omega))$
and the time derivative $\dot\Phi$ in $L_1(L_2(\Omega))$ using discrete
strong stability.
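For readers who want to check the time integration by parts behind (56), the computation can be sketched as follows. This is a worked verification, not part of the original text; it assumes the usual dG notation $w_n^\pm = \lim_{s \to 0^+} w(t_n \pm s)$ and $[v]_n = v_n^+ - v_n^-$, and uses only the definition of $A$ following (52).

```latex
\begin{align*}
\sum_{n=1}^{N} (\dot w, v)_n
  &= \sum_{n=1}^{N} \bigl\{ (w_n^-, v_n^-) - (w_{n-1}^+, v_{n-1}^+) - (w, \dot v)_n \bigr\}, \\
\sum_{n=2}^{N} ([w]_{n-1}, v_{n-1}^+) + (w_0^+, v_0^+)
  &= \sum_{n=1}^{N} (w_{n-1}^+, v_{n-1}^+) - \sum_{n=1}^{N-1} (w_n^-, v_n^+).
\end{align*}
% Adding the two lines, the terms (w_{n-1}^+, v_{n-1}^+) cancel, and
\begin{align*}
\sum_{n=1}^{N} (w_n^-, v_n^-) - \sum_{n=1}^{N-1} (w_n^-, v_n^+)
  = -\sum_{n=1}^{N-1} (w_n^-, [v]_n) + (w_N^-, v_N^-).
\end{align*}
```

Together with the unchanged terms $(\nabla w, \nabla v)_n$, this gives the alternative form of $A(w, v)$ used in the dual problem.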


15.2 Strong stability of the discrete dual problem 


We apply Lemma 3 to the function $w(t) = \Phi(T - t)$ to
obtain the strong stability estimate


Ny. . 
lol, + f tien + Apen ar 


n=l 
N 
+). Iio], < CLylleyl (58) 
n=] 


with Ly = (log(Ty/k,) +). 


15.3 Proof of the a priori error estimate (36) 


In the error representation, we take the interpolant to be
$v = \tilde u = Q_n \pi_n u$ on $I_n$, where $Q_n$ is the
$L_2(I_n)$-projection onto polynomials of degree $q$ on $I_n$ and
$\pi_n$ is the elliptic projection defined in Section 14. For
$q = 0$, we thus take

$$\tilde u|_{I_n} = k_n^{-1} \int_{I_n} \pi_n u\,ds \tag{59}$$

and for $q = 1$, we take

$$\tilde u|_{I_n} = k_n^{-1} \int_{I_n} \pi_n u\,ds
+ \frac{12\,(t - t_{n-1} - k_n/2)}{k_n^3}
\int_{I_n} (s - t_{n-1} - k_n/2)\,\pi_n u\,ds \tag{60}$$
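The formula (60) is the $L_2(I_n)$-projection onto linear polynomials written in the orthogonal basis $\{1,\ t - t_{n-1} - k_n/2\}$. The sketch below illustrates this for a scalar function of time, which stands in for $\pi_n u$ evaluated at a fixed point in space (an assumption made only to keep the example short). The name `project_linear` and the midpoint quadrature are illustrative choices, not part of the text.

```python
def project_linear(g, t0, k, m=2000):
    """L2-projection of g onto linear polynomials on I = (t0, t0 + k), following (60).

    Integrals are approximated with an m-point midpoint rule.
    Returns the projected polynomial as a callable.
    """
    mid = t0 + k / 2.0
    ds = k / m
    nodes = [t0 + (j + 0.5) * ds for j in range(m)]
    mean = sum(g(s) for s in nodes) * ds / k               # k^{-1} * int_I g ds
    moment = sum((s - mid) * g(s) for s in nodes) * ds      # int_I (s - mid) g ds
    return lambda t: mean + 12.0 * (t - mid) / k**3 * moment

# A linear function is reproduced (up to quadrature error):
p = project_linear(lambda t: 3.0 + 2.0 * t, 1.0, 0.5)
# The projection of t^2 on (0, 1) is t - 1/6:
q = project_linear(lambda t: t * t, 0.0, 1.0)
```

The factor $12/k_n^3$ is the reciprocal of $\int_{I_n} (s - t_{n-1} - k_n/2)^2\,ds = k_n^3/12$, which is why the second basis function needs no further normalization.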


With this choice of interpolant, (57) reduces to

$$\|e_N\|^2 = (u_N - \tilde u_N^-, e_N)
+ \sum_{n=1}^{N} (u - \pi_n u, \dot\Phi)_n
+ \sum_{n=1}^{N-1} (u_n - \tilde u_n^-, [\Phi]_n)
- (u_N - \tilde u_N^-, \Phi_N^-) \tag{61}$$

where we have used (50), (56) and the fact that
$(\pi_n u - \tilde u, v)_n = 0$ for all $v \in V_n$, and thus, in
particular, for $v = \dot\Phi$ and $v = \Delta_n \Phi$.
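Using Lemma 5 and the fact that $Q_n$ is bounded in $\|\cdot\|_{I_n}$,
we have

$$\begin{aligned}
\|u - \tilde u\|_{I_n} &\le \|u - Q_n u\|_{I_n} + \|Q_n(u - \pi_n u)\|_{I_n} \\
&\le C \Bigl( \min_{j \le q+1} k_n^j \|u^{(j)}\|_{I_n} + \|h_n^2 D^2 u\|_{I_n} \Bigr)
\end{aligned} \tag{62}$$

where the bound for $u - Q_n u$ follows from the Taylor
expansion

$$\begin{aligned}
u(t) &= u(t_n) + \int_{t_n}^{t} \dot u(s)\,ds \\
&= u(t_n) + (t - t_n)\,\dot u(t_n) + \int_{t_n}^{t} (t - s)\,\ddot u(s)\,ds
\end{aligned}$$

noting that $Q_n$ is the identity on the polynomial part of
$u(t)$.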


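From (61) we thus obtain

$$\begin{aligned}
\|e_N\|^2 \le{}& C \max_{1 \le n \le N}
\Bigl( \min_{j \le q+1} k_n^j \|u^{(j)}\|_{I_n} + \|h_n^2 D^2 u\|_{I_n} \Bigr) \\
&\times \Bigl( \|e_N\| + \sum_{n=1}^{N} \int_{I_n} \|\dot\Phi\|\,dt
+ \sum_{n=1}^{N-1} \|[\Phi]_n\| + \|\Phi_N^-\| \Bigr)
\end{aligned}$$

and conclude in view of (58) that

$$\|e_N\| \le C L_N \max_{1 \le n \le N}
\Bigl( \min_{j \le q+1} k_n^j \|u^{(j)}\|_{I_n} + \|h_n^2 D^2 u\|_{I_n} \Bigr) \tag{63}$$

By a local analysis this estimate extends to $\|e\|_{I_N}$, completing
the proof of (36).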


15.4 Proof of the a priori error estimate (37) 


In the error representation formula (57), we now choose
$v = R_n \pi_n u$, where $R_n$ is the (Radau) projection onto linear
functions on $I_n$ defined by $(R_n \pi_n u)_n^- = \pi_n u_n$ and the
condition that $R_n \pi_n u - \pi_n u$ has mean value zero over $I_n$,
that is, we take

$$R_n \pi_n u|_{I_n} = \pi_n u_n
+ (t - t_n)\,\frac{2}{k_n^2} \int_{I_n} \pi_n (u_n - u)\,ds \tag{64}$$
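As a quick consistency check of (64) (a sketch, not part of the original argument), the mean value of $R_n \pi_n u$ over $I_n = (t_{n-1}, t_n]$ can be computed directly, using that $\int_{I_n} (t - t_n)\,dt = -k_n^2/2$:

```latex
\begin{align*}
\frac{1}{k_n} \int_{I_n} R_n \pi_n u \, dt
  &= \pi_n u_n
   + \frac{1}{k_n} \Bigl( \int_{I_n} (t - t_n)\, dt \Bigr)
     \frac{2}{k_n^2} \int_{I_n} \pi_n (u_n - u)\, ds \\
  &= \pi_n u_n - \frac{1}{k_n} \int_{I_n} \pi_n (u_n - u)\, ds
   = \frac{1}{k_n} \int_{I_n} \pi_n u \, ds .
\end{align*}
```

So $R_n \pi_n u - \pi_n u$ has mean value zero over $I_n$, and evaluating (64) at $t = t_n$ gives $\pi_n u_n$, which are the two defining conditions.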


With this choice of interpolant, (57) reduces to

$$\begin{aligned}
\|e_N\|^2 ={}& (u_N - \pi_N u_N, e_N)
+ \sum_{n=1}^{N} (u - \pi_n u, \dot\Phi)_n
- \sum_{n=1}^{N} (\nabla(\pi_n u - R_n \pi_n u), \nabla\Phi)_n \\
&+ \sum_{n=1}^{N-1} (u_n - \pi_n u_n, [\Phi]_n)
- (u_N - \pi_N u_N, \Phi_N^-)
\end{aligned} \tag{65}$$

where in the first sum we have used the fact that
$\pi_n u - R_n \pi_n u$ is orthogonal to $\dot\Phi$ (which is constant in
$t$ on $I_n$) and in the second sum we have used (50). For the latter
term, we have


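$$\begin{aligned}
(\nabla(\pi_n u - R_n \pi_n u), \nabla\Phi)_n
={}& (\nabla(\pi_n u - R_n \pi_n u), \nabla\Phi_n^-)_n \\
&+ (\nabla(\pi_n u - R_n \pi_n u), (t - t_n)\nabla\dot\Phi)_n
\end{aligned}$$

so that by our choice of $R_n \pi_n u$,

$$\begin{aligned}
|(\nabla(\pi_n u - R_n \pi_n u), \nabla\Phi)_n|
&= |(\nabla(\pi_n u - R_n \pi_n u), (t - t_n)\nabla\dot\Phi)_n| \\
&\le k_n \|\Delta_n(\pi_n u - R_n \pi_n u)\|_{I_n} \int_{I_n} \|\dot\Phi\|\,dt
\end{aligned} \tag{66}$$

Using Taylor expansions, we easily find that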


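$$\|\Delta_n(\pi_n u - R_n \pi_n u)\|_{I_n} \le C k_n \|\Delta_n \pi_n \dot u\|_{I_n} \tag{67}$$

Finally, we note that for any $w \in H^2(\Omega) \cap H_0^1(\Omega)$ we have

$$(-\Delta_n \pi_n w, v) = (\nabla\pi_n w, \nabla v) = (\nabla w, \nabla v)
= (-\Delta w, v) \quad \forall v \in S_n$$

from which we deduce by taking $v = -\Delta_n \pi_n w$ that

$$\|\Delta_n \pi_n w\| \le \|\Delta w\| \quad
\forall w \in H^2(\Omega) \cap H_0^1(\Omega) \tag{68}$$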

It now follows from (65) through (68), together with 
Lemma 5 and strong stability for ®, that 


2 5 j ep) 2nH2 
leyl? < C max (min kilur” lly, + haD ulg) 


snsN j<q42 
N N-1 
x (tet +> f èla + Dito + ier) 
n=l In n=l 
< lleyCLy max ( min kijut, + AZ DUI), ) 
ON enn ASE 2 n In 
where we have used the notation $u^{(2)} = \Delta\dot u$. This completes
the proof of (37) and Theorem 1.


16 PROOF OF THE A POSTERIORI 
ERROR ESTIMATES 


The proof of the a posteriori error estimates is similar 
to that of the a priori error estimates just presented. The 
difference is that now the error representation involves 
the exact solution $\varphi$ of the continuous dual problem (55),
together with the residual of the discrete solution U. By the 
definition of the dual problem and the bilinear form A, we 
have 


leyl? = Ate, 6) = Alu, $) — AU, 4) 
Using now that 
Alu, $) = U, f) + (AO), 


together with the Galerkin orthogonality, we obtain the following
error representation in terms of the discrete solution
$U$, the dual solution $\varphi$, and data $u^0$ and $f$:


leyl? = AU, v-o) +u’, p -oi too) 


N 
= VG, v = 9), + (VU, VO = o))a} 


n=] 


N 
+E Uhn 1 — OF D+ KO- 0), 


n=l 


=I+II+11T (69) 


with obvious notation. This holds for all v € V. 


To prove (38), we now choose vi, = b= Q, Pao in 
(69), and note that 


Ù, 6 a Dn =0 
Since also 
(VU, Vb — P O))n = (ArU, $ — PpO) 


= (-A,U, (Q, — DP, $) 
=0 


it follows that 


N 
I= (VU, VP, — Nd), 


n=l 


N 
= 2 (vu, V(P, - of oar) 


Using that 
a fods T Agdt = f bat = ln) — lina) 


together with Lemma 4 and elliptic regularity, we get 
N 
Wisc IRDAUINA f oarn 
n=l a 


tN-1 
2m2 n 
< C max IDEON, ( [Hot ae +216, 
(70) 
To estimate J, we note that by Lemma 4 we have 


(CUl (Pa — DOE.) < CHARIU], WOT il 
(71) 
noting that the left-hand side is zero if S,_, C S,. By obvi- 
ous stability and approximation properties of the L,(1,)- 
projections onto the set of constant functions on I, we 
also have 


I$- Pol, < UP ols, < lolly, (72) 
and 


I$- Pbl, =f IPies f ioia 09 


We thus conclude that 


-N 


hlUh-i 
k, 


HI| <C max 
lsgnsN 


klabi 
1 


n= 
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os 
+ max |U], (f Koll a + Holy) (74) 
lsn<N 0 


The data term III is estimated similarly. We finally use
strong stability of $\varphi$ in the form


N-1 t ns 
T kgllwt yl < Í lwl dr 


0 


ty-1 1/2 
< (f (ty -A7 ar) 
ty 1/2 
x ([" iw- owa) 


ie a 
<5 (toe) leyi (75) 


n=1 


for w= and w=Ad, together with the estimate 
kytAoXall s exp Dilley- 

Combining the estimates completes the proof of the a
posteriori error estimate (38). The proof of the a posteriori
error estimate (39) is similar.


17 EXTENSION TO SYSTEMS OF 
CONVECTION-DIFFUSION- 
REACTION PROBLEMS 


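In a natural way, we may directly extend the scope of
methods and analysis to systems of convection-diffusion-reaction
equations of the form

$$\begin{aligned}
\dot u - \nabla\cdot(a\nabla u) + (b\cdot\nabla)u - f(u) &= 0
  && \text{in } \Omega \times (0, T] \\
\partial_n u &= 0 && \text{on } \Gamma \times (0, T] \\
u(\cdot, 0) &= u^0 && \text{in } \Omega
\end{aligned} \tag{76}$$

where $u = (u_1, \ldots, u_d)$ is a vector of concentrations,
$a = a(x, t)$ is a diagonal matrix of diffusion coefficients,
$b = b(x, t)$ is a given convection velocity, $f(u)$ models
reactions, and $\partial_n = n\cdot\nabla$ with $n$ the exterior normal
of $\Gamma$. Depending on the size of the coefficients $a$
and $b$ and the reaction term, this problem may exhibit
more or less parabolic behavior determined by the size
of the strong stability factor coupled to the associated
linearized dual problem (here linearized at the exact solution $u$):

$$\begin{aligned}
-\dot\varphi - \nabla\cdot(a\nabla\varphi) - \nabla\cdot(\varphi b)
- (f'(u))^{\top}\varphi &= 0 && \text{in } \Omega \times [0, T) \\
\partial_n \varphi &= 0 && \text{on } \Gamma \times [0, T) \\
\varphi(\cdot, T) &= \psi && \text{in } \Omega
\end{aligned} \tag{77}$$

where $(\nabla\cdot(\varphi b))_i = \nabla\cdot(\varphi_i b)$ for
$i = 1, \ldots, d$.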


18 EXAMPLES OF REACTION- 
DIFFUSION PROBLEMS 


We now present solutions to a selection of reaction—dif- 
fusion problems, including solutions of the dual backward 
problem and computation of stability factors. 


18.1 Moving heat source 


In Figures 10 and 11, we display mesh and solution at two 
different times for the adaptive cG(1)dG(0) method applied 
to the heat equation with a moving heat source producing 
a moving hot spot. We notice that the space mesh adapts 
to the solution. 


18.2 Adaptive time steps for the heat equation 


We consider again the heat equation $\dot u - \Delta u = f$ with
homogeneous Dirichlet boundary conditions on the unit
square $(0, 1) \times (0, 1)$ over the time interval $[0, 100]$.
The source
$f(x, t) = 2\pi^2 \sin(\pi x_1)\sin(\pi x_2)[\sin(2\pi^2 t) + \cos(2\pi^2 t)]$
is periodic in time, with corresponding exact solution

$$u(x, t) = \sin(\pi x_1)\sin(\pi x_2)\sin(2\pi^2 t)$$

In Figure 12, we show a computed solution using the
cG(1)dG(0) method, and we also plot the time evolution
of the $L_2$-error in space, together with the sequence
of adaptive time steps. We notice that the error does
not grow with time, reflecting the parabolic nature of
the problem. We also note the periodic time variation
of the time steps, reflecting the periodicity of the solution,
with larger time steps when the solution amplitude
is small.

Figure 10. Meshes for moving source problem.

Figure 11. Solution for moving source problem.

Figure 12. Heat equation: solution, error, and adaptive time steps. A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm
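That the stated source and exact solution of this example fit together can be checked by inserting $u$ into $\dot u - \Delta u - f$. The sketch below does this with central finite differences at a few sample points. The function names, the step size `d`, and the sample points are illustrative assumptions; only the formulas for $u$ and $f$ come from the text.

```python
import math

W = 2.0 * math.pi ** 2   # angular frequency in time, also the eigenvalue of -Laplace

def u(x1, x2, t):
    """Exact solution of the test problem."""
    return math.sin(math.pi * x1) * math.sin(math.pi * x2) * math.sin(W * t)

def f(x1, x2, t):
    """Source term of the test problem."""
    return W * math.sin(math.pi * x1) * math.sin(math.pi * x2) * (
        math.sin(W * t) + math.cos(W * t))

def residual(x1, x2, t, d=1e-4):
    """Finite-difference approximation of u_t - Laplace(u) - f at one point."""
    ut = (u(x1, x2, t + d) - u(x1, x2, t - d)) / (2.0 * d)
    lap = (u(x1 + d, x2, t) + u(x1 - d, x2, t)
           + u(x1, x2 + d, t) + u(x1, x2 - d, t)
           - 4.0 * u(x1, x2, t)) / d ** 2
    return ut - lap - f(x1, x2, t)

samples = [(0.3, 0.7, 0.1), (0.5, 0.5, 1.234), (0.9, 0.2, 42.0)]
residuals = [residual(*p) for p in samples]   # expected to be small (discretization error only)
```

The time period of the solution is $1/\pi \approx 0.32$, so the interval $[0, 100]$ covers roughly three hundred periods.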


18.3 Logistics reaction—diffusion 


We now consider the heat equation with a nonlinear reaction
term, referred to as the logistics problem:

$$\begin{aligned}
\dot u - \epsilon\Delta u &= u(1 - u) && \text{in } \Omega \times (0, T] \\
\partial_n u &= 0 && \text{on } \Gamma \times (0, T] \\
u(\cdot, 0) &= u^0 && \text{in } \Omega
\end{aligned} \tag{78}$$

with $\Omega = (0, 1) \times (0, 1)$, $T = 10$, $\epsilon = 0.01$, and

$$u^0(x) = \begin{cases} 0, & 0 < x_1 < 0.5 \\ 1, & 0.5 \le x_1 < 1 \end{cases} \tag{79}$$

Through the combined action of the diffusion and reaction
the solution $u(x, t)$ tends to 1 for all $x$ with increasing
time: see Figure 13. We focus interest at final time $T$ to
a circle of radius $r = 0.25$ centered at $x = (0.5, 0.5)$. The
corresponding dual problem linearized at the exact solution
$u$ is given by

$$\begin{aligned}
-\dot\varphi - \epsilon\Delta\varphi &= (1 - 2u)\varphi && \text{in } \Omega \times [0, T) \\
\partial_n \varphi &= 0 && \text{on } \Gamma \times [0, T) \\
\varphi(\cdot, T) &= \psi && \text{in } \Omega
\end{aligned} \tag{80}$$

where we choose $\psi = 1/(\pi r^2)$ within the circle and zero
outside. In Figure 14, we plot the dual solution $\varphi(\cdot, t)$ and
also the stability factor $S_c(T, \psi)$ as function of $T$. As in
the Akzo-Nobel problem discussed above, we note that
$S_c(T, \psi)$ reaches a maximum for $T \approx 1$ and then decays
somewhat for larger $T$. The decay with larger $T$ can be
understood from the sign of the coefficient $(1 - 2u)$ of
the $\varphi$-term in the dual problem, which is positive when
$u(x, t) < 0.5$, and thus is positive for small $t$ and $x_1 < 0.5$
and negative for larger $t$. The growth phase in $\varphi(\cdot, t)$ thus
occurs after a longer phase of decay if $T$ is large, and thus
$S_c(T, \psi)$ may effectively be smaller for larger $T$, although
the interval of integration is longer for large $T$.

Figure 13. The logistics problem: solution at three different times. A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm


18.4 Moving reaction front 
Next, we consider a system of reaction-diffusion equations,
modeling an auto-catalytic reaction, where A reacts to form
B with B as a catalyst:

$$A + 2B \to B + 2B \tag{81}$$

With $u_1$ the concentration of A and $u_2$ that of B, the system
takes the form

$$\begin{aligned}
\dot u_1 - \epsilon\Delta u_1 &= -u_1 u_2^2 \\
\dot u_2 - \epsilon\Delta u_2 &= u_1 u_2^2
\end{aligned} \tag{82}$$

on $\Omega \times (0, 100]$ with $\Omega = (0, 1) \times (0, 0.25)$,
$\epsilon = 0.0001$ and homogeneous Neumann boundary conditions. As initial
conditions, we take

$$u_1(x, 0) = \begin{cases} 0, & 0 < x_1 < 0.25 \\ 1, & 0.25 \le x_1 < 1 \end{cases} \tag{83}$$

and $u_2(\cdot, 0) = 1 - u_1(\cdot, 0)$. The solution $u(x, t)$ corresponds
to a reaction front, starting at $x_1 = 0.25$ and propagating
to the right in the domain until all of A is consumed
and the concentration of B is $u_2 = 1$ in all of $\Omega$: see
Figure 15.

The dual problem, linearized at $u = (u_1, u_2)$, is given by

$$\begin{aligned}
-\dot\varphi_1 - \epsilon\Delta\varphi_1 &= -u_2^2\varphi_1 + u_2^2\varphi_2 \\
-\dot\varphi_2 - \epsilon\Delta\varphi_2 &= -2u_1 u_2\varphi_1 + 2u_1 u_2\varphi_2
\end{aligned} \tag{84}$$

As in the previous example, we take the final time data $\psi_1$
for the first component of the dual to be an approximation of
a Dirac delta function centered in the middle of the domain,
and $\psi_2 = 0$.
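The right-hand side of (84) is the transposed Jacobian of the reaction term $f(u) = (-u_1 u_2^2,\ u_1 u_2^2)$ applied to $\varphi$, as in the general dual problem (77). The sketch below compares the two at one sample point, using a finite-difference Jacobian. The function names and the sample values are illustrative assumptions.

```python
def f(u1, u2):
    """Reaction term of the system (82)."""
    return (-u1 * u2 ** 2, u1 * u2 ** 2)

def jacobian_transpose_times(u1, u2, phi, d=1e-6):
    """Apply the transposed Jacobian of f at (u1, u2) to phi, via central differences."""
    result = []
    for du1, du2 in ((d, 0.0), (0.0, d)):           # differentiate w.r.t. u_1, then u_2
        fp = f(u1 + du1, u2 + du2)
        fm = f(u1 - du1, u2 - du2)
        dfdu = [(fp[i] - fm[i]) / (2.0 * d) for i in range(2)]   # d f_i / d u_j
        result.append(dfdu[0] * phi[0] + dfdu[1] * phi[1])        # sum_i (d f_i / d u_j) phi_i
    return tuple(result)

def dual_rhs(u1, u2, phi):
    """Right-hand side of the dual problem (84)."""
    return (-u2 ** 2 * phi[0] + u2 ** 2 * phi[1],
            -2.0 * u1 * u2 * phi[0] + 2.0 * u1 * u2 * phi[1])

u1, u2, phi = 0.3, 0.7, (1.5, -0.4)
numeric = jacobian_transpose_times(u1, u2, phi)
exact = dual_rhs(u1, u2, phi)
```

Both components of (84) contain the factor $\varphi_2 - \varphi_1$, so the dual reaction term vanishes wherever the two dual components agree.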

We note that the stability factor peaks at the time of active
reaction and that before and after the reaction front has
swept the region of observation the stability factor $S_c(T, \psi)$
is significantly smaller (see Figure 16).
Figure 14. The logistics problem: dual solution and stability factor $S_c(T, \psi)$. A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm


19 COMPARISON WITH THE 
STANDARD APPROACH TO TIME 
STEP CONTROL FOR ODEs 


We now compare our methods for adaptive error control
with the methods for automatic time step control developed
within the ODE community as presented, for example, in
Hairer and Wanner (1996) and Deuflhard and Bornemann
(2002), which we refer to as the standard approach.
The cornerstone of the standard approach is a concept
of local error, which is (an estimate of) the error
contribution from each time step, and the time step is
then controlled so as to keep the local error within a
certain local error tolerance tol. The relation between
the local tolerance tol and the global error after many
time steps is then left to the (intelligent) user to
figure out.

Figure 15. Reaction front problem: solution for the two components at three different times. A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm

Figure 16. Reaction front problem: dual solution and stability factor $S_c(T)$ as function of $T$. A color version of this image is available at http://www.mrw.interscience.wiley.com/ecm

For the comparison, we focus on the backward Euler
method and first recall our basic a posteriori error estimate
(7) for the global error $e(T)$, that is,

$$\|e(T)\| \le S_c(T) \max_{1 \le n \le N} \|U(t_n) - U(t_{n-1})\| \tag{85}$$

where $S_c(T)$ is the strong stability factor. We thus obtain
control of the global error $\|e(T)\| \le \mathrm{TOL}$ if we control the
local time step so that

$$\|U(t_n) - U(t_{n-1})\| \approx \mathrm{TOL}$$

assuming $S_c(T) \approx 1$.

Next, the local error $\hat e_n$ at time step $n$ in the standard
approach is defined as the difference between the first-order
backward Euler solution $U(t_n)$ and an approximate solution
obtained after one time step starting from $U(t_{n-1})$ and using
a higher-order method. With a trapezoidal method as the
higher-order method, we may obtain

$$\hat e_n = \tfrac{1}{2} k_n \|f(U(t_n)) - f(U(t_{n-1}))\|$$

The standard time step control would then consist in choosing
the time step so that

$$\hat e_n \approx \mathrm{tol}$$


where tol would thus be the local error tolerance. To esti- 
mate the global error e(T) following the standard approach, 
it would appear that one would have to sum all the local 
errors to get 


$$\|e(T)\| \le \sum_{n=1}^{N} \hat e_n \tag{86}$$


It would then seem that we should choose tol so that 
TOL = N tol, where N is the number of time steps. But 
as we said, the relation between the local and global error 
tolerance is left to the user in the standard approach. 

To bring out the essential difference between our estimate
of the global error (85) and a standard estimate (86),
we compare the following a priori analog of the standard
estimate (86), obtained using that $(d/dt) f(u) = \ddot u$ (and
discarding a factor 1/2):

$$\|e(T)\| \le E_1 = \int_0^T k(t)\,\|\ddot u(t)\|\,dt \tag{87}$$

and the analog of our estimate (85):

$$\|e(T)\| \le E_2 = \max_{0 \le t \le T} k(t)\,\|\dot u(t)\| \tag{88}$$

where $k(t) = k_n$ on $I_n$, and we put $S_c(T) = 1$. Now it is
clear that by integration $E_2 \le E_1$ (assuming for simplicity
that $\dot u(0) = 0$) and neglecting $\dot k$, while we may have
$E_2 \ll E_1$, in particular if $T$ is large. This shows that our
estimate may be very much more economical to use than
the standard estimate. We note that the standard estimate
will look the same independent of the function $f(u)$,
and thus does not take any particular advantage of the
special properties of stiff or parabolic problems, as our
estimate does. This explains the possibly very suboptimal
nature of the standard estimate. Notice again that our
estimate does not depend on heavy exponential decay of all
components: in the model problem $\dot u + Au = 0$, we assume
that $A$ is positive semidefinite and not strictly positive
definite.
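The gap between the two quantities in (87) and (88) is easy to see on a scalar example. The sketch below uses the hypothetical choice $u(t) = 1 - \cos t$ (so that $\dot u(0) = 0$) and a constant time step $k$. These choices, the function name `estimates`, and the quadrature are assumptions made for illustration and are not taken from the text.

```python
import math

def estimates(T, k=0.01, m=100000):
    """Return (E1, E2) for u(t) = 1 - cos(t) with constant time step k.

    E1 = int_0^T k |u''(t)| dt  (midpoint rule with m cells),
    E2 = max_{0<=t<=T} k |u'(t)|  (sampled on m + 1 points).
    """
    dt = T / m
    E1 = sum(k * abs(math.cos((j + 0.5) * dt)) for j in range(m)) * dt
    E2 = max(k * abs(math.sin(j * dt)) for j in range(m + 1))
    return E1, E2

E1, E2 = estimates(100.0)
ratio = E1 / E2   # grows roughly linearly with T for this oscillating solution
```

For this example, $E_2$ stays at about $k$ for any $T \ge \pi/2$, while $E_1$ keeps growing with $T$, which is the accumulation effect the comparison in the text points to.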

In the standard approach, it is further common to choose 
the values of the higher-order method rather than the 
lower-order method because of obvious reasons. With this 
perspective, the estimate of the local error would then 
rather come out as the difference after one time step of 
a higher-order method, which is used in the computa- 
tion and a lower-order method, which is used only for 
the local error estimation. Doing so we would seem to 
lack an error estimate for the higher-order method since 
after all we estimated the local error of the lower-order 
method. 

Nevertheless, using the values of the higher-order 
method, it seems (surprisingly so) that in many problems 
including stiff problems, the global error tolerance TOL 
would be of the same size as the local error tolerance 
tol, as if there were no error accumulation. By using 
the results of the higher-order method, it thus seems 
to be possible to get global error control on the 
level of the local error tolerance. But no explanation 
of this ‘miracle’ has been given for the standard 
approach. 

Well, let's see if we can explain this mystery using
our sharp error analysis. Let us then take as the higher-order
method the first-order backward Euler method and
as the lower-order method the zero order (trivial) method:
$\hat U(t_n) = U(t_{n-1})$. The difference between the higher-order
and the lower-order method at time step $n$ would then
simply be $\|U(t_n) - U(t_{n-1})\|$ and the time step would
be chosen so that the local error $\|U(t_n) - U(t_{n-1})\| \le \mathrm{tol}$.
But this is the same as our time-step control guaranteeing
the global error control $\|e(T)\| \le \mathrm{tol}$ (assuming
$S_c(T) = 1$). We have thus proved that for parabolic or
stiff problems by using the values of the higher-order
method, the global error would come out on the level of
the local error tolerance, independent of the length of the
simulation, and the 'miracle' would thus be possible to
understand.
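The time-step control described here can be tried out on the scalar model problem $\dot u + a u = 0$, $u(0) = 1$. The sketch below is an illustration under stated assumptions, not the authors' implementation: the step $k_n$ is chosen so that $|U(t_n) - U(t_{n-1})| = \mathrm{tol}$, which for this linear problem can be solved in closed form, and the global error is then compared with tol. The function name and parameter values are hypothetical.

```python
import math

def adaptive_backward_euler(a=1.0, T=3.0, tol=0.01):
    """Backward Euler for u' + a*u = 0, u(0) = 1, with |U_n - U_{n-1}| = tol per step.

    Returns (number of steps, maximal global error over all time levels).
    """
    t, U = 0.0, 1.0
    steps, max_err = 0, 0.0
    while t < T - 1e-12:
        # U_n = U / (1 + a*k), so |U_n - U| = tol gives k = tol / (a * (U - tol)) if U > tol;
        # otherwise any step changes U by less than tol, so go straight to T.
        k = tol / (a * (U - tol)) if U > tol else T - t
        k = min(k, T - t)                  # do not step past the final time
        U = U / (1.0 + a * k)              # one backward Euler step
        t += k
        steps += 1
        max_err = max(max_err, abs(U - math.exp(-a * t)))
    return steps, max_err

steps, max_err = adaptive_backward_euler()
```

In this run the number of steps times tol, the bound one would get by summing local errors as in (86), is far larger than the observed global error, which stays at the level of tol.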
20 SOFTWARE


There is a variety of software available for stiff ODEs, such
as LSODE and RADAU5, using the standard approach to
error control. The methods presented in this article have
been implemented in a general multiadaptive framework as
a part of the DOLFIN project developed by J. Hoffman
and A. Logg and presented at http://www.phi.chalmers.se/dolfin/.
1 INTRODUCTION


Like stationary or time-harmonic problems, transient prob- 
lems can be solved by the boundary integral equation 
method. When the material coefficients are constant, a fun- 
damental solution is known and the data are given on the 
boundary, the reduction to the boundary provides efficient 
numerical methods, in particular for problems posed on 
unbounded domains. 

Such methods are widely and successfully being used 
for numerically modeling problems in heat conduction 
and diffusion, in the propagation and scattering of acous- 
tic, electromagnetic, and elastic waves, and in fluid 
dynamics. 

One can distinguish three approaches to the application 
of boundary integral methods on parabolic and hyper- 
bolic initial-boundary value problems: space-time integral 
equations, Laplace-transform methods, and time-stepping 
methods. 


Encyclopedia of Computational Mechanics, Edited by Erwin 
Stein, René de Borst and Thomas J.R. Hughes. Volume 1; Funda- 
mentals. © 2004 John Wiley & Sons, Ltd. ISBN: 0-470-84699-2. 


1. Space-time integral equations use the fundamental 
solution of the parabolic or hyperbolic partial differential 
equations. 

The construction of the boundary integral equations via 
representation formulas and jump relations, the appearance 
of single- and double-layer potentials, and the classification 
into first- and second-kind integral equations follow in 
a large part the formalism known for elliptic problems. 
Causality implies that the integral equations are of Volterra 
type in the time variable, and time-invariance implies that 
they are of convolution type in time. 

Numerical methods constructed from these space-time 
boundary integral equations are global in time, that is, they 
compute the solution in one step for the entire time interval. 
The boundary is the lateral boundary of the space-time 
cylinder and therefore has one dimension more than the 
boundary of the spatial domain. This increase in dimension 
at first means a substantial increase in complexity: 


- To compute the solution for a certain time, one needs 
the solution for all the preceding times since the initial 
time. 

- The system matrix is much larger.

- The integrals are higher-dimensional. For a prob- 
lem with three space dimensions, the matrix elements 
in a Galerkin method can require six-dimensional 
integrals. 


While the increase in memory requirements for the stor- 
age of the solution for preceding times cannot completely 
be avoided, there are situations in which the other two rea- 
sons for increased complexity are, in part, neutralized by 
special features of the problem: 
- The system matrix has a special structure related to
the Volterra structure (finite convolution in time) of the 
integral equations. When low-order basis functions in 
time are used, the matrix is of block-triangular Toeplitz 
form, and for its inversion, only one block — which 
has the size of the system matrix for a corresponding 
time-independent problem — needs to be inverted. 

- When a strong Huyghens principle is valid for the partial
differential equation, the integration in the integral
representation is not extended over the whole lateral 
boundary of the space-time cylinder, but only over its 
intersection with the surface of the backward light cone. 
This means, firstly, that the integrals are of the same 
dimensionality as for time-independent problems, and 
secondly, that the dependence is not extended arbitrar- 
ily far into the past, but only up to a time corresponding 
to the time of traversal of the boundary with the fixed 
finite propagation speed. These ‘retarded potential integral
equations’ are of importance for the scalar wave
equation in three space dimensions, and, to a certain
extent, for equations derived from them in electromagnetics
and elastodynamics. On the other hand, such a
Huyghens principle is not valid for the wave equation
in two space dimensions, or for the heat equation or
for problems in elastodynamics or in fluid dynamics.
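The remark on the block-triangular Toeplitz structure can be made concrete with a small sketch. It assumes a discretized convolution system $\sum_{m=0}^{n} A_{n-m} x_m = b_n$, $n = 0, \ldots, N-1$, with 2×2 blocks chosen arbitrarily for illustration. The function names are hypothetical and plain Python lists are used to keep the example self-contained.

```python
def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, x):
    """Matrix-vector product for nested-list matrices."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def solve_block_toeplitz(blocks, rhs):
    """Solve sum_{m=0}^{n} blocks[n-m] x[m] = rhs[n] by forward substitution in time.

    Only the diagonal block blocks[0] is inverted, and only once.
    """
    A0_inv = inv2(blocks[0])
    x = []
    for n in range(len(rhs)):
        r = list(rhs[n])
        for m in range(n):                       # subtract the history (finite convolution)
            y = matvec(blocks[n - m], x[m])
            r = [ri - yi for ri, yi in zip(r, y)]
        x.append(matvec(A0_inv, r))
    return x

blocks = [[[2.0, 1.0], [0.0, 3.0]], [[1.0, 0.0], [1.0, 1.0]], [[0.5, 0.0], [0.0, 0.5]]]
x_true = [[1.0, 2.0], [-1.0, 0.5], [3.0, -2.0]]
rhs = [[sum(matvec(blocks[n - m], x_true[m])[i] for m in range(n + 1)) for i in range(2)]
       for n in range(3)]
x = solve_block_toeplitz(blocks, rhs)
```

The loop over `m` is where the dependence on all preceding times shows up: the work per time level grows with `n`, even though only one block is ever inverted.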


2. Laplace transform methods solve frequency-domain 
problems, possibly for complex frequencies. For each fixed
frequency, a standard boundary integral method for an 
elliptic problem is applied, and then the transformation 
back to the time domain employs special methods for the 
inversion of Fourier or Laplace transforms. The choice of a 
numerical method for the inverse Laplace transform can be 
guided by the choice of an approximation of the exponential 
function corresponding to a linear multistep method for 
ordinary differential equations. This idea is related to the 
operational quadrature method (Lubich, 1994). 

Laplace or Fourier transform is also used the other way 
round to pass from the time domain to the frequency 
domain. This can be done using fast Fourier transform 
(FFT) in order to simultaneously solve problems for many 
frequencies from one time-domain computation, or one can 
solve a time-domain problem with a time-harmonic right- 
hand side to get the solution for one fixed frequency. It has 
been observed that this can be efficient too (Sayah, 1998) 
because of less strict requirements for the spatial resolution.

3. Time-stepping methods start from a time discretization
of the original initial-boundary value problem via an
implicit scheme and then use boundary integral equations 
to solve the resulting elliptic problems for each time step. 
Here, the difficulty lies in the form of the problem for 
one time step, which has nonzero initial data and thus is 
not in the ideal form for an application of the boundary 


integral method, namely vanishing initial conditions and 
volume forces and nonhomogeneous boundary data. The 
solution after a time step, which defines the initial condi- 
tion for the next time step, has no reason to vanish inside the 
domain. Various methods have been devised to overcome 
this problem: 

Using volume potentials to incorporate the nonzero initial 
data often is not desirable since it requires discretization of 
the domain and thus defies the advantage of the reduction to 
the boundary. Instead of a volume potential (Newton poten- 
tial), another particular solution (or approximate particular 
solution) of the stationary problem can be used. This partic- 
ular solution may be obtained by fast solution methods, for 
example FFT or a fast Poisson solver on a fictitious domain, 
or by meshless discretization of the domain using special 
basis functions like thin-plate splines or other radial basis 
functions (so-called dual reciprocity method); see Aliabadi 
and Wrobel (2002). 

Another idea is to consider not a single time step, but all 
time steps up to the final time together as a discrete convo- 
lution equation for the sequence of solutions at the discrete 
time values. Such a discrete convolution operator whose 
(time-independent) coefficients are elliptic partial differen- 
tial operators has a fundamental solution, which can then 
be used to construct a pure boundary integral method for 
the solution of the time-discretized problem. A fundamental 
solution, which is also a discrete convolution operator, can 
be given explicitly for simple time-discretization schemes 
like the backward Euler method (‘Rothe method’, Chapko 
and Kress, 1997). For a whole class of higher-order one-step 
or multistep methods, it can be constructed using Laplace 
transforms via the operational quadrature method (Lubich 
and Schneider, 1992; Lubich, 1994). 


These three approaches for the construction of boundary 
integral methods cannot be separated completely. There are 
many points of overlap: 

The space-time integral equation method leads, after dis- 
cretization, to a system that has the same finite time convo- 
lution structure that one gets from time-stepping schemes. 
The main difference is that the former needs the knowledge 
of a space-time fundamental solution. But this is simply the 
inverse Laplace transform of the fundamental solution of 
the corresponding time-harmonic problem. 

The Laplace transform appears in several roles. It can be 
used to translate between the time domain and the frequency 
domain at the level of the formulation of the problem, and 
also at the level of the solution. 

The stability analysis for all known algorithms, for 
the space-time integral equation methods as for the time- 
stepping methods, passes by the transformation to the fre- 
quency domain and corresponding estimates for the stability 
of boundary integral equation methods for elliptic problems.
The difficult part in this analysis is to find estimates
uniform with respect to the frequency. 

For parabolic problems, some analysis of integral equa- 
tion methods and their numerical realization has been 
known for a long time, and the classical results for second- 
kind integral equations on smooth boundaries are summa- 
rized in the book by Pogorzelski (1966). All the standard 
numerical methods available for classical Fredholm integral 
equations of the second kind, like collocation methods or 
Nyström methods, can be used in this case. 

More recently, variational methods have been studied 
in a setting of anisotropic Sobolev spaces that allow the 
coverage of first-kind integral equations and nonsmooth 
boundaries. It has been found that, unlike the parabolic par- 
tial differential operator with its time-independent energy 
and no regularizing property in time direction, the first- 
kind boundary integral operators have a kind of anisotropic 
space-time ellipticity (Costabel, 1990; Arnold and Noon,
1989; Brown, 1989; Brown and Shen, 1993).

This ellipticity leads to unconditionally stable and con- 
vergent Galerkin methods (Costabel, 1990; Arnold and 
Noon, 1989; Hsiao and Saranen, 1993; Hebeker and Hsiao, 
1993). Because of their simplicity, collocation methods 
are frequently used in practice for the discretization of 
space-time boundary integral equations. An analysis of col- 
location methods for second-kind boundary integral equa- 
tions for the heat equation was given by Costabel et al. 
(1987). Fourier analysis techniques for the analysis of sta- 
bility and convergence of collocation methods for parabolic 
boundary integral equations, including first-kind integral 
equations, have been studied more recently by Hamina and 
Saranen (1994) and by Costabel and Saranen (2000, 2001, 
2003). 

The operational quadrature method for parabolic prob- 
lems was introduced and analyzed by Lubich and Schneider 
(1992). 

For hyperbolic problems, the mathematical analysis is 
mainly based on variational methods as well (Bamberger 
and Ha Duong, 1986; Ha-Duong, 1990, 1996). There is 
now a lack of ellipticity, which on the one hand leads
to a loss of an order of regularity in the error estimates.
On the other hand, most coercivity estimates are
based on a passage to complex frequencies, which may
lead to stability constants that grow exponentially in time.
Instabilities (that are probably unrelated to this exponential
growth) have been observed, but their analysis
does not seem to be complete (Becache, 1991; Peirce
and Siebrits, 1996; Peirce and Siebrits, 1997; Birgisson
et al., 1999). Analysis of variational methods exists for the
main domains of application of space-time boundary inte- 
gral equations, first of all for the scalar wave equation, 


where the boundary integrals are given by retarded poten- 
tials, and also for elastodynamics (Becache, 1993; Becache 
and Ha-Duong, 1994; Chudinovich, 1993c; Chudinovich, 
1993b; Chudinovich, 1993a), piezoelectricity (Khutoryan- 
sky and Sosa, 1995), and for electrodynamics (Bachelot and 
Lange, 1995; Bachelot et al., 2001; Rynne, 1999; Chudi- 
novich, 1997). An extensive review of results on variational 
methods for the retarded potential integral equations is 
given by Ha-Duong (2003). 

As in the parabolic case, collocation methods are prac- 
tically important for the hyperbolic space-time integral 
equations. For the retarded potential integral equation, the 
stability and convergence of collocation methods has now 
been established (Davies, 1994, 1998; Davies and Duncan, 
1997, 2003). 

Finally, let us mention that there have also been impor- 
tant developments in the field of fast methods for space- 
time boundary integral equations (Michielssen, 1998; Jiao 
et al., 2002; Michielssen et al., 2000; Greengard and Strain, 
1990; Greengard and Lin, 2000). 


2 SPACE-TIME INTEGRAL EQUATIONS 
2.1 Notations 


We will now study some of the above-mentioned ideas
in closer detail. Let $\Omega \subset \mathbb{R}^n$ ($n \ge 2$) be a domain with
compact boundary $\Gamma$. The outer normal vector is denoted
by $n$ and the outer normal derivative by $\partial_n$.

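Let $T > 0$ be fixed. We denote by $Q$ the space-time
cylinder over $\Omega$ and by $\Sigma$ its lateral boundary:

\[
\begin{aligned}
Q &= (0,T) \times \Omega \\
\Sigma &= (0,T) \times \Gamma \\
\partial Q &= (\{0\} \times \overline{\Omega}) \cup \Sigma \cup (\{T\} \times \overline{\Omega})
\end{aligned}
\]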


For the description of the general principles, we consider 
only the simplest model problem of each type. We also 
assume that the right-hand sides have the right structure for 
the application of a ‘pure’ boundary integral method. The 
volume sources and the initial conditions vanish, so that the 
whole system is driven by boundary sources. 

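Elliptic problem (Helmholtz equation with frequency
$\omega \in \mathbb{C}$):

\[
\begin{aligned}
&(\Delta + \omega^2)\,u = 0 && \text{in } \Omega \\
&u = g \ \text{(Dirichlet)} \quad\text{or}\quad \partial_n u = h \ \text{(Neumann)} && \text{on } \Gamma \\
&\text{radiation condition at } \infty
\end{aligned}
\tag{E}
\]

Parabolic problem (heat equation):

\[
\begin{aligned}
&(\partial_t - \Delta)\,u = 0 && \text{in } Q \\
&u = g \ \text{(Dirichlet)} \quad\text{or}\quad \partial_n u = h \ \text{(Neumann)} && \text{on } \Sigma \\
&u = 0 && \text{for } t \le 0
\end{aligned}
\tag{P}
\]

Hyperbolic problem (wave equation with velocity
$c > 0$):

\[
\begin{aligned}
&(c^{-2}\partial_t^2 - \Delta)\,u = 0 && \text{in } Q \\
&u = g \ \text{(Dirichlet)} \quad\text{or}\quad \partial_n u = h \ \text{(Neumann)} && \text{on } \Sigma \\
&u = 0 && \text{for } t \le 0
\end{aligned}
\tag{H}
\]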


2.2 Space-time representation formulas 


2.2.1 Representation formulas and jump relations 


The derivation of boundary integral equations follows from 
a general method that is valid (under suitable smooth- 
ness hypotheses on the data) in the same way for all 
three types of problems. In fact, what counts for (P) 
and (H) is the property that the lateral boundary $\Sigma$ is
noncharacteristic.

The first ingredient for a boundary element method 
(BEM) is a fundamental solution G. As an example, in 
three dimensions, we have, respectively: 


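\[ G_\omega(x) = \frac{e^{i\omega|x|}}{4\pi|x|} \tag{E} \]

\[ G(t,x) = \begin{cases} (4\pi t)^{-3/2}\, e^{-|x|^2/(4t)} & (t > 0) \\ 0 & (t \le 0) \end{cases} \tag{P} \]

\[ G(t,x) = \frac{1}{4\pi|x|}\, \delta\Bigl(t - \frac{|x|}{c}\Bigr) \tag{H} \]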
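As a small illustration, the three kernels can be evaluated numerically. The following Python sketch assumes the three-dimensional formulas for (E), (P), and (H); the function names are ours and purely illustrative. The Dirac distribution in (H) cannot be evaluated pointwise, so it is represented only through its amplitude $1/(4\pi|x|)$ and the retarded time $t - |x|/c$ at which it acts.

```python
import cmath
import math


def helmholtz_kernel(omega: complex, r: float) -> complex:
    """(E): exp(i*omega*r) / (4*pi*r) for r = |x| > 0."""
    return cmath.exp(1j * omega * r) / (4.0 * math.pi * r)


def heat_kernel(t: float, r: float) -> float:
    """(P): (4*pi*t)^(-3/2) * exp(-r^2 / (4*t)) for t > 0, and 0 otherwise."""
    if t <= 0.0:
        return 0.0
    return (4.0 * math.pi * t) ** (-1.5) * math.exp(-r * r / (4.0 * t))


def wave_amplitude(r: float) -> float:
    """(H): factor 1/(4*pi*r) multiplying the delta distribution."""
    return 1.0 / (4.0 * math.pi * r)


def retarded_time(t: float, r: float, c: float) -> float:
    """(H): emission time s = t - r/c singled out by delta(t - s - r/c)."""
    return t - r / c
```

For $\omega = 0$ the Helmholtz kernel reduces to the Laplace kernel $1/(4\pi|x|)$, which is also the amplitude of the retarded potential.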


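Representation formulas for a solution $u$ of the homogeneous
partial differential equation and $x \notin \Gamma$ are obtained
from Green’s formula, applied with respect to the space
variables in the interior and exterior domain. We assume
that $u$ is smooth in the interior and the exterior up to the
boundary, but has a jump across the boundary. The jump
of a function $v$ across $\Gamma$ is denoted by $[v]$:

\[
u(x) = \int_\Gamma \bigl\{ \partial_{n(y)} G(x-y)\,[u(y)] - G(x-y)\,[\partial_n u(y)] \bigr\}\, d\sigma(y)
\tag{E}
\]

\[
u(t,x) = \int_0^t \int_\Gamma \bigl\{ \partial_{n(y)} G(t-s, x-y)\,[u(s,y)]
 - G(t-s, x-y)\,[\partial_n u(s,y)] \bigr\}\, d\sigma(y)\, ds
\tag{P}
\]

\[
\begin{aligned}
u(t,x) &= \int_0^t \int_\Gamma \bigl\{ \partial_{n(y)} G(t-s, x-y)\,[u(s,y)]
 - G(t-s, x-y)\,[\partial_n u(s,y)] \bigr\}\, d\sigma(y)\, ds \\
&= \int_\Gamma \Bigl\{ \partial_{n(y)} \frac{1}{4\pi|x-y|}\, \Bigl[ u\Bigl(t - \frac{|x-y|}{c}, y\Bigr) \Bigr]
 - \frac{\partial_{n(y)}|x-y|}{4\pi c\,|x-y|}\, \Bigl[ \partial_t u\Bigl(t - \frac{|x-y|}{c}, y\Bigr) \Bigr] \\
&\qquad\quad - \frac{1}{4\pi|x-y|}\, \Bigl[ \partial_n u\Bigl(t - \frac{|x-y|}{c}, y\Bigr) \Bigr] \Bigr\}\, d\sigma(y)
\end{aligned}
\tag{H}
\]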


Thus, the representation in the parabolic case uses integration
over the past portion of $\Sigma$ in the form of a finite
convolution over the interval $[0,t]$, whereas in the hyperbolic
case, only the intersection of the interior of the
backward light cone with $\Sigma$ is involved. In 3D, where
Huyghens’ principle is valid for the wave equation, the
integration extends only over the boundary of the backward
light cone, and the last formula shows that the integration
can be restricted to $\Gamma$, giving a very simple representation
by ‘retarded potentials’.

We note that in the representation by retarded potentials, 
all those space-time points (s, y) contribute to u(t, x) 
from where the point (t,x) is reached with speed c by 
traveling through the space $\mathbb{R}^3$. In the case of waves
propagating in the exterior of an obstacle, this leads to the 
seemingly paradoxical situation that a perturbation at (s, y) 
can contribute to u(t,x), although no signal from y has 
yet arrived in x, because in physical space, it has to travel 
around the obstacle. 

All three representation formulas can be written in a 
unified way by introducing the single-layer potential S and 
the double-layer potential D: 


\[ u = D([u]) - S([\partial_n u]) \tag{1} \]


In all the cases, there hold the classical jump relations in
the form

\[ [Dv] = v; \qquad [\partial_n Dv] = 0 \]
\[ [S\varphi] = 0; \qquad [\partial_n S\varphi] = -\varphi \]


It therefore appears natural to introduce the boundary
operators from the sums and differences of the one-sided
traces on the exterior ($\Gamma^+$) and interior ($\Gamma^-$) of $\Gamma$:

\[
\begin{aligned}
V &:= S|_\Gamma && \text{(single-layer potential)} \\
K &:= \tfrac12 \bigl( D|_{\Gamma^+} + D|_{\Gamma^-} \bigr) && \text{(double-layer potential)} \\
K' &:= \tfrac12 \bigl( \partial_n S|_{\Gamma^+} + \partial_n S|_{\Gamma^-} \bigr) && \text{(normal derivative of single-layer potential)} \\
W &:= -\partial_n D|_\Gamma && \text{(normal derivative of double-layer potential)}
\end{aligned}
\]


2.2.2 Boundary integral equations


In a standard way, the jump relations, together with these 
definitions, lead to boundary integral equations for the 
Dirichlet and Neumann problems. Typically, one has a 
choice of at least four equations for each problem: The 
first two equations come from taking the traces in the 
representation formula (1) (direct method), the third one 
comes from a single-layer representation 


\[ u = S\psi \quad \text{with unknown } \psi \]
and the fourth one from a double-layer representation
\[ u = Dw \quad \text{with unknown } w \]


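For the exterior Dirichlet problem ($u|_\Gamma = g$ given, $\partial_n u|_\Gamma = \varphi$
unknown):

\[
\begin{aligned}
V\varphi &= \bigl(-\tfrac12 + K\bigr)\, g && \text{(D1)} \\
\bigl(\tfrac12 + K'\bigr)\,\varphi &= -Wg && \text{(D2)} \\
V\psi &= g && \text{(D3)} \\
\bigl(\tfrac12 + K\bigr)\, w &= g && \text{(D4)}
\end{aligned}
\]

For the exterior Neumann problem ($u|_\Gamma = v$ unknown,
$\partial_n u|_\Gamma = h$ given):

\[
\begin{aligned}
\bigl(\tfrac12 - K\bigr)\, v &= -Vh && \text{(N1)} \\
Wv &= -\bigl(\tfrac12 + K'\bigr)\, h && \text{(N2)} \\
\bigl(\tfrac12 - K'\bigr)\,\psi &= -h && \text{(N3)} \\
Ww &= -h && \text{(N4)}
\end{aligned}
\]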


Remember that this formal derivation is rigorously valid 
for all three types of problems. One notes that second- and 
first-kind integral equations alternate nicely. For open sur- 
faces, however, only the first-kind integral equations exist. 
The reason is that a boundary value problem on an open sur- 
face fixes not only a one-sided trace but also the jump of the 
solution; and therefore the representation formula coincides 
with a single-layer potential representation for the Dirichlet 
problem and with a double-layer potential representation 
for the Neumann problem. 

The same abstract form of space-time boundary integral 
equations (D1)—(D4) and (N1)—(N4) is obtained for more 
general classes of second-order initial-boundary value prob- 
lems. If a space-time fundamental solution is known, then 
Green’s formulas for the spatial part of the partial differen- 
tial operator are used to get the representation formulas and 
jump relations. The role of the normal derivative is played 
by the conormal derivative. 


Since for time-independent boundaries the jumps across 
the lateral boundary $\Sigma$ involve only jumps across the spatial
boundary $\Gamma$ at a fixed time $t$, the jump relations and
representation formulas for a much wider class of higher- 
order elliptic systems (Costabel and Dauge, 1997) could be 
used to obtain space-time boundary integral equations for 
parabolic and hyperbolic initial-boundary value problems 
associated with such partial differential operators. In the 
general case, this has yet to be studied. 


2.2.3 Examples of fundamental solutions 


The essential requirement for the construction of a boundary 
integral equation method is the availability of a fundamental 
solution. This can be a serious restriction on the use of 
the space-time integral equation method because explicitly 
given and sufficiently simple fundamental solutions are 
known for far fewer parabolic and hyperbolic equations than
for their elliptic counterparts. 

In principle, one can pass from the frequency domain 
to the time domain by a simple Laplace transform, and 
therefore the fundamental solution for the time-dependent 
problem always has a representation by a Laplace inte- 
gral of the frequency-dependent fundamental solution of 
the corresponding elliptic problem. In practice, this repre- 
sentation can be rather complicated. An example where this 
higher level of complexity of the time-domain representa- 
tion is visible, but possibly still acceptable, is the dissipative 
wave equation with a coefficient $\alpha > 0$ (and speed $c = 1$
for simplicity)

\[ (\partial_t^2 + \alpha\,\partial_t - \Delta)\, u = 0 \]

In the frequency domain, we obtain the same equation as
for the wave equation with $\omega$ simply replaced by
$\omega_\alpha = \sqrt{\omega^2 + i\alpha\omega}$. The time-harmonic fundamental solution in
three dimensions is therefore simply

\[ G_{\omega_\alpha}(x) = \frac{e^{i\omega_\alpha |x|}}{4\pi|x|} \]


From this, we obtain by inverse Laplace transformation

\[
G(t,x) = \frac{e^{-\alpha t/2}}{4\pi|x|} \Bigl( \delta(t - |x|)
 + \frac{\alpha|x|}{2\sqrt{t^2 - |x|^2}}\, I_1\Bigl( \frac{\alpha}{2}\sqrt{t^2 - |x|^2} \Bigr)\, \theta(t - |x|) \Bigr)
\]

with the Dirac distribution $\delta$, the Heaviside function $\theta$, and
the modified Bessel function $I_1$. We see that there is no
strong Huyghens principle, and the integrals in the corresponding
space-time integral equations will be extended
over the whole intersection of the boundary $\Sigma$ with the
solid backward light cone $\{(s, y) \mid t - s > |x - y|\}$.
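The modified frequency $\omega_\alpha$ for the dissipative wave equation can be checked in one line. This is only a sketch, assuming the time-harmonic convention $u(t,x) = e^{-i\omega t}\,\hat u(x)$, under which $\partial_t$ becomes multiplication by $-i\omega$:

```latex
\begin{aligned}
(\partial_t^2 + \alpha\,\partial_t - \Delta)\, e^{-i\omega t}\,\hat u(x)
  &= e^{-i\omega t}\,\bigl(-\omega^2 - i\alpha\omega - \Delta\bigr)\,\hat u(x), \\
\text{hence}\quad (\Delta + \omega_\alpha^2)\,\hat u &= 0
  \quad\text{with}\quad \omega_\alpha^2 = \omega^2 + i\alpha\omega .
\end{aligned}
```

For $\alpha = 0$ this reduces to the Helmholtz equation (E) of the undamped wave equation with $c = 1$.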

For the case of elastodynamics, the corresponding space- 
time integral equations have not only been successfully 
used for a long time in practice (Mansur, 1983; Antes, 
1985, 1988) but they have also been studied mathemat- 
ically. Isotropic homogeneous materials are governed by 
the second-order hyperbolic system for the n-component 
vector field u of the displacement 


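\[
\rho\,\partial_t^2 u - \operatorname{div}\sigma = 0 \quad\text{with}\quad
\sigma_{ij} = \mu\,(\partial_i u_j + \partial_j u_i) + \lambda\,\delta_{ij} \operatorname{div} u
\]


Here, $\rho$ is the density and $\lambda$ and $\mu$ are the Lamé constants.
The role of the normal derivative $\partial_n$ is played by the traction
operator $T_n$, where $T_n u = \sigma \cdot n$ is the normal stress.
The roles of the Dirichlet and Neumann boundary conditions
are played by the displacement and traction boundary
conditions, respectively:


\[ u = g \ \text{(displacement)} \quad\text{or}\quad T_n u = h \ \text{(traction)} \quad \text{on } \Sigma \]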


In three dimensions, the space-time fundamental solution 
shows the longitudinal (pressure) and transversal (shear) 
waves that propagate with the two velocities 


\[ c_p = \sqrt{\frac{\lambda + 2\mu}{\rho}} \quad\text{and}\quad c_s = \sqrt{\frac{\mu}{\rho}} \]


But there is no strict Huyghens principle; the support of 
the fundamental solution is not contained in the union of 
the two conical surfaces determined by these two speeds 
but rather in the closure of the domain between these two 
surfaces. The fundamental solution is a (3 x 3) matrix G, 
whose entries are given by 


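\[
\begin{aligned}
G_{jk}(t,x) = \frac{1}{4\pi\rho\,|x|^3} \Bigl\{ & t^2 \Bigl( \frac{x_j x_k}{|x|^2}\, \delta\Bigl(t - \frac{|x|}{c_p}\Bigr)
 + \Bigl( \delta_{jk} - \frac{x_j x_k}{|x|^2} \Bigr)\, \delta\Bigl(t - \frac{|x|}{c_s}\Bigr) \Bigr) \\
& + t \Bigl( \frac{3 x_j x_k}{|x|^2} - \delta_{jk} \Bigr) \Bigl( \theta\Bigl(t - \frac{|x|}{c_p}\Bigr) - \theta\Bigl(t - \frac{|x|}{c_s}\Bigr) \Bigr) \Bigr\}
\end{aligned}
\]


Here, $\delta_{jk}$ is the Kronecker symbol, $\delta$ is the Dirac distribution,
and $\theta$ is the Heaviside function.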

Detailed descriptions of the space-time boundary integral 
equations in elastodynamics corresponding to (D1)—(D4) 
and (N1)—(N4) above can be found in many places (Chudi- 
novich, 1993a, b; Becache and Ha-Duong, 1994; Brebbia 
et al., 1984; Antes, 1988; Aliabadi and Wrobel, 2002). 


Whereas the frequency-domain fundamental solution is 
explicitly available for generalizations of elastodynam- 
ics such as certain models of anisotropic elasticity or 
thermoelasticity (Kupradze et al., 1979) or viscoelasticity
(Schanz, 2001b), the time-domain fundamental solution
quickly becomes very complicated (for an example in two- 
dimensional piezoelectricity, see Wang et al., 2003), or is 
completely unavailable. 

For the case of electrodynamics, space-time integral 
equations have been used and analyzed extensively, too, in 
the past twelve years (Pujols, 1991; Daschle, 1992; Terrasse,
1993; Bachelot and Lange, 1995; Chudinovich,
1997). An analysis of numerical methods on the basis of 
variational formulations is available, and the coupling of 
space-time integral equation methods with domain finite 
element methods has also been studied (Sayah, 1998; Bachelot
et al., 2001).

Maxwell's equations being a first-order system, the above 
formalism with its distinction between Dirichlet and Neu- 
mann conditions and between single- and double-layer 
potentials makes less sense here. There are, however, additional
symmetries that allow one to give a very ‘natural’ form to
the space-time boundary integral equations and their vari- 
ational formulations. The close relationship between the 
Maxwell equations and the scalar wave equation in three 
dimensions implies the appearance of retarded potentials 
here, too. 

The system of Maxwell’s equations in a homogeneous 
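and isotropic material with electric permittivity $\varepsilon$ and magnetic
permeability $\mu$ is


\[ \mu\,\partial_t H + \operatorname{curl} E = 0 \]
\[ \varepsilon\,\partial_t E - \operatorname{curl} H = 0 \]


The speed of light is $c = 1/\sqrt{\varepsilon\mu}$, and the corresponding
retarded potential can be abbreviated as


\[ S(u)(t,x) = \frac{1}{4\pi} \int_\Gamma \frac{u\bigl(t - \frac{|x-y|}{c},\, y\bigr)}{|x-y|}\, d\sigma(y) \]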


Then, an analogue of representation formula (1) can be 
written in the following form: 


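\[
\begin{aligned}
E(t,x) &= -\mu\, S(\partial_t [j])(t,x)
 + \frac{1}{\varepsilon} \operatorname{grad}_x S(\partial_t^{-1} \operatorname{div}_\Gamma [j])(t,x) - \operatorname{curl} S([m])(t,x) \\
H(t,x) &= -\varepsilon\, S(\partial_t [m])(t,x)
 + \frac{1}{\mu} \operatorname{grad}_x S(\partial_t^{-1} \operatorname{div}_\Gamma [m])(t,x) + \operatorname{curl} S([j])(t,x)
\end{aligned}
\]

where $[j]$ and $[m]$ are the surface currents and surface
charge densities given by the jumps across $\Sigma$:


\[ [j] = [H \wedge n]; \qquad [m] = [n \wedge E] \]


and $\partial_t^{-1}$ is the primitive defined by


\[ \partial_t^{-1}\varphi(t,x) = \int_0^t \varphi(s,x)\, ds \]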


Taking tangential traces on $\Sigma$, one then obtains systems
of integral equations analogous to (D1)—(N4) for the 
unknown surface current and charge densities. Owing to 
special symmetries of the Maxwell equations, the set of 
four boundary integral operators V, K, K’, W appearing in 
the boundary reduction of second-order problems is reduced 
to only two different boundary integral operators that we 
denote by V and K, defined by 


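\[ V\varphi = -n \wedge S\Bigl( \frac{1}{c}\,\partial_t \varphi \Bigr) + \operatorname{curl}_\Gamma S\bigl( c\,\partial_t^{-1} \operatorname{div}_\Gamma \varphi \bigr) \]


\[ K\varphi = \tfrac12 (\gamma^+ + \gamma^-)\, n \wedge \operatorname{curl} S(\varphi) \]


In the definition of $K$, one takes the principal value that also
corresponds to the mean value between the exterior trace
$\gamma^+$ and the interior trace $\gamma^-$, analogous to the definition of
the double-layer potential operator $K$ in Section 2.2.1.
For the exterior initial value problem, the traces


\[ v = m = n \wedge E \quad\text{and}\quad \varphi = \mu c\, j = \sqrt{\frac{\mu}{\varepsilon}}\; H \wedge n \]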


then satisfy the two relations corresponding to the four 
integral equations (D1), (D2), (N1), (N2) of the direct 
method 


\[ \bigl(\tfrac12 - K\bigr)\, v = -V\varphi \quad\text{and}\quad \bigl(\tfrac12 - K\bigr)\,\varphi = Vv \]


From a single-layer representation, that is, $[m] = 0$ in the
representation formula for the electric field, one obtains the
time-dependent electric field integral equation, which can
now be written as


\[ V\psi = g \]


where $g$ is given by the tangential component of the incident
field (see Chapter 26, this Volume).


2.3 Space-time variational formulations and 
Galerkin methods 


We will not treat the analysis of second-kind boundary 
integral equations in detail here. Suffice it to say that the 


key observation in the parabolic case is the fact that for
smooth $\Gamma$, the operator norm in $L^p(\Sigma)$ of the weakly singular
operator $K$ tends to 0 as $T \to 0$. This implies that
$(1/2) + K$ and $(1/2) + K'$ are isomorphisms in $L^2$ (and
also in $C^m$), first for small $T$ and then by iteration for
all $T$. The operators $K$ and $K'$ being compact, one can
use all the well-known numerical methods for classical
Fredholm integral equations of the second kind, including
Galerkin, collocation, and Nyström methods (Pogorzelski,
1966; Kress, 1989), with the additional benefit that the
integral equations are always uniquely solvable. If $\Gamma$ has
corners, these arguments break down, and quite different
methods, including variational arguments, have to be
used (Costabel, 1990; Dahlberg and Verchota, 1990; Brown, 
1989; Brown and Shen, 1993; Adolfsson et al., 1994). 


2.3.1 Galerkin methods 


For the first-kind integral equations, an analysis based on 
variational formulations is available. The corresponding 
numerical methods are space-time Galerkin methods. Their 
advantage is that they inherit directly the stability of the 
underlying variational method. In the elliptic case, this 
allows the well-known standard boundary element analysis 
of stability and errors, very similar to the standard finite 
element methods. In the parabolic case, the situation is still 
similar, but in the hyperbolic case, some price has to be paid 
for the application of ‘elliptic’ techniques. In particular, one 
has then to work with two different norms. 

Let $X$ be some Hilbert space and let $a$ be a bilinear form
on $X \times X$. If we assume that $a$ is bounded on $X$:

\[ \exists M : \forall u, v \in X : \ |a(u,v)| \le M\, \|u\|\, \|v\| \]

but that $a$ is elliptic only with respect to a smaller norm
$\|\cdot\|_0$, associated with a space $X_0$ into which $X$ is continuously
embedded:

\[ \exists \alpha > 0 : \forall u \in X : \ |a(u,u)| \ge \alpha\, \|u\|_0^2 \]

then for the variational problem, find $u \in X$ such that

\[ a(u,v) = \langle f, v \rangle \quad \forall v \in X \]

and for its Galerkin approximation, find $u_N \in X_N$ such that

\[ a(u_N, v_N) = \langle f, v_N \rangle \quad \forall v_N \in X_N \]

there are stability and error estimates with a loss

\[ \|u_N\|_0 \le C\, \|u\| \quad\text{and}\quad \|u - u_N\|_0 \le C \inf\{ \|u - v_N\| \mid v_N \in X_N \} \]

The finite-dimensional space $X_N$ for the Galerkin approximation
of space-time integral equations is usually constructed
as a tensor product of a standard boundary element
space for the spatial discretization and of a space of
one-dimensional finite element or spline functions on the
interval $[0, T]$ for the time discretization. Basis functions
are then of the form


\[ \varphi_{ij}(t,x) = \chi_i(t)\, \psi_j(x) \quad (i = 1, \ldots, I;\ j = 1, \ldots, J) \]


and the trial functions are of the form


\[ u_N(t,x) = \sum_{i,j=1}^{I,J} \alpha_{ij}\, \varphi_{ij}(t,x) \]


The system of Galerkin equations for the unknown coefficients
$\alpha_{ij}$ is


\[ \sum_{i,j=1}^{I,J} a(\varphi_{ij}, \varphi_{kl})\, \alpha_{ij} = \langle f, \varphi_{kl} \rangle \quad (k = 1, \ldots, I;\ l = 1, \ldots, J) \]
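The double index of the tensor product basis has to be flattened before the Galerkin system can be handed to a linear solver. The following Python sketch shows one possible bookkeeping. The names `flat_index`, `assemble_galerkin`, `toy_a`, and `toy_f` are hypothetical, and the toy forms only exercise the indexing; they are not a discretized boundary integral operator.

```python
from typing import Callable, List, Tuple

Pair = Tuple[int, int]  # (time index i, space index j), both starting at 0


def flat_index(i: int, j: int, n_space: int) -> int:
    """Map the pair (i, j) to a single row or column number."""
    return i * n_space + j


def assemble_galerkin(
    a: Callable[[Pair, Pair], float],
    f: Callable[[Pair], float],
    n_time: int,
    n_space: int,
) -> Tuple[List[List[float]], List[float]]:
    """Build the matrix a(phi_ij, phi_kl) and the right-hand side <f, phi_kl>."""
    n = n_time * n_space
    matrix = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for k in range(n_time):
        for l in range(n_space):
            row = flat_index(k, l, n_space)  # one equation per test function
            rhs[row] = f((k, l))
            for i in range(n_time):
                for j in range(n_space):
                    matrix[row][flat_index(i, j, n_space)] = a((i, j), (k, l))
    return matrix, rhs


def toy_a(trial: Pair, test: Pair) -> float:
    """Toy form: vanishes when the trial time index exceeds the test time index."""
    (i, j), (k, l) = trial, test
    return (1.0 if i <= k else 0.0) * (2.0 if j == l else 0.5)


def toy_f(test: Pair) -> float:
    """Toy right-hand side."""
    return float(test[0] + test[1])


matrix, rhs = assemble_galerkin(toy_a, toy_f, n_time=2, n_space=3)
```

With the time index as the slow index, a form that vanishes whenever the trial function lies later in time than the test function gives a block lower triangular matrix, as in the toy example.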


In the following, we restrict the presentation to the 
single-layer potential operator V. We emphasize, however, 
that a completely analogous theory is available for the 
hypersingular operator $W$ in all cases.

The variational methods for the first-kind integral opera- 
tors are based on the first Green formula that gives, together 
with the jump relations, a formula valid again for all three 
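types of equations: If $\varphi$ and $\psi$ are given on $\Gamma$ or $\Sigma$, satisfy
a finite number of conditions guaranteeing the convergence
of the integrals on the right-hand side of the formula (2)
below, and


\[ u = S\varphi, \qquad v = S\psi \]

then


\[ \int_\Gamma \varphi\, V\psi\, d\sigma = \int_{\mathbb{R}^n \setminus \Gamma} \{ \nabla u \cdot \nabla v + u\, \Delta v \}\, dx \tag{2} \]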


2.3.2 (E) 
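For the elliptic case, we obtain ($\langle \cdot, \cdot \rangle_\Gamma$ denotes $L^2$ duality
on $\Gamma$):

\[ \langle \varphi, V\varphi \rangle_\Gamma = \int_{\mathbb{R}^n \setminus \Gamma} \bigl( |\nabla u|^2 - \omega^2 |u|^2 \bigr)\, dx \]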


This gives the following theorem that serves as a model 
for the other two types. It holds not only for the simple 
case of the Laplacian but also, in particular its assertion
(ii), for more general second-order systems, including the
Lamé system of linear elasticity (Costabel, 1988).


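Theorem 1. Let $\Gamma$ be a bounded Lipschitz surface, open
or closed. $H^{1/2}(\Gamma)$ and $H^{-1/2}(\Gamma)$ denote the usual Sobolev
spaces, and $\widetilde{H}^{-1/2}(\Gamma)$ for an open surface is the dual of
$H^{1/2}(\Gamma)$. Then


(i) For $\omega = 0$, $n \ge 3$: $V : \widetilde{H}^{-1/2}(\Gamma) \to H^{1/2}(\Gamma)$ is an
isomorphism, and there is an $\alpha > 0$ such that


\[ \langle \varphi, V\varphi \rangle_\Gamma \ge \alpha\, \|\varphi\|^2_{\widetilde{H}^{-1/2}(\Gamma)} \]


(ii) For any $\omega$ and $n$, there is an $\alpha > 0$ and a compact
quadratic form $k$ on $\widetilde{H}^{-1/2}(\Gamma)$ such that


\[ \operatorname{Re} \langle \varphi, V\varphi \rangle_\Gamma \ge \alpha\, \|\varphi\|^2_{\widetilde{H}^{-1/2}(\Gamma)} - k(\varphi) \]


(iii) If $\omega$ is not an interior or exterior eigenfrequency,
then $V$ is an isomorphism, and every Galerkin method
in $\widetilde{H}^{-1/2}(\Gamma)$ for the equation $V\psi = g$ is stable and
convergent.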


2.3.3 (P) 


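For the parabolic case of the heat equation, integration over
$t$ in the Green formula (2) gives


\[
\begin{aligned}
\langle \varphi, V\varphi \rangle_\Sigma
 &= \int_0^T \int_{\mathbb{R}^n \setminus \Gamma} \{ |\nabla_x u(t,x)|^2 + \partial_t u\, u \}\, dx\, dt \\
 &= \int_0^T \int_{\mathbb{R}^n \setminus \Gamma} |\nabla_x u(t,x)|^2\, dx\, dt + \frac12 \int_{\mathbb{R}^n} |u(T,x)|^2\, dx
\end{aligned}
\]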
p 


From this, the positivity of the quadratic form associated 
with the operator V is evident. What is less evident is the 
nature of the energy norm for V, however. It turns out 
(Arnold and Noon, 1989; Costabel, 1990) that one has to 
consider anisotropic Sobolev spaces of the following form 


\[ \widetilde{H}_0^{r,s}(\Sigma) = L^2(0,T; \widetilde{H}^r(\Gamma)) \cap H_0^s(0,T; L^2(\Gamma)) \]


The index 0 indicates that zero initial conditions at $t = 0$ are
incorporated. The optional tilde means zero boundary values
on the boundary of the (open) manifold $\Gamma$. One has the
following theorem, which is actually simpler than its elliptic 
counterpart because the operators are always invertible due 
to their Volterra nature. 


Theorem 2. Let $\Gamma$ be a bounded Lipschitz surface, open
or closed, $n \ge 2$.


(i) $V : \widetilde{H}_0^{-1/2,-1/4}(\Sigma) \to H_0^{1/2,1/4}(\Sigma)$ is an isomorphism,
and there is an $\alpha > 0$ such that


\[ \langle \varphi, V\varphi \rangle_\Sigma \ge \alpha\, \|\varphi\|^2_{-1/2,-1/4} \]

(ii) Every Galerkin method in $\widetilde{H}_0^{-1/2,-1/4}(\Sigma)$ for the
equation $V\psi = g$ converges. The Galerkin matrices
have positive-definite symmetric part. Typical error
estimates are of the form


\[ \|\varphi - \varphi_{h,k}\|_{-1/2,-1/4} \le C \bigl( h^{r+1/2} + k^{(r+1/2)/2} \bigr)\, \|\varphi\|_{r,\,r/2} \]


if $\varphi_{h,k}$ is the Galerkin solution in a tensor product
space of splines of mesh-size $k$ in time and finite
elements of mesh-size $h$ in space.


2.3.4 (H) 


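For the wave equation, choosing $\varphi = \psi$ in the Green
formula (2) does not give a positive-definite expression.
Instead, one can choose $\varphi = \partial_t \psi$. This corresponds to the
usual procedure for getting energy estimates in the weak
formulation of the wave equation itself where one uses $\partial_t u$
as a test function, and it gives


\[
\begin{aligned}
\langle \partial_t \varphi, V\varphi \rangle_\Sigma
 &= \int_0^T \int_{\mathbb{R}^n \setminus \Gamma} \{ \partial_t \nabla_x u \cdot \nabla_x u + \partial_t u\, \partial_t^2 u \}\, dx\, dt \\
 &= \frac12 \int_{\mathbb{R}^n \setminus \Gamma} \{ |\nabla_x u(T,x)|^2 + |\partial_t u(T,x)|^2 \}\, dx
\end{aligned}
\]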


Once again, as in the elliptic case, this shows the close 
relation of the operator V with the total energy of the 
system. In order to obtain a norm ($H^1(Q)$) on the right-hand
side, one can integrate a second time over $t$. But in
any case, here the bilinear form $\langle \partial_t \varphi, V\varphi \rangle_\Sigma$ will not be
bounded in the same norm where its real part is positive. 
So there will be a loss of regularity, and any error estimate 
has to use two different norms. No ‘natural’ energy space 
for the operator V presents itself. 


2.4 Fourier-Laplace analysis and Galerkin 
methods 


A closer view of what is going on can be obtained using 
space-time Fourier transformation. For this, one has to 
assume that $\Gamma$ is flat, that is, a subset of $\mathbb{R}^{n-1}$. Then all
the operators are convolutions and as such are represented
by multiplication operators in Fourier space. If $\Gamma$ is not
flat but smooth, then the results for the flat case describe 
the principal part of the operators. To construct a complete 
analysis, one has to consider lower order terms coming 
from coordinate transformations and localizations. Whereas 
this is a well-known technique in the elliptic and parabolic 
cases, namely, part of the calculus of pseudodifferential 


operators, it has so far prevented the construction of a 
completely satisfactory theory for the hyperbolic case. 

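We denote the dual variables to $(t,x)$ by $(\omega, \xi)$, and $x'$
and $\xi'$ are the variables related to $\Gamma \subset \mathbb{R}^{n-1}$. It is then easily
seen that the form of the single-layer potential is


\[ \widehat{V\psi}(\xi') = \tfrac12 \bigl( |\xi'|^2 - \omega^2 \bigr)^{-1/2}\, \hat\psi(\xi') \tag{E} \]
\[ \widehat{V\psi}(\omega, \xi') = \tfrac12 \bigl( |\xi'|^2 - i\omega \bigr)^{-1/2}\, \hat\psi(\omega, \xi') \tag{P} \]
\[ \widehat{V\psi}(\omega, \xi') = \tfrac12 \bigl( |\xi'|^2 - \omega^2 \bigr)^{-1/2}\, \hat\psi(\omega, \xi') \tag{H} \]


Note that (E) and (H) differ only in the role of $\omega$; for (E),
it is a fixed parameter, for (H), it is one of the variables,
and this is crucial in the application of Parseval’s formula
for $\langle \varphi, V\varphi \rangle$.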


2.4.1 (E)


For the elliptic case, the preceding formula implies Theorem 1:
If $\omega = 0$, then the function $(1/2)|\xi'|^{-1}$ is positive
and for large $|\xi'|$ equivalent to $(1 + |\xi'|^2)^{-1/2}$, the Fourier
weight defining the Sobolev space $H^{-1/2}(\Gamma)$. If $\omega \ne 0$,
then the principal part (as $|\xi'| \to \infty$) is still $(1/2)|\xi'|^{-1}$,
so only a compact perturbation is added. There is an additional
observation by Ha-Duong (1990): If $\omega$ is real, then
$(1/2)(|\xi'|^2 - \omega^2)^{-1/2}$ is either positive or imaginary, so its
real part is positive except on the bounded set $|\xi'| \le |\omega|$.
This implies

Proposition 1. Let $\omega^2 > 0$, $\Gamma$ flat, supp $\varphi$ compact. Then
there is an $\alpha(\omega) > 0$ such that


\[ \operatorname{Re} \langle \varphi, V\varphi \rangle_\Gamma \ge \alpha(\omega)\, \|\varphi\|^2_{\widetilde{H}^{-1/2}(\Gamma)} \]


The work of transforming this estimate into error esti- 
mates for the BEM in the hyperbolic case is still incomplete. 
See Ha-Duong (2003) for a review of the state of the art 
on this question. 


2.4.2 (P) 


For the parabolic case, the symbol of the single-layer
potential,


\[ \sigma_V(\omega, \xi') = \tfrac12 \bigl( |\xi'|^2 - i\omega \bigr)^{-1/2} \]


again has a positive real part. In addition, it is sectorial:


\[ |\arg \sigma_V(\omega, \xi')| \le \frac{\pi}{4} \]

This has the consequence that its real part and absolute
value are equivalent (an ‘elliptic’ situation):


\[ C_1 \bigl|\, |\xi'|^2 - i\omega \bigr|^{-1/2} \le \operatorname{Re} \sigma_V(\omega, \xi') \le C_2 \bigl|\, |\xi'|^2 - i\omega \bigr|^{-1/2} \]

In addition, for large $|\xi'|^2 + |\omega|$, this is equivalent to
$\bigl( (1 + |\xi'|^2) + |\omega| \bigr)^{-1/2}$, the Fourier weight defining the
space $H^{-1/2,-1/4}(\Sigma)$. This explains Theorem 2. It also
clearly shows the difference between the single-layer heat
potential operator on the boundary and the heat operator
$\partial_t - \Delta$ itself: The symbol of the latter is $|\xi|^2 - i\omega$, and the
real part $|\xi|^2$ and the absolute value $(|\xi|^4 + |\omega|^2)^{1/2}$ of this
function are not equivalent uniformly in $\xi$ and $\omega$.
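The sectorial property is easy to check numerically. The following Python sketch evaluates the parabolic symbol $\tfrac12(|\xi'|^2 - i\omega)^{-1/2}$ with the principal branch of the square root on a few sample points; the function names and the sample grid are our own choices for illustration.

```python
import cmath
import math


def sigma_heat(omega: float, xi: float) -> complex:
    """Symbol (1/2) * (|xi'|^2 - i*omega)^(-1/2), principal branch of the root."""
    return 0.5 / cmath.sqrt(xi * xi - 1j * omega)


def check_sectorial(samples):
    """Return the largest |arg sigma| and the smallest ratio Re(sigma)/|sigma|."""
    max_arg, min_ratio = 0.0, 1.0
    for omega, xi in samples:
        s = sigma_heat(omega, xi)
        max_arg = max(max_arg, abs(cmath.phase(s)))
        min_ratio = min(min_ratio, s.real / abs(s))
    return max_arg, min_ratio


# Real frequencies omega != 0 and tangential frequencies |xi'| >= 0.
samples = [(w, x) for w in (-100.0, -1.0, 0.5, 30.0) for x in (0.0, 0.1, 1.0, 10.0)]
max_arg, min_ratio = check_sectorial(samples)
```

The extreme angle $\pi/4$ is attained for $\xi' = 0$, where the symbol is a multiple of $(-i\omega)^{-1/2}$. In the sector, the real part is at least $\cos(\pi/4)$ times the absolute value, which is the equivalence stated in the text with $C_1 = \cos(\pi/4)$ and $C_2 = 1$ if the factor $1/2$ is included on both sides.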


2.4.3 (H) 


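In the hyperbolic case, the symbol $\sigma_V$ does not have a
positive real part. Instead, one has to multiply it by $i\overline{\omega}$ and
to use a complex frequency $\omega = \omega_R + i\omega_I$ with $\omega_I > 0$
fixed. Then, one gets


\[ \operatorname{Re} \bigl( i\overline{\omega}\, (|\xi'|^2 - \omega^2)^{1/2} \bigr) \ge \frac{\omega_I}{2}\, \bigl( |\xi'|^2 + |\omega|^2 \bigr)^{1/2} \]


and similar estimates given first by Bamberger and Ha
Duong (1986). Note that with respect to $|\omega|$, one is losing
an order of growth here. For fixed $\omega_I$, the left-hand side
is bounded by $|\omega|^2$, whereas the right-hand side is $O(|\omega|)$.
One introduces another class of anisotropic Sobolev spaces
of the form


\[ H^{s,r}_{\omega_I}(\mathbb{R} \times \Gamma) = \{ u \mid u,\ \partial_t^r u \in H^s_{\omega_I}(\mathbb{R} \times \Gamma) \} \]


with the norm


\[
\|u\|_{s,r,\omega_I} = \Bigl( \int_{\operatorname{Im}\omega = \omega_I} \int_{\mathbb{R}^{n-1}} |\omega|^{2r}\, \bigl( |\xi'|^2 + |\omega|^2 \bigr)^s\, |\hat u(\omega, \xi')|^2\, d\xi'\, d\omega \Bigr)^{1/2}
\]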
We give one example of a theorem obtained in this way. 
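Theorem 3. Let $\Gamma$ be bounded and smooth, $r, s \in \mathbb{R}$. Then


(i) $V : \widetilde{H}^{s,r+1}_{\omega_I}(\Sigma) \to H^{s+1,r}_{\omega_I}(\Sigma)$ and
$V^{-1} : H^{s+1,r+1}_{\omega_I}(\Sigma) \to \widetilde{H}^{s,r}_{\omega_I}(\Sigma)$ are continuous.

(ii) Let $\omega_I > 0$ and the bilinear form $a(\varphi, \psi)$ be defined
by


\[ a(\varphi, \psi) = \int_0^\infty e^{-2\omega_I t} \int_\Gamma (V\varphi)(t,x)\, \partial_t \psi(t,x)\, d\sigma(x)\, dt \]

Then there is an $\alpha > 0$ such that


\[ \operatorname{Re}\, a(\varphi, \varphi) \ge \alpha\, \omega_I\, \|\varphi\|^2_{-1/2,0,\omega_I} \]


(iii) The Galerkin matrices for the scheme: Find $\varphi_N \in X_N$
such that


\[ a(\varphi_N, \psi) = \langle g, \partial_t \psi \rangle_\Sigma \quad \forall \psi \in X_N \]


have positive-definite hermitian part, and there is an
error estimate


\[ \|\varphi - \varphi_N\|_{-1/2,0,\omega_I} \le C\, \omega_I^{-1} \inf_{\psi \in X_N} \|\varphi - \psi\|_{-1/2,1,\omega_I} \]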


Thus, one has unconditional stability and convergence for
$\omega_I > 0$. In practical computations, one will use the bilinear
form $a(\varphi, \psi)$ for $\omega_I = 0$, where the error estimate is
no longer valid. Instabilities have been observed that are, 
however, probably unrelated to the omission of the expo- 
nential factor. They are also not caused by a too large CFL 
number (ratio between time step and spatial mesh width). 
In fact, too small and too large time steps have both been 
reported to lead to instabilities. 

Corresponding results for elastodynamics and for electro- 
dynamics can be found in the literature (besides the above- 
mentioned works, see the references given in Chudinovich, 
2001 and in Bachelot et al., 2001). 
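The loss of one order in $|\omega|$ in the hyperbolic case can be made visible with a few numbers. The Python sketch below assumes that the symbol estimate of Section 2.4.3 reads $\operatorname{Re}(i\overline{\omega}(|\xi'|^2 - \omega^2)^{1/2}) \ge (\omega_I/2)(|\xi'|^2 + |\omega|^2)^{1/2}$ with the square root of positive real part; the function names `lhs` and `rhs` and the sample grid are ours.

```python
import cmath
import math


def lhs(omega: complex, xi: float) -> float:
    """Re( i * conj(omega) * (|xi'|^2 - omega^2)^(1/2) ), root with Re >= 0."""
    root = cmath.sqrt(xi * xi - omega * omega)
    return (1j * omega.conjugate() * root).real


def rhs(omega: complex, xi: float) -> float:
    """(omega_I / 2) * (|xi'|^2 + |omega|^2)^(1/2)."""
    return 0.5 * omega.imag * math.sqrt(xi * xi + abs(omega) ** 2)


omega_i = 0.25  # fixed imaginary part of the frequency (an arbitrary choice)
grid = [(complex(wr, omega_i), xi)
        for wr in (-50.0, -1.0, 0.0, 2.0, 40.0)
        for xi in (0.0, 0.5, 3.0, 45.0)]

# Smallest difference between the two sides over the sample grid.
worst_gap = min(lhs(w, xi) - rhs(w, xi) for w, xi in grid)
```

For $\xi' = 0$ the left-hand side equals $|\omega|^2$ while the right-hand side is $(\omega_I/2)|\omega|$, so for fixed $\omega_I$ the two sides differ by one power of $|\omega|$.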


2.5 Collocation methods 


In order to avoid the high-dimensional integrations nec- 
essary for the computation of the matrix elements in a 
Galerkin method such as the ones described in Theorems 2 
and 3, one often uses collocation methods. Just like in the 
elliptic case, even for the classical first-kind integral oper- 
ators for the Laplace operator, the mathematical analysis 
lags seriously behind the practical experiences. 

In more than two dimensions, only for very special 
geometries that are amenable to Fourier analysis, stability 
of collocation schemes can be shown. For time-dependent 
integral equations, even two-space dimensions create prob- 
lems that only recently have been overcome, and this is only 
for special geometries, mainly flat boundaries or toroidal 
boundaries. 

Collocation schemes for the single-layer potential inte- 
gral equation (D3) are easy to formulate. One usually takes 
basis functions of tensor product form, that is, 


\[ \varphi_{ij}(t,x) = \chi_i(t)\, \psi_j(x) \]


where $\chi_i$ $(i = 1, \ldots, M)$ is a basis of a space of finite
elements (splines) of degree $d_t$ on the interval $[0, T]$, and
$\psi_j$ $(j = 1, \ldots, N)$ is a basis of a space of finite elements
of degree $d_x$ on the boundary $\Gamma$. Then, the trial functions
are of the form


\[ u_{kh}(t,x) = \sum_{i,j=1}^{M,N} \alpha_{ij}\, \varphi_{ij}(t,x) \]


Here, the indices $kh$ indicate the time step $k \sim T/M$ and
the mesh width $h$ of the discretization of the boundary $\Gamma$.

The linear system for the unknown coefficients is
obtained from the equations


\[ V u_{kh}(t_i, x_j) = g(t_i, x_j) \]


where $t_i \in [0, T]$ $(i = 1, \ldots, M)$ are the time collocation
points and $x_j \in \Gamma$ $(j = 1, \ldots, N)$ are the space collocation
points. The collocation points are usually chosen in a
‘natural’ way, meaning midpoints for even degree splines
in time, nodes for odd degree splines in time, barycenters
for piecewise constants $d_x = 0$, nodes of the finite element
mesh on $\Gamma$ for $d_x = 1$, and, more generally, nodes of
suitable quadrature rules for other values of $d_x$.
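The ‘natural’ choice of the time collocation points can be written down in a few lines. The Python sketch below assumes a uniform time mesh with step $k = T/M$ and, for odd degree, uses the nodes $t_i = ik$ with $i = 1, \ldots, M$, leaving out $t = 0$ where the zero initial condition already holds. Both the function name and this last convention are our assumptions.

```python
from typing import List


def time_collocation_points(T: float, M: int, degree: int) -> List[float]:
    """Natural collocation points on a uniform mesh of [0, T] with M steps.

    Even spline degree: midpoints of the time intervals.
    Odd spline degree: mesh nodes t_i = i * k, i = 1, ..., M (t = 0 omitted).
    """
    k = T / M  # time step
    if degree % 2 == 0:
        return [(i + 0.5) * k for i in range(M)]
    return [(i + 1) * k for i in range(M)]


midpoints = time_collocation_points(1.0, 4, degree=0)
nodes = time_collocation_points(1.0, 4, degree=1)
```

In both cases there are exactly $M$ points, matching the $M$ time basis functions $\chi_i$.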


2.5.1 (P) 


For the heat equation in a smooth domain in two space
dimensions, it was shown in Costabel and Saranen (2000,
2003) that for $d_t = 0, 1$, one gets convergence in anisotropic
Sobolev spaces of the ‘parabolic’ class defined in subsection
2.3.3. There is a condition for optimality of the convergence
that corresponds to a kind of anisotropic quasiuniformity:


\[ k \sim h^2 \]


2.5.2 (H) 


For the retarded potential integral equation, that is, the equa- 
tion of the single-layer potential for the wave equation in 
three-space dimensions, Davies and Duncan (2003) prove 
rather complete stability and convergence results for the 
case of a flat boundary. 


3 LAPLACE TRANSFORM METHODS 


To pass from the time domain to the frequency domain, we 
define the (Fourier-) Laplace transform by 


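\[ \hat u(\omega) = \mathcal{L}u(\omega) = \int_0^\infty e^{i\omega t}\, u(t)\, dt \tag{3} \]


If $u$ is integrable with a polynomial weight or, more
generally, a tempered distribution, and if, as we assume
here throughout, $u(t) = 0$ for $t < 0$, then $\hat u$ is holomorphic
in the upper half plane $\{\omega = \omega_R + i\omega_I \mid \omega_R \in \mathbb{R},\ \omega_I > 0\}$.
The inversion formula is


\[ u(t) = \mathcal{L}^{-1}\hat u(t) = \frac{1}{2\pi} \int_{-\infty + i\omega_I}^{\infty + i\omega_I} e^{-i\omega t}\, \hat u(\omega)\, d\omega \tag{4} \]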


Frequently, it is customary to define the Laplace integral
by

\[ \int_0^\infty e^{-st}\, u(t)\, dt \]


which is the same as (3) when $s$ and $\omega$ are related by
$s = -i\omega$. The upper half plane $\omega_I > 0$ coincides with the
right half plane $\operatorname{Re} s > 0$.

The function $t \mapsto u(t)$ can take values in some Banach
space (Arendt et al., 2001), for example, in a space of
functions depending on $x$, in which case we write


\[ \hat u(\omega, x) = \mathcal{L}u(\omega, x) = \int_0^\infty e^{i\omega t}\, u(t,x)\, dt \]


By Laplace transformation, both the parabolic and the 
hyperbolic initial-boundary value problems are transformed 
into elliptic boundary value problems with an eigenvalue 
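parameter $\lambda$ depending on the frequency $\omega$. Thus, both
the heat equation $(\partial_t - \Delta)u = 0$ and the wave equation
$(c^{-2}\partial_t^2 - \Delta)u = 0$ are transformed into the Helmholtz
equation $(\Delta - \lambda)\,\hat u(\omega, x) = 0$, where


\[ \lambda(\omega) = -i\omega \quad \text{for the heat equation} \]

\[ \text{and} \quad \lambda(\omega) = -\frac{\omega^2}{c^2} \quad \text{for the wave equation} \]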


The idea of the Laplace transform boundary integral 
equation method is to solve these elliptic boundary value 
problems for a finite number of frequencies with a standard 
boundary element method and then to insert the results
into a numerical approximation of the Laplace inversion 
integral (4). 

There exist various algorithms for numerical inverse 
Laplace transforms (see e.g. Davies and Martin, 1979 
or Abate and Whitt, 1995). One will, in general, first replace 
the line of integration {Im œ = ,} by a suitable equiv- 
alent contour C and then choose some quadrature rule 
approximation of the integral. The end result will be of 
the form 


E 
Lf à EE 
= — miot Tl da ~ —iwet 5 
u(t) == fe (a) de 2 we alo (5) 


with quadrature weights w, and a finite number of frequen- 
cies œ. 

One obvious candidate for such a quadrature formula is
the trapezoidal rule on a large interval $[-R, R]$, where the
line $\{\operatorname{Im}\omega = \omega_I\}$ is replaced by $[-R, R] + i\omega_I$. This can
then be evaluated by fast Fourier transform, which is clear
when we write the Laplace inversion integral as inverse
Fourier transform over the real line:

$$u(t) = \mathcal{L}^{-1}\hat u(t) = e^{\omega_I t}\,\mathcal{F}^{-1}_{\omega_R \to t}\left[\hat u(\omega_R + i\omega_I)\right]$$
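
The trapezoidal-rule inversion just described can be sketched in a few lines. This is only an illustration under stated assumptions: the test function $u(t) = t e^{-t}$, with transform $\hat u(\omega) = 1/(1 - i\omega)^2$ in the convention used here, the shift `omega_i`, the cutoff `R`, and the step `h` are all choices made for the example, and a plain sum is used where a production code would use the FFT.

```python
import numpy as np

def invert_laplace_trapezoid(u_hat, t, omega_i=0.5, R=400.0, h=0.05):
    """Approximate u(t) = 1/(2 pi) * int exp(-i w t) u_hat(w) dw on Im w = omega_i.

    The line is truncated to Re w in [-R, R] and discretized by the
    trapezoidal rule with step h (all three parameters are assumptions).
    """
    omega_r = np.arange(-R, R + h / 2, h)       # real parts of the nodes
    omega = omega_r + 1j * omega_i              # nodes on the shifted line
    vals = np.exp(-1j * omega * t) * u_hat(omega)
    integral = h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
    return (integral / (2 * np.pi)).real

# Assumed test case: u(t) = t exp(-t) for t >= 0, so u_hat(w) = 1 / (1 - i w)^2
u_hat = lambda w: 1.0 / (1.0 - 1j * w) ** 2
times = (0.5, 1.0, 2.0)
approx = [invert_laplace_trapezoid(u_hat, t) for t in times]
exact = [t * np.exp(-t) for t in times]
```

The factor $e^{\omega_I t}$ from the formula above is hidden in `np.exp(-1j * omega * t)`, because `omega` carries the imaginary shift. It also shows why the shift should stay moderate: truncation and rounding errors are amplified by that factor for large $t$.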


Let us describe the resulting procedure in more detail for 
the formulation with a single-layer potential representation 
for the initial-Dirichlet problem, keeping in mind that 
any other type of boundary integral equation constructed 
in Section 2.2.2 would do as well and lead to a similar 
formalism. By Laplace transform, we get the boundary 
value problem 


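$$(\Delta - \lambda(\omega))\,\hat u(\omega, x) = 0 \quad \text{in } \Omega$$

$$\hat u(\omega, x) = \hat g(\omega, x) \quad \text{on } \Gamma$$

where the right-hand side is the Laplace transform of the
given boundary data $g$. For the unknown density $\hat\psi$, we get
the first-kind integral equation on $\Gamma$

$$V_{\lambda(\omega)}\,\hat\psi(\omega) = \hat g(\omega) \qquad (6)$$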


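where $V_{\lambda(\omega)}$ is the weakly singular integral operator
generated by convolution with the kernel (in three dimensions)

$$G_\lambda(x) = \frac{e^{i\sqrt{-\lambda}\,|x|}}{4\pi|x|}$$

Now, let $V_{\lambda(\omega),h}$ be some finite-dimensional boundary
element approximation of $V_{\lambda(\omega)}$, so that

$$\hat\psi_h(\omega) = V_{\lambda(\omega),h}^{-1}\,\hat g(\omega)$$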


is the corresponding approximate solution of equation (6). 
Inserting this into the numerical inversion formula (5) 
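finally gives the following expression for the approximation
of the unknown density $\psi(t, x)$ via the Laplace transform
boundary element method

$$\psi_h(t, x) = \sum_{\ell=1}^{L} w_\ell\, e^{-i\omega_\ell t}\left(V_{\lambda(\omega_\ell),h}^{-1}\,\hat g(\omega_\ell)\right)(x) \qquad (7)$$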


Note that on this level of abstraction, formula (7) looks
the same for the parabolic case of the heat equation, the
hyperbolic case of the wave equation, or even the dissipative
wave equation. The only difference is the function
$\lambda(\omega)$, which then determines, depending on the contour $C$
and its discretization $\omega_\ell$, for which complex frequencies
$\sqrt{-\lambda(\omega_\ell)}$ the single-layer potential operator has to be
numerically inverted.

For practical computations, this difference can be essential.
In a precise quadrature rule in (5), which is needed
for high resolution in time, there will be some $\omega_\ell$ with
large absolute values. In the hyperbolic case (but not in the
parabolic case), this means large negative real parts for
$\lambda(\omega_\ell)$, hence highly oscillating kernels, and some machinery
for high-frequency boundary element methods has to
be put in place (see e.g. Bruno, 2003).
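
The difference between the two cases can be made concrete with a few numbers. The sketch below evaluates $\lambda(\omega)$ on a horizontal line $\operatorname{Im}\omega = \omega_I$; the sample frequencies, the shift 0.5, and the wave speed $c = 1$ are assumptions chosen for illustration, and `helmholtz_parameter` is a hypothetical helper name.

```python
import numpy as np

def helmholtz_parameter(omega, kind, c=1.0):
    """Return lambda(omega) for the heat or the wave equation."""
    if kind == "heat":
        return -1j * omega
    if kind == "wave":
        return -(omega / c) ** 2
    raise ValueError("kind must be 'heat' or 'wave'")

# Assumed quadrature nodes on the line Im(omega) = 0.5
omega = np.array([1.0, 10.0, 100.0]) + 0.5j
lam_heat = helmholtz_parameter(omega, "heat")
lam_wave = helmholtz_parameter(omega, "wave")
wave_numbers = np.sqrt(-lam_wave)   # complex frequencies sqrt(-lambda) for the wave case
```

On this line the real part of `lam_heat` stays equal to the shift, while the real part of `lam_wave` decreases like $-\omega_R^2/c^2$, and `wave_numbers` reproduces $\omega/c$, so the kernels oscillate faster as the nodes move outward.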

Applications of the Laplace transform boundary inte- 
gral equation methods in elastodynamics have a long his- 
tory (Cruse and Rizzo, 1968; Cruse, 1968). For general- 
izations such as viscoelasticity, poroelasticity, or piezo- 
electricity, these methods are more practical than the 
space-time boundary integral equation methods because 
space-time fundamental solutions are not explicitly known 
or very complicated (Gaul and Schanz, 1999; Schanz, 
1999; Schanz, 2001a; Wang et al., 2003). Recently, Laplace 
domain methods related to the operational quadrature 
method (see subsection 4.4) have been used successfully in 
practice (Schanz and Antes, 1997a, b; Schanz, 2001b; Telles 
and Vera-Tudela, 2003). 

A final remark on the Laplace transform boundary ele- 
ment method: Instead of, as described in this section, per- 
forming first the Laplace transform and then the reduction 
to the boundary, one can also first construct the space-time 
boundary integral equations as described in the previous 
section and then apply the Laplace transform. It is easy to 
see that the resulting frequency-domain boundary integral 
equations are exactly the same in both procedures.


4 TIME-STEPPING METHODS 


In the previous sections, the boundary reduction step was 
performed before any discretization had taken place. In 
particular, the description of the transient behavior of the 
solution by a finite number of degrees of freedom was 
introduced via a Galerkin or collocation method for the 
space-time integral equation or via numerical Laplace inver- 
sion, only after the construction of the boundary integral 
equation. 

It is possible to invert the order of these steps by first 
applying a time-discretization scheme to the original initial- 
boundary value problem and then using a boundary integral 
equation method on the resulting problem that is discrete 
in time and continuous in space. One advantage of this 
idea is similar to the motivation of the Laplace transform 
method: The parabolic and hyperbolic problems are reduced 
to elliptic problems for which boundary element techniques 
are well known. Another attraction is the idea that once 
a procedure for one time step is constructed, one can 
march arbitrarily far in time by simply repeating this same 
procedure. 

In this section, we will, for simplicity, only treat the 
parabolic case of the initial-Dirichlet problem for the heat 
equation. Quite analogous procedures are also possible for
the hyperbolic case, and, in particular, the operational 
quadrature method has been analyzed for both the parabolic 
and the hyperbolic situation (see Lubich, 1994). In the lit- 
erature on applied boundary element methods, one can 
find many successful applications of similar time-stepping 
schemes to parabolic and hyperbolic problems of heat 
transfer, fluid dynamics, elastodynamics, and various gen- 
eralizations (Nardini and Brebbia, 1983; Partridge et al., 
1992; Gaul et al., 2003). 


4.1 Time discretization 


We consider the initial-boundary value problem 


$$(\partial_t - \Delta)u(t, x) = 0 \quad \text{in } Q \qquad (8)$$

$$u = g \quad \text{on } \Sigma$$

$$u(t, x) = 0 \quad \text{for } t \le 0$$


as an ordinary differential equation in time with opera- 
tor coefficients. Consequently, we can employ any kind of 
one-step or multistep method known from the numerical 
analysis of ordinary differential equations. Only implicit 
schemes are of interest here, for two reasons: The first rea- 
son is the stability of the resulting scheme, and, secondly, 
explicit schemes would not really require a boundary inte- 
gral equation method. 

The solution $u(t, x)$ for $0 \le t \le T$ is approximated by a
sequence $u^n(x)$, $n = 0, \dots, N$, where
$u^n$ is understood as an approximation of $u(t_n, \cdot)$,

$$t_n = nk = \frac{nT}{N}$$

The simplest discretization of the derivative $\partial_t$ with time
step $k$ is the backward difference that gives the backward
Euler scheme for (8)

$$\frac{u^n - u^{n-1}}{k} - \Delta u^n = 0 \quad \text{in } \Omega \quad (n = 1, \dots, N) \qquad (9)$$

$$u^n(x) = g^n(x) = g(t_n, x) \quad \text{on } \Gamma \quad (n = 1, \dots, N)$$

$$u^n = 0 \quad \text{for } n \le 0$$


The actual elliptic boundary value problem that one has to
solve at each time step, $n = 1, \dots, N$, is therefore

$$u^n - k\Delta u^n = u^{n-1} \ \text{in } \Omega; \qquad u^n = g^n \ \text{on } \Gamma \qquad (10)$$
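
To see the structure of the time marching in (10) in isolation, here is a minimal sketch in one space dimension. Note the swapped-in technique: the elliptic problem of each step is solved with a finite difference matrix, not with a boundary integral equation, purely to keep the example short. The interval $(0, 1)$, the grid sizes, and the constant boundary data are assumptions.

```python
import numpy as np

def backward_euler_heat_1d(g_left, g_right, T=1.0, N=50, M=40):
    """March u^n - k u^n_xx = u^{n-1} on (0, 1), zero initial data, Dirichlet data g."""
    k, h = T / N, 1.0 / M
    x = np.linspace(0.0, 1.0, M + 1)
    # Second-difference matrix on the interior nodes (finite difference stand-in)
    lap = (np.diag(-2.0 * np.ones(M - 1))
           + np.diag(np.ones(M - 2), 1)
           + np.diag(np.ones(M - 2), -1)) / h**2
    A = np.eye(M - 1) - k * lap          # the same elliptic operator at every step
    u = np.zeros(M + 1)                  # u^0 = 0
    for n in range(1, N + 1):
        t_n = n * k
        rhs = u[1:-1].copy()             # previous time step enters as right-hand side
        rhs[0] += k * g_left(t_n) / h**2     # boundary values moved to the right-hand side
        rhs[-1] += k * g_right(t_n) / h**2
        u[1:-1] = np.linalg.solve(A, rhs)
        u[0], u[-1] = g_left(t_n), g_right(t_n)
    return x, u

# Assumed data: boundary held at 1 for t > 0; the solution approaches 1 everywhere
x, u = backward_euler_heat_1d(lambda t: 1.0, lambda t: 1.0)
```

The loop body is identical at every step, which is the "march arbitrarily far" property mentioned above; only the right-hand side changes.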


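Higher-order approximation in time can be achieved by
multistep methods of the form

$$\sum_{j=0}^{r} \alpha_j u^{n-j} - k\sum_{j=0}^{r} \beta_j \Delta u^{n-j} = 0 \ \text{in } \Omega; \qquad u^n = g^n \ \text{on } \Gamma \qquad (11)$$

The coefficients $\alpha_j$ and $\beta_j$ define the characteristic function
of the multistep scheme

$$\delta(\zeta) = \frac{\sum_{j=0}^{r} \alpha_j \zeta^j}{\sum_{j=0}^{r} \beta_j \zeta^j}$$

Consistency of the scheme (11) is characterized by $\delta(1) = 0$,
$\delta'(1) = -1$, and the scheme is accurate of order $p$
if $\delta(e^{-z})/z = 1 + O(z^p)$ as $z \to 0$. One can assume that
$\alpha_0\beta_0 > 0$.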
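
The order condition can be checked numerically. The sketch below does this for backward Euler, $\delta(\zeta) = 1 - \zeta$, and for the second-order backward differentiation formula (BDF2), $\delta(\zeta) = 3/2 - 2\zeta + \zeta^2/2$. BDF2 is brought in here as a standard example and is not singled out in the text; the function names and the sample value of $z$ are assumptions.

```python
import numpy as np

def delta(zeta, alpha, beta):
    """Characteristic function sum(alpha_j zeta^j) / sum(beta_j zeta^j)."""
    num = sum(a * zeta**j for j, a in enumerate(alpha))
    den = sum(b * zeta**j for j, b in enumerate(beta))
    return num / den

def observed_order(alpha, beta, z=0.1):
    """Estimate p in delta(exp(-z))/z = 1 + O(z^p) by halving z once."""
    err = lambda s: abs(delta(np.exp(-s), alpha, beta) / s - 1.0)
    return np.log2(err(z) / err(z / 2))

backward_euler = ([1.0, -1.0], [1.0, 0.0])       # (alpha, beta)
bdf2 = ([1.5, -2.0, 0.5], [1.0, 0.0, 0.0])       # assumed example scheme

order_be = observed_order(*backward_euler)
order_bdf2 = observed_order(*bdf2)
```

Both schemes satisfy $\delta(1) = 0$, and the estimated exponents come out close to 1 and 2 respectively.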


4.2 One step at a time 


The problem to solve for one time step in both (10) and 
(11) is of the form 


$$\eta^2 u - \Delta u = f \ \text{in } \Omega; \qquad u = g \ \text{on } \Gamma \qquad (12)$$

Here, $\eta^2 = 1/k$ for (10) and $\eta^2 = \alpha_0/(k\beta_0)$ for (11). The
right-hand side $f$ is computed from the solution of the previous
time step(s), and it has no reason to vanish except
possibly for the very first time step. For the integral equation
method, we therefore have to apply a representation
formula that takes into account this inhomogeneous differential
equation.

Let $u_p = Pf$ be a particular solution of the equation
$\eta^2 u_p - \Delta u_p = f$ in $\Omega$. Then, $u_0 = u - u_p$ satisfies the
homogeneous equation and can therefore be computed by a
standard boundary integral equation method, for example,
by one of the methods from Section 2.2. For an exterior
domain, we thus have the representation formula in $\Omega$

$$u_0(x) = \int_\Gamma \left\{\partial_{n(y)}G(x - y)\,u_0(y) - G(x - y)\,\partial_n u_0(y)\right\} d\sigma(y) = D(\gamma_0 u_0)(x) - S(\gamma_1 u_0)(x)$$


Here, $G$ is the fundamental solution of the Helmholtz
equation given in the three-dimensional case by

$$G(x) = \frac{e^{-\eta|x|}}{4\pi|x|}$$

Using our abbreviations for the single- and double-layer
potentials and

$$\gamma_0 u = u|_\Gamma, \qquad \gamma_1 u = \partial_n u|_\Gamma$$

we have the representation for $u$

$$u = D(\gamma_0 u) - S(\gamma_1 u) + Pf - D(\gamma_0 Pf) + S(\gamma_1 Pf) \qquad (13)$$

For the unknown $\varphi = \gamma_1 u$ in the direct method for the
Dirichlet problem or for the unknown $\psi$ in a single-layer
potential representation

$$u = S\psi + Pf \qquad (14)$$

or the unknown $w$ in a double-layer representation

$$u = Dw + Pf \qquad (15)$$

this leads to the choice of integral equations

$$V\varphi = (-\tfrac12 + K)g + (\tfrac12 - K)\gamma_0 Pf + V\gamma_1 Pf \qquad (D1)$$

$$(\tfrac12 + K')\varphi = -Wg + W\gamma_0 Pf + (\tfrac12 + K')\gamma_1 Pf \qquad (D2)$$

$$V\psi = g - \gamma_0 Pf \qquad (D3)$$

$$(\tfrac12 + K)w = g - \gamma_0 Pf \qquad (D4)$$


These are standard boundary integral equations that can be 
discretized and numerically solved in many ways. The one 
peculiarity is the appearance of the particular solution Pf in 
the representation formula (13) and in the integral equations 
(D1)-(D4).

There are various possibilities for the construction of 
(an approximation of) Pf. Let us mention some of them 
that are being used in the boundary element literature and 
practice. 


4.2.1 Newton potential 


In the standard representation formula for the inhomoge- 
neous Helmholtz equation derived from Green’s formula, 
Pf appears in the form 


$$Pf(x) = \int_\Omega G(x - y)\,f(y)\,dy$$


This representation has the advantage that the last two terms 
in the representation formula (13) cancel, and therefore also 
the integral equations (D1) and (D2) simplify in that the 
integral operators acting on the traces of Pf are absent. 


For computing the Newton potential, the domain Q has to 
be discretized, thus neutralizing one of the advantages of 
the boundary element method, namely the reduction of the 
dimension. Note, however, that this domain discretization 
is done only for purposes of numerical integration. No finite 
element grid has to be constructed. It is also to be noted that 
the domain discretization only enters into the computation 
of the right-hand side; the size of the linear system to be 
solved is not affected. 


4.2.2 Fourier series 


Another method to get an approximate particular solution
$Pf$ is to embed the domain $\Omega$ into a rectangular domain
$\tilde\Omega$, then approximate an extension of $f$ to $\tilde\Omega$ by
trigonometric polynomials using fast Fourier transform, solve the
Helmholtz equation in Fourier space, and go back by FFT
again. Other fast Helmholtz solvers that exist for simple
domains can be used in the same way.
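
The Fourier-space solve amounts to dividing each Fourier coefficient of $f$ by the symbol $\eta^2 + |\xi|^2$ of the operator. A minimal one-dimensional sketch follows, assuming that $f$ has already been extended periodically to the enclosing box; the box length, grid size, and test function are illustrative choices and `periodic_particular_solution` is a hypothetical name.

```python
import numpy as np

def periodic_particular_solution(f_vals, eta, length=1.0):
    """Solve eta^2 u - u'' = f on a periodic interval by FFT (1D for brevity)."""
    n = f_vals.size
    xi = 2 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wave numbers
    u_hat = np.fft.fft(f_vals) / (eta**2 + xi**2)      # divide by the symbol
    return np.fft.ifft(u_hat).real

n, eta = 64, 3.0
x = np.arange(n) / n
f = np.cos(2 * np.pi * 2 * x)                # single Fourier mode with xi = 4 pi
u = periodic_particular_solution(f, eta)
expected = f / (eta**2 + (4 * np.pi) ** 2)   # exact particular solution for this mode
```

In two or three dimensions the same idea applies with `np.fft.fftn` and the sum of squared wave numbers in the denominator. The quality of the result then depends on how smoothly $f$ was extended beyond $\Omega$.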


4.2.3 Radial basis functions 


In the previous subsections, the right-hand side f was 
approximated by a linear combination of special functions 
for which particular solutions of the Helmholtz equation 
are known: the Dirac distribution for the Newton potential 
method, and exponential functions for the FFT method. The 
particular solution Pf is then given by the corresponding 
linear combination of the individual particular solutions. 
Other special functions that can serve in the same way are 
radial basis functions, in the simplest case functions of the 
form $|x - x_j|$, where the $x_j$ belong to some discretization
of $\Omega$ by an unstructured grid. One advantage of the radial
basis function technique is that there exist many practical 
and theoretical results about interpolation by such functions 
(Powell, 1992; Faul and Powell, 1999). 


4.2.4 Higher fundamental solutions 


In the first time step, the solution $u = u^1$ is given, after
solving the appropriate boundary integral equations, by
the representation formula (13) with $f = 0$, that is, by a
combination of single- and double-layer potentials. This $u^1$
is then used as right-hand side $f$ in the next time step.
A particular solution $Pf$ can then be found, without any
domain integral, by replacing the fundamental solution $G$
of $(\eta^2 - \Delta)$ in the representation formula by a fundamental
solution $G^{(1)}$ of $(\eta^2 - \Delta)^2$ satisfying

$$(\eta^2 - \Delta)\,G^{(1)}(x) = G(x)$$

Thus, if

$$f(x) = \int_\Gamma \left\{\partial_{n(y)}G(x - y)\,w(y) - G(x - y)\,\varphi(y)\right\} d\sigma(y)$$

then a particular solution $Pf$ is given by

$$Pf(x) = \int_\Gamma \left\{\partial_{n(y)}G^{(1)}(x - y)\,w(y) - G^{(1)}(x - y)\,\varphi(y)\right\} d\sigma(y)$$

In the next time step, the right-hand side is then constructed
from single- and double-layer potentials plus this
$Pf$. Repeating the argument, one obtains a particular solution
by using a fundamental solution $G^{(2)}$ of $(\eta^2 - \Delta)^3$. In
the $n$th time step, one then needs to use higher-order fundamental
solutions $G^{(j)}$, $(j < n)$, which satisfy the recurrence
relations

$$(\eta^2 - \Delta)\,G^{(j+1)}(x) = G^{(j)}(x)$$


Such functions $G^{(j)}$ can be given explicitly in terms of
Bessel functions. In this way, the whole time-marching
scheme can be performed purely on the boundary, without
using domain integrals or any other algorithm requiring
discretization of the domain $\Omega$. Two other points of view
that can lead, eventually, to an entirely equivalent algo- 
rithm for the time-discretized problem, are described in the 
following sections. 
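
As a small sanity check of the first recurrence relation, the sketch below verifies numerically that, in three dimensions, the radial function $e^{-\eta r}/(8\pi\eta)$ is mapped to $G(r) = e^{-\eta r}/(4\pi r)$ by $\eta^2 - \Delta$. This closed form for $G^{(1)}$ is a worked example added here, not a formula quoted from the text, and the evaluation point, $\eta$, and the difference step are assumptions.

```python
import numpy as np

def G(r, eta):
    """Fundamental solution of (eta^2 - Laplacian) in three dimensions."""
    return np.exp(-eta * r) / (4 * np.pi * r)

def G1(r, eta):
    """Candidate first higher fundamental solution (assumed closed form)."""
    return np.exp(-eta * r) / (8 * np.pi * eta)

def radial_helmholtz(func, r, eta, h=1e-3):
    """Apply (eta^2 - Laplacian) to a radial function in 3D by central differences."""
    d1 = (func(r + h, eta) - func(r - h, eta)) / (2 * h)
    d2 = (func(r + h, eta) - 2 * func(r, eta) + func(r - h, eta)) / h**2
    return eta**2 * func(r, eta) - (d2 + 2.0 * d1 / r)   # radial Laplacian: u'' + 2u'/r

r, eta = 1.0, 2.0
residual = abs(radial_helmholtz(G1, r, eta) - G(r, eta))
```

The residual is at the level of the finite difference error. Unlike $G$, the function `G1` is bounded at $r = 0$, which reflects the smoothing effect of each further inversion of $\eta^2 - \Delta$.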


4.3 All time steps at once 


Just as in the construction of the space-time integral equa- 
tions the heat equation or wave equation was not considered 
as an evolution equation, that is, as an ordinary differen- 
tial equation with operator coefficients, but as a translation 
invariant operator on $\mathbb{R}^{1+n}$ whose fundamental solution was
used for integral representations, one can consider the
time-discretized problem as a translation invariant problem on
$\mathbb{Z} \times \mathbb{R}^n$ and construct a space-time fundamental solution for
this semidiscretized problem. The role of the time derivative 
is then played by its one-step or multistep discretization, 
as in (10) or (11), and the role of the inverse of the time 
derivative and of other finite time convolutions appearing in 
the space-time integral operators is played by finite discrete 
convolutions. 

In simple cases, such discrete convolution operators can 
be inverted explicitly. For a two-part recurrence relation, 
such as the backward Euler method (9), the convolution 
operator can be represented by a triangular Toeplitz matrix 
with just one lower side diagonal. Let $\mathbf{u}$ denote the vector
$(u^1, \dots, u^N)$ and define $\mathbf{g}$ correspondingly. Then the
backward Euler scheme (10) can be written as a system

$$\mathcal{A}\mathbf{u} = 0 \ \text{in } \Omega; \qquad \mathbf{u} = \mathbf{g} \ \text{on } \Gamma \qquad (16)$$

Here, $\mathcal{A}$ is an elliptic system of second order, given by the
matrix elements

$$a_{j,j} = 1 - k\Delta; \qquad a_{j,j-1} = -1; \qquad \text{all other } a_{i,j} = 0$$

Once a fundamental solution of this system is found, the
system of equations (16) can be solved numerically by
standard elliptic boundary element methods. Because of
the simple form of $\mathcal{A}$, such a fundamental solution can be
written using the higher fundamental solutions $G^{(j)}$ of the
Helmholtz equation defined in Section 4.2.4. It is a lower
triangular Toeplitz matrix $\mathcal{G}$ with entries $(G_{i,j})$, where

$$G_{i,i}(x) = G(x)$$
$$G_{i,j}(x) = G^{(i-j)}(x) \quad \text{for } j < i$$
$$G_{i,j}(x) = 0 \quad \text{for } j > i$$


All boundary integral operators constructed from this fundamental
solution will have the same lower triangular Toeplitz
(finite convolution) structure, and their solutions can be 
found by inverting the single operator that generates the 
diagonal and by subsequent back substitution. 
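
The algebra behind this remark is a forward substitution through a block lower triangular Toeplitz system, in which only the diagonal block is ever inverted. A minimal sketch with small random matrices standing in for discretized boundary integral operators (the block sizes, the random data, and the function name are assumptions):

```python
import numpy as np

def solve_block_lower_toeplitz(blocks, rhs):
    """Solve sum_{j=0}^{n} T_j x_{n-j} = b_n for n = 0, ..., N-1.

    blocks[j] is the block on the j-th lower diagonal. Only blocks[0],
    the operator generating the diagonal, is inverted.
    """
    T0_inv = np.linalg.inv(blocks[0])          # invert once, reuse at every step
    x = []
    for n, b in enumerate(rhs):
        history = sum((blocks[j] @ x[n - j] for j in range(1, n + 1)),
                      np.zeros_like(b))        # discrete convolution with earlier steps
        x.append(T0_inv @ (b - history))
    return x

rng = np.random.default_rng(0)
N, m = 4, 3                                    # assumed: 4 time steps, 3 unknowns each
blocks = [10.0 * np.eye(m) + rng.standard_normal((m, m))]
blocks += [rng.standard_normal((m, m)) for _ in range(N - 1)]
rhs = [rng.standard_normal(m) for _ in range(N)]
x = solve_block_lower_toeplitz(blocks, rhs)
```

In practice one would store a factorization of the diagonal block instead of its explicit inverse. The cost of the history term grows with the step number, which is the usual price of a finite convolution in time.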

For a detailed description of the approximation of the 
two-dimensional initial-Dirichlet problem for the heat equation
using such a method, including formulas for the kernels
in $\mathcal{G}$ and a complete error analysis of the resulting
second-kind integral equation as well as numerical results, 
see Chapko and Kress (1997). 


4.4 The operational quadrature method 


In the previous section, the simple structure of the back- 
ward Euler scheme was essential. The resulting numerical 
approximation is of only first order in time. If one wants 
to use schemes that are of higher order in time, one can 
employ multistep methods, as described above. The result- 
ing schemes still have the lower triangular Toeplitz structure 
of finite discrete convolutions in time. From the algebraic 
structure of these convolutions, it is clear that fundamen- 
tal solutions, the resulting boundary integral operators, and 
their solution operators all have this finite convolution struc- 
ture. 

Explicit constructions of kernels, however, will not be 
possible, in general. Just as for the original continuous- 
time problem the appropriate functional transform — the 
Laplace transform — allowed the reduction of the parabolic 
to elliptic problems, here, for the discrete-time problem, 
one can use the appropriate functional transform — namely 
the z-transform. In order to conserve the approximation 
order of the multistep method, one has to use a certain 
translation between continuous convolutions and discrete
convolutions, or, equivalently, between Laplace transforms 
and z-transforms. 

For the generators of the convolution algebras, namely, 
the derivative $\partial_t$ in the continuous case and its time step $k$
discretization $\partial_t^k$, this translation is given by the definition
(11) of the multistep method, characterized by the rational
function $\delta(z)$. For the whole convolution algebras, this
translation leads to the discretization method described 
by Lubich’s operational quadrature method, see Lubich 
and Schneider (1992) and Lubich (1994). The general 
translation rule is the following (we use our notation for the 
(Fourier-)Laplace transform introduced above, not Lubich’s 
notation): 

Denote a finite convolution operator with operator-valued
coefficients by

$$\hat K(i\partial_t)\,u(t) = \mathcal{L}^{-1}_{\omega \to t}\left[\hat K(\omega)\,\hat u(\omega)\right]$$

If $\hat K(\omega)$ decays sufficiently rapidly in the upper half plane,
this operator is given by an integrable kernel $K$ whose
Laplace transform is $\hat K(\omega)$:

$$\hat K(i\partial_t)\,u(t) = \int_0^t K(s)\,u(t - s)\,ds$$

The corresponding discrete convolution operator is given
by

$$\left(\hat K(i\partial_t^k)\,u\right)_n = \sum_{j=0}^{n} K_j\,u_{n-j}$$

where the coefficients $K_j$ are defined by their z-transform

$$\sum_{j=0}^{\infty} K_j z^j = \hat K\!\left(i\,\frac{\delta(z)}{k}\right)$$

Here, $k$ is the time step and $\delta(z)$ is the characteristic
function of the multistep method. The inverse of the
z-transform is given by the Cauchy integral over some circle
$|z| = \rho$

$$K_j = \frac{1}{2\pi i}\oint_{|z|=\rho} \hat K\!\left(i\,\frac{\delta(z)}{k}\right) z^{-j-1}\,dz$$


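It is not hard to see that this translation rule reduces, for
the case of the derivative $\partial_t = \hat K(i\partial_t)$ with $\hat K(\omega) = -i\omega$,
to the convolution defined by the characteristic function
$\delta(z)$:

$$\partial_t^k u_n = \frac{1}{k}\sum_{j=0}^{n} \delta_j\,u_{n-j} \quad \text{with} \quad \delta(z) = \sum_{j=0}^{\infty} \delta_j z^j$$

In addition, this translation rule is an algebra homomorphism,
that is, it respects compositions of (operator-valued)
convolution operators. This is easy to see because

$$\hat K_1(i\partial_t)\,\hat K_2(i\partial_t) = (\hat K_1\hat K_2)(i\partial_t)$$

and also

$$\hat K_1(i\partial_t^k)\,\hat K_2(i\partial_t^k) = (\hat K_1\hat K_2)(i\partial_t^k)$$

By the relation $z = e^{i\omega k}$, one can see the analogy between
the Cauchy integral over $|z| = \text{const}$ with measure $z^{-j-1}\,dz$
and the Laplace inversion integral for the time $t = t_j = jk$
over $\operatorname{Im}\omega = \text{const}$ with measure $e^{-i\omega t}\,d\omega$.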
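
The Cauchy integral for the weights can be approximated by the trapezoidal rule on the circle, which is a discrete Fourier transform. The sketch below computes scalar weights for backward Euler, $\delta(z) = 1 - z$, and two symbols for which the answer is known in closed form: $\hat K(\omega) = -i\omega$ (the derivative) and $\hat K(\omega) = i/\omega$ (the kernel $K(t) = 1$, i.e. integration, in the transform convention used here). The radius rule `rho = tol**(1/L)`, the number of nodes `L = 2N`, and the function name are assumptions of this sketch.

```python
import numpy as np

def cq_weights(K_hat, delta, k, N, tol=1e-10):
    """Approximate K_0, ..., K_{N-1} with sum_j K_j z^j = K_hat(i delta(z) / k)."""
    L = 2 * N                                   # number of quadrature nodes on the circle
    rho = tol ** (1.0 / L)                      # radius: aliasing error is about rho**L = tol
    z = rho * np.exp(2j * np.pi * np.arange(L) / L)
    vals = K_hat(1j * delta(z) / k)             # symbol at the mapped frequencies
    coeffs = np.fft.fft(vals) / L               # trapezoidal rule for the Cauchy integral
    return coeffs[:N] * rho ** (-np.arange(N, dtype=float))

k, N = 0.1, 16
backward_euler = lambda z: 1.0 - z
w_derivative = cq_weights(lambda w: -1j * w, backward_euler, k, N)
w_integral = cq_weights(lambda w: 1j / w, backward_euler, k, N)
```

For the derivative the weights reduce to $1/k, -1/k, 0, \dots$, the backward difference. For the integration kernel every weight equals $k$, a rectangle rule. For an operator-valued symbol each evaluation of `K_hat` would be one frequency-domain operator.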

This operational quadrature method can be applied at 
several different stages of an integral equation method for 
the time-discretized initial value problem. 

It can be used to find a fundamental solution for the
whole system in the form of a Cauchy integral over the
frequency-domain fundamental solutions $G_\omega$. We get for
the coefficients $G_j$ of the semidiscrete space-time fundamental
solution $\mathcal{G}(i\partial_t^k)$ the formula

$$G_j(x) = \frac{1}{2\pi i}\oint_{|z|=\rho} G_{\omega(z)}(x)\,z^{-j-1}\,dz \quad \text{with} \quad \omega(z) = i\,\frac{\delta(z)}{k}$$


This integral over holomorphic functions can be evalu- 
ated numerically with high speed and high accuracy using 
the trapezoidal rule and FFT. In simple cases, it can be 
evaluated analytically, for example, in the case of the backward
Euler method, where we have the simple characteristic
function

$$\delta(z) = 1 - z$$

The Cauchy integral then gives the higher-order fundamental
solutions $G^{(j)}$ of the previous section.

This fundamental solution $\mathcal{G}(i\partial_t^k)$ can then be used in
a standard boundary element method, keeping in mind 
that the time-discretized solution will be obtained by finite 
convolution. 

The operational quadrature scheme can also (and equiva- 
lently) be introduced at a later stage in the integral equation 
method, after the frequency-domain integral equations have 
been solved. Let us describe this with the example of the 
single-layer representation method for the initial-Dirichlet 
problem of the heat equation. 

The space-time single-layer heat potential operator on $\Sigma$
can be written as $V = \hat V(i\partial_t)$, where $\hat V(\omega)$ is the
frequency-domain single-layer potential operator on $\Gamma$ whose kernel
is the fundamental solution of the Helmholtz operator
$(-i\omega - \Delta)$. Inverting $V$ amounts to evaluating the Cauchy
integral of the inverse z-transform where the frequency-domain
single-layer integral equations have been solved for
those frequencies needed for the Cauchy integral. For the
approximation $\psi_h^n$ of the solution $\psi(t_n)$ at the time $t_n = nk$
with time step $k$ and a space discretization $V_h(\omega)$ of $\hat V(\omega)$,
one then obtains

$$\psi_h^n = \frac{1}{2\pi i}\oint_{|z|=\rho} \sum_{j=0}^{n} V_h^{-1}\!\left(i\,\frac{\delta(z)}{k}\right) g^{n-j}\,z^{-j-1}\,dz \qquad (17)$$


This can be compared to the Laplace inversion integral
(5) where the contour $C$ is the image of the circle $|z| = \rho$
under the mapping $z \mapsto \omega = i(\delta(z)/k)$. When the Cauchy
integral in (17) is evaluated numerically by a quadrature 
formula, we obtain an end result that has a form very 
similar to what we got from the Laplace transform boundary 
element method in formula (7). 

In the papers, Lubich and Schneider (1992) and Lubich 
(1994), the operational quadrature method has been ana- 
lyzed for a large class of parabolic and hyperbolic initial- 
boundary value problems and multistep methods satis- 
fying various stability conditions. Recent computational 
results show its efficiency in practice (Schanz and Antes, 
1997a,b). 
1 MAXWELL EQUATIONS


We shall discuss the simplest (linear, isotropic) version of 
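Maxwell’s equations. Given a domain $\Omega \subset \mathbb{R}^3$, we wish to
determine electric field $E(x)$ and magnetic field $H(x)$ that
satisfy:

• Faraday’s law (1831),

$$\nabla \times E = -\frac{\partial}{\partial t}(\mu H)$$

• Ampere’s law (1820) with Maxwell’s correction (1856),

$$\nabla \times H = J^{imp} + \sigma E + \frac{\partial}{\partial t}(\epsilon E)$$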


Encyclopedia of Computational Mechanics, Edited by Erwin 
Stein, René de Borst and Thomas J.R. Hughes. Volume 1: Funda- 
mentals. © 2004 John Wiley & Sons, Ltd. ISBN: 0-470-84699-2. 


Here $\mu$, $\sigma$, $\epsilon$ denote the material data: permeability,
conductivity, and permittivity, assumed to be piecewise
constant. $B := \mu H$ is the magnetic flux, $D := \epsilon E$ is the
electric flux, $\sigma E$ is the Ohm current, and $J^{imp}$ denotes a
prescribed, given impressed current, with $J := J^{imp} + \sigma E$
identified as the total current.

Once the total current $J$ has been determined, we can
use the continuity (conservation of free charge) equation to
determine the corresponding free charge density $\rho$,

$$\nabla \cdot J + \frac{\partial \rho}{\partial t} = 0$$

The first-order Maxwell system is accompanied with initial, 
boundary, and interface (across material discontinuities) 
conditions. 
Taking the divergence of both sides in Faraday’s equation,
we learn that the magnetic field $H$ has to satisfy
(automatically) the equation,

$$\frac{\partial}{\partial t}\left(\nabla \cdot (\mu H)\right) = 0$$

Assuming that the initial value $H(0)$ satisfies the Gauss law
for the magnetic flux,

$$\nabla \cdot (\mu H) = 0$$

we learn that the law is satisfied automatically for all
times $t$.

Similarly, taking the divergence of both sides of the
Ampere equation, and utilizing the continuity law, we learn
that the electric field $E$ satisfies,

$$\frac{\partial}{\partial t}\left(\nabla \cdot (\epsilon E) - \rho\right) = 0$$

Assuming that the initial value $E(0)$ satisfies the Gauss law
for the electric flux,

$$\nabla \cdot (\epsilon E) = \rho$$


we learn that the electric field satisfies the law at all times t. 
In the steady state, the Maxwell system degenerates and 
decouples, 


VxE=0 
VxH=J™+oE 


The closing equation for electrostatics is provided either by 
the continuity equation or by the Gauss law. In the case of 
a perfect dielectric, $\sigma = 0$, the distribution of free charge
$\rho$ must be prescribed. We can determine the corresponding
electric field by solving the electrostatics equations,

$$\nabla \times E = 0$$
$$\nabla \cdot (\epsilon E) = \rho$$

In the case of a conductor, $\sigma > 0$, the free charges move,
and we cannot prescribe them. In the steady state case, $\rho$ is
independent of time, and the continuity equation provides
a closing equation for the Faraday law,

$$\nabla \times E = 0$$
$$-\nabla \cdot (\sigma E) = \nabla \cdot J^{imp}$$


Once the electric field is known, the resulting free charge 
density can be determined from the Gauss law. In view of 
the first equation, either set of the electrostatics equations is 
usually solved in terms of a scalar potential for the electric 
field. 

In the case of a perfect conductor, $\sigma = \infty$, the corresponding
electric field $E$ vanishes, and the volume occupied
by the perfect conductor is eliminated from the (computational)
domain $\Omega$.

The magnetostatics equations can be obtained by complementing
the Ampere law with the Gauss law for the
magnetic flux,

$$\nabla \times H = J^{imp} + \sigma E$$
$$\nabla \cdot (\mu H) = 0$$

Once the electric field is known, the corresponding magnetic
field is obtained by solving the magnetostatics equations.
Because of the second equation, the problem is
usually formulated in terms of a vector potential for the
magnetic flux $B = \mu H$.


Wave equation 

In order to reduce the number of unknowns, the first-order 
Maxwell system is usually reduced to a single (vector- 
valued) ‘wave equation’, expressed either in terms of E 


or H. The choice is usually dictated by the boundary con- 
ditions, and the analysis of both systems is fully analogous. 
We shall focus on the electric field formulation, 


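$$\nabla \times \left(\frac{1}{\mu}\nabla \times E\right) + \epsilon\frac{\partial^2 E}{\partial t^2} + \sigma\frac{\partial E}{\partial t} = -\frac{\partial J^{imp}}{\partial t} \qquad (1)$$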


Once the electric field has been determined, the Faraday 
equation can be integrated to find the corresponding mag- 
netic field. 


Time-harmonic wave equation 
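Assuming the ansatz,

$$E(x, t) = \Re\left(E(x)\,e^{i\omega t}\right) \qquad (2)$$

we convert the wave equation into the ‘reduced wave
equation’,

$$\nabla \times \left(\frac{1}{\mu}\nabla \times E\right) - (\omega^2\epsilon - i\omega\sigma)E = -i\omega J^{imp} \qquad (3)$$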


to be solved for the complex-valued phasor E(x). Alterna- 
tively, (3) can be obtained by applying Fourier transform to 
(1). The solution to the wave equation can then be obtained 
by applying the inverse Fourier transform, 


$$E(x, t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega t}\,E(x, \omega)\,d\omega$$

Notice that the sign in the exponential in the inverse Fourier
transform is consistent with that in the ansatz (2). In
the Electrical Engineering (EE) literature, frequently, the
opposite sign is assumed,

$$E(x, t) = \Re\left(E(x)\,e^{j\omega t}\right) \qquad (4)$$

with $j$ denoting the imaginary unit. The sign in the ansatz
affects the sign in impedance and radiation boundary conditions,
and one always has to remember which ansatz is
being used. Substituting $j = -i$, we can easily switch
between the two formulations.

Once the electric field has been determined, the cor- 
responding magnetic field is computed from the time- 
harmonic version of the Faraday law: 


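$$\nabla \times E = -i\omega\mu H \qquad (5)$$

For free space, $\epsilon = \epsilon_0 \approx (1/36\pi)\,10^{-9}$ [C$^2$/Nm$^2$ = F/m],
$\mu = \mu_0 = 4\pi\,10^{-7}$ [N/A$^2$ = H/m], $\sigma = 0$, and (3) reduces
to:

$$\nabla \times (\nabla \times E) - \left(\frac{\omega}{c}\right)^2 E = -i\omega\mu_0 J^{imp}$$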


Table 1. Material constants for selected materials.

Material                   $\mu_r$   $\epsilon_r$   $\sqrt{\mu_0/\epsilon_0}\,\sigma$
Alumina (10 GHz)           1         10             0.63
Polyethylene (10 GHz)      1         2.25           0.19
Copper                     1         1              2.2E+9
Seawater                   1         1              1.5E+3
Human muscle (900 MHz)     1         58             4.56E+2


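where $c = (\epsilon_0\mu_0)^{-1/2} = 3 \times 10^8$ m s$^{-1}$ is the speed of light
in free space and $k_0 = \omega/c$ (1/m) is the free space wave
number. Introducing relative permittivity and relative
permeability,

$$\epsilon_r = \frac{\epsilon}{\epsilon_0}, \qquad \mu_r = \frac{\mu}{\mu_0}$$

we represent the general case in the form,

$$\nabla \times \left(\frac{1}{\mu_r}\nabla \times E\right) - \left(k_0^2\epsilon_r - ik_0\sqrt{\frac{\mu_0}{\epsilon_0}}\,\sigma\right)E = -ik_0\sqrt{\frac{\mu_0}{\epsilon_0}}\,J^{imp} \qquad (6)$$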


In reality, the material constants depend upon frequency
$\omega$, temperature, and other factors. The discussed equations
apply only to isotropic materials; many materials, for example,
ferromagnetics, are strongly anisotropic. A few sample
values of relative permeability $\mu_r$, relative permittivity
$\epsilon_r$, and scaled conductivity $\sqrt{\mu_0/\epsilon_0}\,\sigma$ are summarized in
Table 1.

We shall now return to our original notation (3), with
the understanding that, for practical computations, we use
formulation (6).
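
For orientation, the scaled quantities entering formulation (6) can be evaluated with a few lines of code. The sketch uses the rounded free-space constants quoted above; the function names, the conductivity of 4 S/m for seawater, and the sample frequency are assumptions made for illustration.

```python
import math

EPS0 = 1e-9 / (36 * math.pi)     # permittivity of free space [F/m], rounded value from the text
MU0 = 4 * math.pi * 1e-7         # permeability of free space [H/m]

def speed_of_light():
    return 1.0 / math.sqrt(EPS0 * MU0)

def free_space_wave_number(frequency_hz):
    """k_0 = omega / c in 1/m."""
    return 2 * math.pi * frequency_hz / speed_of_light()

def scaled_conductivity(sigma):
    """sqrt(mu_0 / eps_0) * sigma, the quantity in the last column of Table 1."""
    return math.sqrt(MU0 / EPS0) * sigma

def reduced_wave_coefficients(frequency_hz, eps_r, mu_r, sigma):
    """Coefficients 1/mu_r and k_0^2 eps_r - i k_0 sqrt(mu_0/eps_0) sigma of (6)."""
    k0 = free_space_wave_number(frequency_hz)
    return 1.0 / mu_r, k0**2 * eps_r - 1j * k0 * scaled_conductivity(sigma)

c = speed_of_light()
seawater = scaled_conductivity(4.0)          # assumed sigma = 4 S/m
curl_coeff, mass_coeff = reduced_wave_coefficients(10e9, 10.0, 1.0, 0.0)
```

With these constants `c` is $3 \times 10^8$ m/s, and the assumed seawater conductivity gives a scaled value of about $1.5 \times 10^3$, the order of magnitude listed in Table 1.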


2 VARIATIONAL FORMULATION 


For the sake of simplicity, we shall restrict our presentation
to the case of bounded, simply connected domains $\Omega$
only. We shall focus on the time-harmonic Maxwell equations.
We begin with the fundamental ‘integration by parts’
formula,

$$\int_\Omega (\nabla \times E)F\,dx = \int_\Omega E(\nabla \times F)\,dx + \int_{\partial\Omega} (n \times E)F_t\,dS$$

Here $n$ is the outward normal unit vector for boundary $\partial\Omega$,
$F_t = F - (F \cdot n)n$ is the tangential component of vector
$F$ on the boundary, and

$$n \times E = n \times E_t$$

is the ‘rotated’ tangential component of $E$. Obviously,
$(n \times E)F_t = -E_t(n \times F)$. Notice that there is no ‘−’ sign
typical of other Green’s formulas.

Finite energy considerations lead us naturally to the 
assumption that both electric field and magnetic field are 
square integrable. In view of (5), and under the assumption 
of boundedness of material data, 


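$$0 < \epsilon_{min} \le \epsilon \le \epsilon_{max} < \infty, \qquad 0 < \mu_{min} \le \mu \le \mu_{max} < \infty$$

$$0 \le \sigma \le \sigma_{max} < \infty$$

this implies that electric field $E$ comes from the $H(\mathrm{curl})$
space,

$$H(\mathrm{curl}, \Omega) = \{E \in L^2(\Omega) : \nabla \times E \in L^2(\Omega)\}$$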


The Green formula is immediately telling us the right type 
of continuity across material interfaces and interelement 
boundaries for a conforming finite element (FE) discretization.
Assuming that domain $\Omega$ consists of two disjoint parts
$\Omega_i$, $i = 1, 2$, with an interface $\Gamma$, with a $C^1$ field $E$ in either
of the subdomains, we use the integration by parts formula
to obtain,

$$\int_\Omega E(\nabla \times \phi)\,dx = \int_\Omega (\nabla \times E)\phi\,dx + \int_\Gamma [n \times E]\phi\,dS$$

for every $C^1$ test function $\phi$ vanishing on $\partial\Omega$. Here $[n \times E]$
denotes the jump of the tangential component of $E$ across
the interface $\Gamma$. Consequently, the field $\nabla \times E$ is a function
(regular distribution) if and only if the tangential component
of $E$ is continuous across the interface. With a square
integrable impressed current $J^{imp}$, similar considerations
for the magnetic field lead to the observation that also the
tangential component of $H$ must be continuous across the
material interfaces. In view of (5), this implies the second
interface condition for the electric field,

$$\left[n \times \frac{1}{\mu}\nabla \times E\right] = 0$$
u 


Multiplying (3) with a (conjugated) test function $F$, and
integrating by parts, we obtain,

$$\int_\Omega \left\{\frac{1}{\mu}(\nabla \times E)(\nabla \times \bar F) - (\omega^2\epsilon - i\omega\sigma)E\bar F\right\} dx + \int_{\partial\Omega} n \times \left(\frac{1}{\mu}\nabla \times E\right)\bar F_t\,dS = -i\omega\int_\Omega J^{imp}\bar F\,dx \qquad (7)$$

Working with conjugated test functions that result in
sesquilinear rather than bilinear forms in the variational
formulation, is typical for complex-valued wave propagation
problems, and consistent with the definition of inner product
in a complex Hilbert space. It is only a matter of choice
though and, as long as shape functions are real-valued, the
bilinear and sesquilinear formulations yield identical systems
of discrete equations.

We are now ready to discuss the most common boundary 
conditions. 


Perfect electric conductor (PEC) 

As the electric field in a perfect conductor vanishes, and 
the tangential component of E must be continuous across 
material interfaces, the tangential component of E on a 
boundary adjacent to a perfect conductor must vanish, 


nxE=0 


For scattering problems, the electric field is the sum of a
known incident field $E^{inc}$ and (to be determined) scattered
field $E^s$. The condition above then leads to a nonhomogeneous
Dirichlet condition for the scattered field,

$$n \times E^s = -n \times E^{inc}$$


Impressed surface current 
The simplest way to model an antenna is to prescribe an 
impressed surface current on the boundary of the domain 
occupied by the antenna, 


$$n \times H = J_S^{imp}$$

In conjunction with (5), this leads to the Neumann (natural
boundary) condition,

$$n \times \left(\frac{1}{\mu}\nabla \times E\right) = -i\omega J_S^{imp}$$

For a particular case of $J_S^{imp} = 0$, we speak of a magnetic
symmetry wall condition.


Impedance boundary condition 

This is the simplest, first-order approximation to model
reflection of the electric field from an interface with a
conductor with large but nevertheless finite conductivity,
compare Senior and Volakis (1995),

$$n \times \frac{1}{\mu}(\nabla \times E) - i\omega\gamma E_t = -i\omega J_S^{imp}$$

with a constant $\gamma > 0$. One arrives naturally at the
impedance boundary condition also when modeling
waveguides.

Denoting by $\Gamma_1$, $\Gamma_2$, $\Gamma_3$ three disjoint components of the boundary $\partial\Omega$ on which the Dirichlet, Neumann, and impedance boundary conditions have been prescribed, we limit the test functions in (7) to those that satisfy the homogeneous PEC condition on $\Gamma_1$, and use the Neumann and Cauchy conditions to build the impressed surface currents into the formulation. With appropriate regularity assumptions on the impressed currents, for example, both $J^{imp}$ and $J_S^{imp}$ being square integrable, our final variational formulation reads as follows.


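$$
\begin{aligned}
& E \in H(\mathrm{curl}, \Omega), \quad n \times E = n \times E_0 \ \text{on } \Gamma_1 \\
& \int_\Omega \left\{ \frac{1}{\mu} (\nabla \times E) \cdot (\nabla \times \bar F) - (\omega^2 \epsilon - i\omega\sigma) E \cdot \bar F \right\} dx + i\omega \int_{\Gamma_3} \gamma E_t \cdot \bar F \, dS \\
& \qquad = -i\omega \int_\Omega J^{imp} \cdot \bar F \, dx + i\omega \int_{\Gamma_2 \cup \Gamma_3} J_S^{imp} \cdot \bar F \, dS \\
& \text{for every } F \in H(\mathrm{curl}, \Omega), \quad n \times F = 0 \ \text{on } \Gamma_1
\end{aligned}
\qquad (8)
$$
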
We follow the classical lines to show that, conversely, any 
sufficiently regular solution to the variational problem sat- 
isfies the reduced wave equation and the natural boundary 
conditions. 


Weak form of the continuity equation 

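Employing a special test function, $F = \nabla q$, $q \in H^1(\Omega)$, $q = 0$ on $\Gamma_1$, we learn that the solution to the variational problem automatically satisfies the weak form of the continuity equation,


$$
\begin{aligned}
& \int_\Omega -(\omega^2 \epsilon - i\omega\sigma) E \cdot \nabla \bar q \, dx + i\omega \int_{\Gamma_3} \gamma E_t \cdot \nabla \bar q \, dS \\
& \qquad = -i\omega \int_\Omega J^{imp} \cdot \nabla \bar q \, dx + i\omega \int_{\Gamma_2 \cup \Gamma_3} J_S^{imp} \cdot \nabla \bar q \, dS \\
& \text{for every } q \in H^1(\Omega), \quad q = 0 \ \text{on } \Gamma_1
\end{aligned}
\qquad (9)
$$


Upon integrating by parts, we learn that the solution $E$ satisfies the continuity equation,

$$ \mathrm{div} \left( (\omega^2 \epsilon - i\omega\sigma) E \right) = i\omega \, \mathrm{div} J^{imp} \quad (= \omega^2 \rho) $$


plus additional boundary conditions on $\Gamma_2$, $\Gamma_3$, and interface conditions across material interfaces.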


Maxwell eigenvalue problem 


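Related to the time-harmonic problem (8) is the eigenvalue problem,

$$
\begin{aligned}
& E \in H(\mathrm{curl}, \Omega), \quad n \times E = 0 \ \text{on } \Gamma_1, \quad \lambda \in \mathbb{R} \\
& \int_\Omega \frac{1}{\mu} (\nabla \times E) \cdot (\nabla \times \bar F) \, dx = \lambda \int_\Omega \epsilon E \cdot \bar F \, dx \\
& \text{for every } F \in H(\mathrm{curl}, \Omega), \quad n \times F = 0 \ \text{on } \Gamma_1
\end{aligned}
\qquad (10)
$$


The curl-curl operator is self-adjoint. Its spectrum consists of $\lambda = 0$, with an infinite-dimensional eigenspace consisting of all gradients $\nabla p$, $p \in H^1(\Omega)$, $p = 0$ on $\Gamma_1$, and a sequence of positive eigenvalues $\lambda_1 < \lambda_2 < \cdots$, $\lambda_n \to \infty$, with corresponding eigenspaces of finite dimension. Only the eigenvectors corresponding to positive eigenvalues are physical. Repeating the reasoning with the substitution $F = \nabla q$, we learn that they automatically satisfy the continuity equation.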


Stabilized variational formulation 

The standard variational formulation (8) is not uniformly stable with respect to frequency $\omega$. As $\omega \to 0$, we lose control over gradients. This corresponds to the fact that, in the limiting case $\omega = 0$, the problem is ill-posed as the gradient component remains undetermined. A remedy to this problem is to enforce the continuity equation explicitly at the expense of introducing a Lagrange multiplier $p$. The so-called stabilized variational formulation looks as follows.


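$$
\begin{aligned}
& E \in H(\mathrm{curl}, \Omega), \ p \in H^1(\Omega), \quad n \times E = n \times E_0, \ p = 0 \ \text{on } \Gamma_1 \\
& \int_\Omega \frac{1}{\mu} (\nabla \times E) \cdot (\nabla \times \bar F) \, dx - \int_\Omega (\omega^2 \epsilon - i\omega\sigma) E \cdot \bar F \, dx \\
& \qquad + i\omega \int_{\Gamma_3} \gamma E_t \cdot \bar F \, dS - \int_\Omega (\omega^2 \epsilon - i\omega\sigma) \nabla p \cdot \bar F \, dx \\
& \qquad = -i\omega \int_\Omega J^{imp} \cdot \bar F \, dx + i\omega \int_{\Gamma_2 \cup \Gamma_3} J_S^{imp} \cdot \bar F \, dS \\
& \qquad \forall F \in H(\mathrm{curl}, \Omega), \quad n \times F = 0 \ \text{on } \Gamma_1 \\
& -\int_\Omega (\omega^2 \epsilon - i\omega\sigma) E \cdot \nabla \bar q \, dx + i\omega \int_{\Gamma_3} \gamma E_t \cdot \nabla \bar q \, dS \\
& \qquad = -i\omega \int_\Omega J^{imp} \cdot \nabla \bar q \, dx + i\omega \int_{\Gamma_2 \cup \Gamma_3} J_S^{imp} \cdot \nabla \bar q \, dS \\
& \qquad \forall q \in H^1(\Omega), \quad q = 0 \ \text{on } \Gamma_1
\end{aligned}
\qquad (11)
$$
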
By repeating the reasoning with the substitution $F = \nabla q$ in the first equation, we learn that the Lagrange multiplier $p$ satisfies the weak form of a Laplace-like equation with homogeneous boundary conditions and, therefore, it identically vanishes. For this reason, it is frequently called the hidden variable. The stabilized formulation has improved stability properties for small $\omega$. In the case of $\sigma = 0$ and the right-hand side of (9) vanishing, we can rescale the Lagrange multiplier and the corresponding test function by $\omega^2$ ($p \to \omega^2 p$, $q \to \omega^2 q$) to obtain a symmetric mixed variational formulation with stability constant converging to one as $\omega \to 0$. In the general case, we cannot avoid a degeneration as $\omega \to 0$, but we can still rescale the Lagrange multiplier with $\omega$ ($p \to \omega p$, $q \to \omega q$) to improve the stability of the formulation for small $\omega$. The stabilized formulation is possible because gradients of the scalar-valued potentials from $H^1(\Omega)$ form precisely the null space of the curl-curl operator.

The point about the stabilized (mixed) formulation is that, whether we use it or not in the actual computations (the improved stability is one good reason to do it), the original variational problem is equivalent to the mixed problem. This suggests that we cannot escape from the theory of mixed formulations when analyzing the problem.


3 EXACT SEQUENCES 


The gradient and curl operators, along with the divergence 
operator, form an exact sequence, 


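$$ \mathbb{R} \longrightarrow H^1 \xrightarrow{\ \nabla\ } H(\mathrm{curl}) \xrightarrow{\ \nabla\times\ } H(\mathrm{div}) \xrightarrow{\ \nabla\cdot\ } L^2 \longrightarrow 0 $$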


In an exact sequence of operators, the range of each 
operator coincides with the null space of the operator next 
in the sequence. Simply speaking, the gradient of a function 
vanishes if and only if the function is constant, the curl of a 
vector-valued function is zero, if and only if the function is 
the gradient of a scalar potential, and so on. The spaces 
above may incorporate homogeneous essential boundary 
conditions. Introducing, 


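$$
\begin{aligned}
W &= \{ q \in H^1(\Omega) : q = 0 \ \text{on } \Gamma_1 \} \\
Q &= \{ E \in H(\mathrm{curl}, \Omega) : E_t = 0 \ \text{on } \Gamma_1 \} \\
V &= \{ H \in H(\mathrm{div}, \Omega) : H_n = 0 \ \text{on } \Gamma_1 \} \\
Y &= L^2(\Omega)
\end{aligned}
$$


($H_n = H \cdot n$ denotes the normal component of $H$) we have the exact sequence:


$$ W \xrightarrow{\ \nabla\ } Q \xrightarrow{\ \nabla\times\ } V \xrightarrow{\ \nabla\cdot\ } Y \longrightarrow \{0\} $$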


The presence of $\mathbb{R}$ in the original sequence signifies the fact that the null space of the gradient consists of constant fields. With the Dirichlet boundary condition on boundary $\Gamma_1$, the constant must be zero, and the space of constant fields is eliminated. Similarly, the presence of the trivial space at the end of the sequence signifies the fact that the preceding operator is a surjection. If $\Gamma_1$ coincides with the whole boundary, the $L^2$ space must be replaced with the space of fields of zero average,


$$ L^2_0 = \left\{ u \in L^2 : \int_\Omega u = 0 \right\} $$

In order to simplify the notation, we shall drop $\mathbb{R}$ and the trivial space from our discussion.

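In two space dimensions, the 3D exact sequence gives rise to two sequences,


$$ \mathbb{R} \longrightarrow H^1 \xrightarrow{\ \nabla\ } H(\mathrm{curl}) \xrightarrow{\ \mathrm{curl}\ } L^2 $$


and

$$ \mathbb{R} \longrightarrow H^1 \xrightarrow{\ \nabla\times\ } H(\mathrm{div}) \xrightarrow{\ \nabla\cdot\ } L^2 $$


Notice the difference between the two curl operators; both are obtained by restricting the 3D curl operator, in the first case to vectors $(E_1, E_2, 0)$, in the second to $(0, 0, E_3)$, with $E_i = E_i(x_1, x_2)$,


$$ \mathrm{curl}(E_1, E_2) = E_{2,1} - E_{1,2}, \qquad \nabla \times E_3 = (E_{3,2}, -E_{3,1}) $$


The second 2D sequence can be obtained from the first one by ‘rotating’ operators and spaces by 90 degrees.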
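
The exactness of the two 2D sequences can be spot-checked on polynomials. The following is a minimal sketch, not taken from the source: polynomials in $(x_1, x_2)$ are stored as dictionaries of exact coefficients, and all helper names (`diff`, `grad`, `scalar_curl`, `vector_curl`, `div`) are illustrative assumptions. It verifies that the scalar curl of a gradient and the divergence of the rotated gradient both vanish for a sample potential.

```python
from fractions import Fraction

# A polynomial in (x1, x2) is a dict {(i, j): c}, meaning the sum of c * x1**i * x2**j.

def diff(poly, var):
    """Partial derivative with respect to x1 (var=0) or x2 (var=1)."""
    out = {}
    for (i, j), c in poly.items():
        e = (i, j)[var]
        if e == 0:
            continue
        key = (i - 1, j) if var == 0 else (i, j - 1)
        out[key] = out.get(key, 0) + c * e
    return out

def neg(a):
    return {k: -c for k, c in a.items()}

def sub(a, b):
    """a - b with zero coefficients dropped."""
    out = dict(a)
    for k, c in b.items():
        out[k] = out.get(k, 0) - c
    return {k: c for k, c in out.items() if c != 0}

def add(a, b):
    return sub(a, neg(b))

def grad(u):
    return (diff(u, 0), diff(u, 1))

def scalar_curl(E):
    """curl(E1, E2) = E_{2,1} - E_{1,2}."""
    return sub(diff(E[1], 0), diff(E[0], 1))

def vector_curl(E3):
    """curl of (0, 0, E3): (E_{3,2}, -E_{3,1})."""
    return (diff(E3, 1), neg(diff(E3, 0)))

def div(H):
    return add(diff(H[0], 0), diff(H[1], 1))

# Sample potential u = 2 x1^3 x2 - 5 x2^2 + (1/3) x1 x2 (an arbitrary choice).
u = {(3, 1): Fraction(2), (0, 2): Fraction(-5), (1, 1): Fraction(1, 3)}
curl_grad = scalar_curl(grad(u))   # first sequence: curl after grad
div_rot = div(vector_curl(u))      # second sequence: div after the vector curl
```

Both `curl_grad` and `div_rot` come out as the empty dictionary, that is, the zero polynomial. This checks only that the composition of consecutive operators vanishes; the converse inclusion needed for exactness is not tested here.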

The exact sequence property is crucial in proving the stability result for the regularized variational formulation for the time-harmonic Maxwell equations; see Demkowicz and Vardapetyan (1998). This suggests constructing a (piecewise) polynomial finite element discretization of the H(curl) space in such a way that the exact sequence property is also satisfied at the discrete level. Two fundamental families of Nedelec’s elements satisfy such a condition.


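Tetrahedral element of second type (Nedelec, 1986)
All polynomial spaces are defined on the master tetrahedron,

$$ T = \{ (x_1, x_2, x_3) : x_1 > 0, \ x_2 > 0, \ x_3 > 0, \ x_1 + x_2 + x_3 < 1 \} $$


We have the following exact sequence,


$$ \mathcal{P}^p \xrightarrow{\ \nabla\ } \boldsymbol{\mathcal{P}}^{p-1} \xrightarrow{\ \nabla\times\ } \boldsymbol{\mathcal{P}}^{p-2} \xrightarrow{\ \nabla\cdot\ } \mathcal{P}^{p-3} $$


Here $\mathcal{P}^p$ denotes the space of polynomials of (group) order less than or equal to $p$, for example, $x_1^2 x_2^3 x_3^2 \in \mathcal{P}^7$, and $\boldsymbol{\mathcal{P}}^p = \mathcal{P}^p \times \mathcal{P}^p \times \mathcal{P}^p$.

Obviously, the construction starts with $p \ge 3$, that is, the H(curl)-conforming elements are at least of second order.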
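
A quick consistency check on the sequence above is a dimension count. If the sequence is exact and the null space of the gradient consists of constants, the alternating sum of the dimensions of the four spaces must equal one. The sketch below is only an illustration; the function names are assumptions, and the check is a necessary condition for exactness rather than a proof.

```python
from math import comb

def dim_P(n, d=3):
    """Dimension of scalar polynomials of total order <= n in d variables (0 if n < 0)."""
    return comb(n + d, d) if n >= 0 else 0

def tet_sequence_dims(p):
    """Dimensions of P^p, (P^{p-1})^3, (P^{p-2})^3 and P^{p-3} on the tetrahedron."""
    return [dim_P(p), 3 * dim_P(p - 1), 3 * dim_P(p - 2), dim_P(p - 3)]

def alternating_sum(dims):
    """dim W - dim Q + dim V - dim Y."""
    return sum((-1) ** k * n for k, n in enumerate(dims))

dims_p3 = tet_sequence_dims(3)
```

For $p = 3$ the dimensions are 20, 30, 12 and 1, and $20 - 30 + 12 - 1 = 1$, which accounts for the one-dimensional space of constants annihilated by the gradient.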

The construction can be generalized to tetrahedra of variable order. With each face of the tetrahedron we associate the corresponding face order $p_f$, and with each edge of the tetrahedron we associate the corresponding edge order $p_e$. We assume that,


$$ p_f \le p \quad \forall \ \text{face } f, \qquad p_e \le p_f \quad \forall \ \text{face } f \ \text{adjacent to edge } e, \ \forall \ \text{edge } e $$


The assumption is satisfied in practice by enforcing the minimum rule, that is, setting the face and edge orders to the minimum of the orders of the adjacent elements. We introduce now the following polynomial spaces.
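
The minimum rule can be sketched in a few lines. The data layout below (dictionaries keyed by hypothetical element, face and edge names) is an assumption made purely for illustration and does not reflect any particular FE code.

```python
def minimum_rule(element_orders, adjacency):
    """Order of each face or edge = minimum order over the elements sharing it.

    element_orders: {element_id: p}
    adjacency:      {entity_id: [element_id, ...]} for faces or for edges
    """
    return {entity: min(element_orders[k] for k in elems)
            for entity, elems in adjacency.items()}

# Hypothetical patch of three tetrahedra sharing one edge.
element_orders = {"K1": 4, "K2": 2, "K3": 3}
face_orders = minimum_rule(element_orders,
                           {"f12": ["K1", "K2"], "f13": ["K1", "K3"]})
edge_orders = minimum_rule(element_orders, {"e123": ["K1", "K2", "K3"]})
```

Because every element adjacent to a face is also adjacent to the edges of that face, the edge minimum is taken over a superset of elements, so $p_e \le p_f \le p$ holds automatically.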


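• The space of scalar-valued polynomials of order less than or equal to $p$, whose traces on faces $f$ reduce to polynomials of (possibly smaller) order $p_f$, and whose traces on edges $e$ reduce to polynomials of (possibly smaller) order $p_e$,


$$ \mathcal{P}^p_{p_f, p_e} = \{ u \in \mathcal{P}^p : u|_f \in \mathcal{P}^{p_f}(f), \ u|_e \in \mathcal{P}^{p_e}(e) \} $$


• The space of vector-valued polynomials of order less than or equal to $p$, whose tangential traces on faces $f$ reduce to polynomials of order $p_f$, and whose tangential traces on edges $e$ reduce to polynomials of order $p_e$,


$$ \boldsymbol{\mathcal{P}}^p_{p_f, p_e} = \{ E \in \boldsymbol{\mathcal{P}}^p : E_t|_f \in \boldsymbol{\mathcal{P}}^{p_f}(f), \ E_t|_e \in \mathcal{P}^{p_e}(e) \} $$


• The space of vector-valued polynomials of order less than or equal to $p$, whose normal traces on faces $f$ reduce to polynomials of order $p_f$,


$$ \boldsymbol{\mathcal{P}}^p_{p_f} = \{ E \in \boldsymbol{\mathcal{P}}^p : E_n|_f \in \mathcal{P}^{p_f}(f) \} $$

We have then the exact sequence,


$$ \mathcal{P}^p_{p_f, p_e} \xrightarrow{\ \nabla\ } \boldsymbol{\mathcal{P}}^{p-1}_{p_f - 1, p_e - 1} \xrightarrow{\ \nabla\times\ } \boldsymbol{\mathcal{P}}^{p-2}_{p_f - 2} \xrightarrow{\ \nabla\cdot\ } \mathcal{P}^{p-3} $$


with a 2D equivalent for the triangular element,


$$ \mathcal{P}^p_{p_e} \xrightarrow{\ \nabla\ } \boldsymbol{\mathcal{P}}^{p-1}_{p_e - 1} \xrightarrow{\ \nabla\times\ } \mathcal{P}^{p-2} \qquad (12) $$

The case $p_f, p_e = -1$ corresponds to the homogeneous Dirichlet boundary condition.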


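Hexahedral element of the first type (Nedelec, 1980)
All polynomial spaces are defined on a unit cube. We introduce the following polynomial spaces.


$$
\begin{aligned}
W_p &= Q^{(p,q,r)} \\
Q_p &= Q^{(p-1,q,r)} \times Q^{(p,q-1,r)} \times Q^{(p,q,r-1)} \\
V_p &= Q^{(p,q-1,r-1)} \times Q^{(p-1,q,r-1)} \times Q^{(p-1,q-1,r)} \\
Y_p &= Q^{(p-1,q-1,r-1)}
\end{aligned}
$$

Here $Q^{(p,q,r)}$ denotes the space of polynomials of order less than or equal to $p, q, r$ with respect to $x, y, z$, respectively. For instance, $2xy^2 + 3x^3 z^8 \in Q^{(3,2,8)}$. The polynomial spaces again form the exact sequence


$$ W_p \xrightarrow{\ \nabla\ } Q_p \xrightarrow{\ \nabla\times\ } V_p \xrightarrow{\ \nabla\cdot\ } Y_p \qquad (13) $$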
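
As for the tetrahedron, a dimension count gives a cheap sanity check of sequence (13): the alternating sum of the dimensions must equal one. The sketch below is illustrative only, with assumed function names.

```python
def dim_Q(p, q, r):
    """Dimension of Q^(p,q,r): order <= p, q, r in x, y, z separately."""
    return (p + 1) * (q + 1) * (r + 1)

def hex_sequence_dims(p, q, r):
    """Dimensions of W_p, Q_p, V_p, Y_p for the hexahedron of the first type."""
    W = dim_Q(p, q, r)
    Q = dim_Q(p - 1, q, r) + dim_Q(p, q - 1, r) + dim_Q(p, q, r - 1)
    V = dim_Q(p, q - 1, r - 1) + dim_Q(p - 1, q, r - 1) + dim_Q(p - 1, q - 1, r)
    Y = dim_Q(p - 1, q - 1, r - 1)
    return W, Q, V, Y

lowest = hex_sequence_dims(1, 1, 1)
```

For the lowest-order element, $p = q = r = 1$, the count is 8 vertex functions, 12 edge functions, 6 face functions and 1 interior function, and $8 - 12 + 6 - 1 = 1$.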


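The generalization to variable order elements is a little less straightforward than for the tetrahedra. We begin with the 2D case. With each horizontal edge $e$ we associate order $p_e$, and with each vertical edge $e$ we associate order $q_e$. We assume again that the minimum rule holds, that is,


$$ p_e \le p, \qquad q_e \le q $$


By $Q^{(p,q)}_{p_e, q_e}$ we understand the space of polynomials of order less than or equal to $p$ with respect to $x$, and order less than or equal to $q$ with respect to $y$, such that their traces on horizontal edges $e$ reduce to polynomials of (possibly smaller) order $p_e$, and their traces on vertical edges reduce to polynomials of (possibly smaller) order $q_e$,


$$
\begin{aligned}
Q^{(p,q)}_{p_e, q_e} = \{ u \in Q^{(p,q)} : \ & u(\cdot, 0) \in \mathcal{P}^{p_1}(0,1), \ u(\cdot, 1) \in \mathcal{P}^{p_2}(0,1), \\
& u(0, \cdot) \in \mathcal{P}^{q_1}(0,1), \ u(1, \cdot) \in \mathcal{P}^{q_2}(0,1) \}
\end{aligned}
$$


where $p_1, p_2$ and $q_1, q_2$ denote the orders of the two horizontal and the two vertical edges, respectively. With spaces,

$$
\begin{aligned}
W_p &= Q^{(p,q)}_{p_e, q_e} \\
Q_p &= Q^{(p-1,q)}_{p_e - 1} \times Q^{(p,q-1)}_{q_e - 1} \\
Y_p &= Q^{(p-1,q-1)}
\end{aligned}
$$


we have the exact sequence,


$$ W_p \xrightarrow{\ \nabla\ } Q_p \xrightarrow{\ \mathrm{curl}\ } Y_p $$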


Notice that space $Q_p$ cannot be obtained by merely differentiating polynomials from $Q^{(p,q)}_{p_e, q_e}$. For the derivative in $x$, this would lead to space $Q^{(p-1,q)}_{p_e - 1, q_e}$ for the first component, whereas in our definition above, $q_e$ has been increased to $q$. This is motivated by the fact that the traces of $E_1$ along the vertical edges are interpreted as normal components of the $E$ field. The H(curl)-conforming fields ‘connect’ only through tangential components; shape functions corresponding to the normal components on the boundary are classified as interior modes, and they should depend only upon the order of the element and not upon the order of the neighboring elements.

In three dimensions, spaces get more complicated and notation more cumbersome. We start with the space,


$$ Q^{(p,q,r)}_{(p_f, q_f), (p_f, r_f), (q_f, r_f), p_e, q_e, r_e} $$


that consists of polynomials in $Q^{(p,q,r)}$ such that:


• their restrictions to faces $f$ parallel to axes $x, y$ reduce to polynomials in $Q^{(p_f, q_f)}$,

• their restrictions to faces $f$ parallel to axes $x, z$ reduce to polynomials in $Q^{(p_f, r_f)}$,

• their restrictions to faces $f$ parallel to axes $y, z$ reduce to polynomials in $Q^{(q_f, r_f)}$,

• their restrictions to edges parallel to axes $x, y, z$ reduce to polynomials of order $p_e, q_e, r_e$, respectively,


with the minimum rule restrictions:


$$ p_f \le p, \ q_f \le q, \ r_f \le r, \qquad p_e \le p_f, \ q_e \le q_f, \ r_e \le r_f \quad \text{for adjacent faces } f $$


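The 3D polynomial spaces forming the de Rham diagram are now introduced as follows:

$$
\begin{aligned}
W_p &= Q^{(p,q,r)}_{(p_f, q_f), (p_f, r_f), (q_f, r_f), p_e, q_e, r_e} \\
Q_p &= Q^{(p-1,q,r)}_{(p_f - 1, q_f), (p_f - 1, r_f), p_e - 1, q_f, r_f}
 \times Q^{(p,q-1,r)}_{(p_f, q_f - 1), (q_f - 1, r_f), p_f, q_e - 1, r_f}
 \times Q^{(p,q,r-1)}_{(p_f, r_f - 1), (q_f, r_f - 1), p_f, q_f, r_e - 1} \\
V_p &= Q^{(p,q-1,r-1)}_{(q_f - 1, r_f - 1)} \times Q^{(p-1,q,r-1)}_{(p_f - 1, r_f - 1)} \times Q^{(p-1,q-1,r)}_{(p_f - 1, q_f - 1)} \\
Y_p &= Q^{(p-1,q-1,r-1)}
\end{aligned}
$$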


Note the following points:


• There is no restriction on edge order in the H(div)-conforming space. The only order restriction is placed on faces normal to the particular component; for example, for the first component $H_1$, the order restriction is imposed only on faces parallel to the $y, z$ axes.

• For the H(curl)-conforming space, there is no restriction on face order for faces perpendicular to the particular component. For instance, for $E_1$, there is no order restriction on faces parallel to the $y, z$ axes. The edge orders for edges perpendicular to $x$ are inherited from faces parallel to the $x$ axis. This is related to the fact that elements connecting through the first component $E_1$ connect only through faces and edges parallel to the first axis.


Tetrahedral element of first type (Nedelec, 1980)
There is a significant difference between the tetrahedral and hexahedral elements presented so far. For the tetrahedron, the order drops in the diagram from $p$ to $p - 3$. This merely reflects the fact that differentiation always lowers the polynomial order by one. In the case of the Q-spaces, however, the order in the diagram has dropped only by one, from $(p, q, r)$ to $(p-1, q-1, r-1)$. A similar effect can be obtained for the tetrahedra.

We shall first discuss the 2D case of a triangular element. The goal is to switch from $p - 2$ to $p - 1$ in (12) without increasing the order $p$ in the space on the left. We begin by rewriting (12) with $p$ increased by one.


$$ \mathcal{P}^{p+1}_{p_e} \xrightarrow{\ \nabla\ } \boldsymbol{\mathcal{P}}^{p}_{p_e - 1} \xrightarrow{\ \nabla\times\ } \mathcal{P}^{p-1} $$


Notice that we have not increased the order along the edges. This is motivated by the fact that the edge orders do not affect the very last space in the diagram. Next, we decompose the space of potentials into the previous space of polynomials $\mathcal{P}^p_{p_e}$ and an algebraic complement $\tilde{\mathcal{P}}^{p+1}_{p_e}$,


$$ \mathcal{P}^{p+1}_{p_e} = \mathcal{P}^{p}_{p_e} \oplus \tilde{\mathcal{P}}^{p+1}_{p_e} $$


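The algebraic complement is not unique; it may be constructed in (infinitely) many different ways. The decomposition in the space of potentials implies a corresponding decomposition in the H(curl)-conforming space,


$$ \boldsymbol{\mathcal{P}}^{p}_{p_e - 1} = \boldsymbol{\mathcal{P}}^{p-1}_{p_e - 1} \oplus \nabla \left( \tilde{\mathcal{P}}^{p+1}_{p_e} \right) \oplus \tilde{\boldsymbol{\mathcal{P}}}^{p}_{p_e - 1} $$


The algebraic complement $\tilde{\boldsymbol{\mathcal{P}}}^{p}_{p_e - 1}$ is again not unique. The desired extension of the original sequence can now be constructed by removing the gradients of order $p + 1$,


$$ \mathcal{P}^{p}_{p_e} \xrightarrow{\ \nabla\ } \boldsymbol{\mathcal{P}}^{p-1}_{p_e - 1} \oplus \tilde{\boldsymbol{\mathcal{P}}}^{p}_{p_e - 1} \xrightarrow{\ \nabla\times\ } \mathcal{P}^{p-1} $$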


Note the following facts.


• The construction enables the H(curl)-conforming discretization of lowest order on triangles. For $p = p_e = 1$,


$$ \boldsymbol{\mathcal{P}}^{0}_{0} \oplus \tilde{\boldsymbol{\mathcal{P}}}^{1}_{0} = \boldsymbol{\mathcal{P}}^{1}_{0}, \qquad \dim \boldsymbol{\mathcal{P}}^{1}_{0} = 3 $$


The complement $\tilde{\mathcal{P}}^{2}_{1}$ is empty and, therefore, in this case, the resulting Whitney space $\boldsymbol{\mathcal{P}}^{1}_{0} = \boldsymbol{\mathcal{P}}^{0}_{0} \oplus \tilde{\boldsymbol{\mathcal{P}}}^{1}_{0}$ is unique. This is the smallest space to enforce the continuity of the (constant) tangential component of $E$ across the interelement boundaries.

• It is not necessary but natural to construct the complements using scalar and vector bubble functions. In this case, the notation $\tilde{\mathcal{P}}^{p+1}_{-1}$ and $\tilde{\boldsymbol{\mathcal{P}}}^{p}_{-1}$ is more appropriate. The concept is especially natural if one uses hierarchical shape functions. We can always enforce the zero trace condition by augmenting original shape functions with functions of lower order. In other words, we change the complement but do not alter the ultimate polynomial space.

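• The choice of the complements may be made unique by imposing additional conditions. Nedelec’s original construction, for elements of uniform order $p$, employs symmetric polynomials,

$$ \boldsymbol{\mathcal{R}}^p = \{ E \in \boldsymbol{\mathcal{P}}^p : \epsilon^p(E) = 0 \} $$


where $\epsilon^p$ is the Nedelec symmetrization operator,


$$ \left( \epsilon^p(E) \right)_{i_1, \ldots, i_{p+1}} = \frac{1}{p+1} \left( \frac{\partial^p E_{i_1}}{\partial x_{i_2} \cdots \partial x_{i_{p+1}}} + \frac{\partial^p E_{i_2}}{\partial x_{i_3} \cdots \partial x_{i_{p+1}} \partial x_{i_1}} + \cdots + \frac{\partial^p E_{i_{p+1}}}{\partial x_{i_1} \cdots \partial x_{i_p}} \right) $$


The algebraic complement can then be selected as the subspace of homogeneous symmetric polynomials $\boldsymbol{\mathcal{D}}^p$ [1],


$$ \boldsymbol{\mathcal{R}}^p = \boldsymbol{\mathcal{P}}^{p-1} \oplus \boldsymbol{\mathcal{D}}^p $$


The space $\boldsymbol{\mathcal{D}}^p$ can be nicely characterized as the image of homogeneous polynomials of order $p - 1$ under the Poincare map; see Hiptmair (1999, 2000),


$$
\begin{aligned}
E_1(x) &= -x_2 \int_0^1 t \, \psi(t x) \, dt \\
E_2(x) &= x_1 \int_0^1 t \, \psi(t x) \, dt
\end{aligned}
\qquad (14)
$$


The Poincare map is a right inverse of the curl map, $\nabla \times E = \psi$, for the $E$ defined above. Consistent with our discussion, it can be shown that the tangential trace of a symmetric polynomial of order $p$ is always a polynomial of order less than or equal to $p - 1$.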

• The uniqueness of the spaces could also be naturally enforced by requesting orthogonality of algebraic complements,


P? = P32 © VR) @ PL, 


e 


PE, = PET BVP) 8 Pi 


perl =p ePi, Pt = pe, @ pet 


The orthogonality in the E space is understood in the 
H (curl) sense, with the corresponding L? orthogonal- 
ity for the gradients. 
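
The right-inverse property of the Poincare map (14) can be verified exactly on polynomials. The following is a minimal sketch under stated assumptions: polynomials in $(x_1, x_2)$ are stored as coefficient dictionaries, the helper names are hypothetical, and the integral in (14) is evaluated monomial by monomial using $\int_0^1 t \cdot t^{i+j} \, dt = 1/(i+j+2)$.

```python
from fractions import Fraction

# Polynomials in (x1, x2) as dicts {(i, j): c}, meaning the sum of c * x1**i * x2**j.

def poincare_2d(psi):
    """Apply (14) termwise: for psi = c x1^i x2^j the integral equals psi(x) / (i + j + 2)."""
    E1, E2 = {}, {}
    for (i, j), c in psi.items():
        w = Fraction(c) / (i + j + 2)
        E1[(i, j + 1)] = E1.get((i, j + 1), 0) - w   # E1 = -x2 * integral
        E2[(i + 1, j)] = E2.get((i + 1, j), 0) + w   # E2 =  x1 * integral
    return E1, E2

def d(poly, var):
    """Partial derivative with respect to x1 (var=0) or x2 (var=1)."""
    out = {}
    for (i, j), c in poly.items():
        e = i if var == 0 else j
        if e:
            key = (i - 1, j) if var == 0 else (i, j - 1)
            out[key] = out.get(key, 0) + c * e
    return out

def scalar_curl(E1, E2):
    """curl(E1, E2) = E_{2,1} - E_{1,2}, with zero coefficients dropped."""
    out = d(E2, 0)
    for k, c in d(E1, 1).items():
        out[k] = out.get(k, 0) - c
    return {k: c for k, c in out.items() if c != 0}

# Arbitrary sample: psi = 3 x1^2 - 4 x1 x2 + 7.
psi = {(2, 0): 3, (1, 1): -4, (0, 0): 7}
E1, E2 = poincare_2d(psi)
recovered = scalar_curl(E1, E2)
```

Here `recovered` equals `psi`, and each homogeneous term of order $k$ in `psi` is mapped to a homogeneous vector field of order $k + 1$, in line with the characterization of the complement as the image of homogeneous polynomials of order $p - 1$.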


The 3D construction goes along the same lines but it 
becomes more technical. The following decompositions are 
relevant. 


PH = PP, Opt! 


Pespp+l Pes Pj =l, prt! 

P — pp-l Ppt pP 
Polo T Poel pr-1 ® VOL y+) ® Paip 
p — pp-l pP+l sP 

Po = Epa ® V(P Li np ei) @ PL 


The ultimate sequence looks as follows: 


v 1 5? vx —1 
P P P 
Preps BRA P pe-l,pr-1 © Pip, = Ppi 


@ PP, Z p 


Referring to Webb (1999) and Demkowicz (2000) for details, we emphasize only that switching to the tetrahedra of the first type in 3D requires adding not only extra interior bubbles but face bubbles as well.


Prismatic elements 

We shall not discuss here the construction of the exact sequences for the prismatic elements. The prismatic element shape functions are constructed as tensor products of triangular element and 1D element shape functions. We can use both Nedelec’s triangles for the construction and, consequently, we can also produce two corresponding exact sequences.


Parametric elements 

Given a bijective map $x = x_K(\xi)$ transforming master element $\hat K$ onto a physical element $K$, and master element shape functions $\hat\phi(\xi)$, we define the $H^1$-conforming shape functions on the physical element in terms of master element coordinates,


$$ \phi(x) = \hat\phi(\xi) = \hat\phi(x_K^{-1}(x)) = (\hat\phi \circ x_K^{-1})(x) $$


The definition reflects the fact that the integration of master element matrices is always done in terms of master element coordinates and, therefore, it is simply convenient to define the shape functions in terms of master coordinates $\xi$. This implies that the parametric element shape functions are compositions of the inverse $x_K^{-1}$ and the master element polynomial shape functions. In general, we do not deal with polynomials anymore. In order to keep the exact sequence property, we have to define the H(curl)-, H(div)-, and $L^2$-conforming elements consistently with the way the differential operators transform. For gradients, we have


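$$ \frac{\partial u}{\partial x_i} = \frac{\partial \hat u}{\partial \xi_k} \frac{\partial \xi_k}{\partial x_i} $$

and, therefore,

$$ E_i = \hat E_k \frac{\partial \xi_k}{\partial x_i} $$


For the curl operator, we have


$$
\begin{aligned}
\epsilon_{ijk} \frac{\partial E_k}{\partial x_j}
&= \epsilon_{ijk} \frac{\partial}{\partial x_j} \left( \hat E_l \frac{\partial \xi_l}{\partial x_k} \right) \\
&= \epsilon_{ijk} \frac{\partial \hat E_l}{\partial x_j} \frac{\partial \xi_l}{\partial x_k}
 + \underbrace{\epsilon_{ijk} \hat E_l \frac{\partial^2 \xi_l}{\partial x_j \partial x_k}}_{=0}
 = \epsilon_{ijk} \frac{\partial \hat E_l}{\partial \xi_m} \frac{\partial \xi_m}{\partial x_j} \frac{\partial \xi_l}{\partial x_k}
\end{aligned}
$$

But,

$$ \epsilon_{ijk} \frac{\partial \xi_m}{\partial x_j} \frac{\partial \xi_l}{\partial x_k} = J^{-1} \epsilon_{nml} \frac{\partial x_i}{\partial \xi_n} $$


where $J^{-1}$ is the inverse jacobian. Consequently,


$$ \epsilon_{ijk} \frac{\partial E_k}{\partial x_j} = J^{-1} \frac{\partial x_i}{\partial \xi_n} \left( \epsilon_{nml} \frac{\partial \hat E_l}{\partial \xi_m} \right) $$


This leads to the definition of the H(div)-conforming parametric element,


$$ H_i = J^{-1} \frac{\partial x_i}{\partial \xi_n} \hat H_n $$

Finally,

$$ \frac{\partial H_i}{\partial x_i} = \frac{\partial}{\partial x_i} \left( J^{-1} \frac{\partial x_i}{\partial \xi_n} \right) \hat H_n + J^{-1} \frac{\partial x_i}{\partial \xi_n} \frac{\partial \hat H_n}{\partial \xi_k} \frac{\partial \xi_k}{\partial x_i} = J^{-1} \frac{\partial \hat H_k}{\partial \xi_k} $$


which establishes the transformation rule for the $L^2$-conforming elements,


$$ f = J^{-1} \hat f $$


Defining the parametric element spaces $W_p$, $Q_p$, $V_p$, and $Y_p$ using the transformation rules listed above, we preserve for the parametric element the exact sequence (13).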
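
The consistency between the H(curl) and H(div) transformation rules can be checked exactly for an affine map. The sketch below is an illustration under assumptions, not the source's construction: the element map is $x = B\xi$ with a constant matrix $B$, the master field is linear, $\hat E(\xi) = A\xi$, and all names are hypothetical. It confirms that the curl of the transformed field equals the transformed curl of the master field, $J^{-1} (\partial x_i / \partial \xi_n)(\hat\nabla \times \hat E)_n$.

```python
from fractions import Fraction as Fr

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matmul(A, B):
    return [[dot(row, col) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_and_det(B):
    """Inverse and determinant of a 3x3 matrix via cross products of its columns."""
    c = transpose(B)                          # c[k] is the k-th column of B
    det = dot(c[0], cross(c[1], c[2]))
    rows = [cross(c[1], c[2]), cross(c[2], c[0]), cross(c[0], c[1])]
    return [[x / det for x in r] for r in rows], det

def curl_of_linear_field(M):
    """Curl of E(x) = M x, i.e. eps_ijk dE_k/dx_j with dE_k/dx_j = M[k][j]."""
    return [M[2][1] - M[1][2], M[0][2] - M[2][0], M[1][0] - M[0][1]]

# Assumed data: Jacobian matrix B = dx/dxi of the affine map, master field E_hat(xi) = A xi.
B = [[Fr(v) for v in row] for row in [[2, 1, 0], [0, 1, 1], [1, 0, 3]]]
A = [[Fr(v) for v in row] for row in [[1, 2, 0], [0, -1, 4], [3, 0, 2]]]

Binv, J = inverse_and_det(B)
# Covariant rule E_i = E_hat_k dxi_k/dx_i with xi = B^{-1} x gives E(x) = B^{-T} A B^{-1} x.
M = matmul(matmul(transpose(Binv), A), Binv)
curl_physical = curl_of_linear_field(M)
curl_master = curl_of_linear_field(A)
# Contravariant (H(div)) rule applied to the master curl.
curl_piola = [dot(row, curl_master) / J for row in B]
```

With exact rational arithmetic the two vectors `curl_physical` and `curl_piola` agree entry by entry. For a nonaffine map the Jacobian varies with $\xi$, and the same identity holds pointwise.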

In the case of the isoparametric element, the components of the transformation map $x_K$ come from the space of the $H^1$-conforming master element,


$$ x_j = \sum_k x_{j,k} \hat\phi_k(\xi) = \sum_k x_{j,k} \phi_k(x) $$


Here $x_{j,k}$ denote the (vector-valued) geometry degrees of freedom corresponding to element shape functions $\phi_k(x)$. By construction, therefore, the parametric element shape functions can reproduce any linear function $a_j x_j$. As they can also reproduce constants, the isoparametric element space of shape functions contains the space of all linear polynomials in $x$, that is, $a_j x_j + b$, or, in mechanical terms, the space of linearized rigid body motions. The exact sequence property implies that the H(curl)-conforming element can reproduce only constant fields, but the H(div)-conforming element, in general, cannot reproduce even constants. This indicates in particular that, in the context of general parametric (nonaffine) elements, [2] unstructured mesh generators should be used with caution (compare Arnold, Boffi and Falk, 2000). This critique does not apply to (algebraic) mesh generators based on a consistent representation of the domain as a manifold, with underlying global maps parameterizing portions of the domain. Upon a change of variables, the original problem can then be redefined in the reference domain discretized with affine elements; see, for example, Xue and Demkowicz (2002).


4 PROJECTION-BASED INTERPOLATION. DE RHAM DIAGRAM


In the h version of the FEM, once an element space of shape functions $X(K)$ of dimension $N$ has been specified, the classical notion of a finite element (see Ciarlet, 1978) still requires defining the element degrees of freedom. Element d.o.f. are linear functionals, defined on a larger [3] space $\mathcal{X}(K)$,


$$ \psi_j : \mathcal{X}(K) \to \mathbb{R}, \quad j = 1, \ldots, N $$


such that their restrictions to the element space of shape functions are linearly independent. The element shape functions $\phi_i \in X(K)$ are then defined as a dual basis to the d.o.f. functionals,


$$ \psi_j(\phi_i) = \delta_{ij}, \quad i, j = 1, \ldots, N $$

The purpose of introducing the d.o.f. is twofold:


• to enforce the global conformity by equating appropriate d.o.f. for adjacent elements,
• to define the interpolation operator,


$$ \Pi : \mathcal{X}(K) \to X(K), \qquad \Pi u = \sum_{j=1}^{N} \psi_j(u) \, \phi_j $$


The interpolation operator is then used not only in proofs of convergence but also in practical computations: mesh generation, approximation of initial and Dirichlet boundary conditions data, and so on.
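
As a concrete instance of the classical definition, consider a 1D quadratic Lagrange element. This sketch is an illustration chosen here, not an element discussed in the source: the element is the interval $(0, 1)$, the d.o.f. are point values at three nodes, and the dual basis consists of the Lagrange polynomials. All names are assumptions.

```python
from fractions import Fraction as Fr

nodes = [Fr(0), Fr(1, 2), Fr(1)]

def dof(j, u):
    """psi_j(u) = u(x_j): point evaluation, meaningful for any continuous u."""
    return u(nodes[j])

def shape(i):
    """Lagrange polynomial phi_i, the dual basis to the point-value d.o.f."""
    def phi(x):
        val = Fr(1)
        for k, xk in enumerate(nodes):
            if k != i:
                val *= (x - xk) / (nodes[i] - xk)
        return val
    return phi

def interpolate(u):
    """Pi u = sum_j psi_j(u) phi_j."""
    coeffs = [dof(j, u) for j in range(len(nodes))]
    return lambda x: sum(c * shape(j)(x) for j, c in enumerate(coeffs))

# Duality matrix psi_j(phi_i); it should be the identity.
duality = [[dof(j, shape(i)) for j in range(3)] for i in range(3)]

quad = lambda x: 3 * x * x - x + 2
Pi_quad = interpolate(quad)
```

The matrix `duality` is the identity, and `Pi_quad` coincides with `quad` because the interpolation operator reproduces every function already contained in the element space.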

For the p and hp methods, it is more natural to intro- 
duce first shape functions and define interpolation operators 
directly, without reference to any d.o.f. 


Shape functions 

With infinite precision, the FE solution depends only on the spaces and not on concrete shape functions. The choice of shape functions, however, affects the conditioning of stiffness matrices and, in the presence of round-off error, the ultimate quality of the FE solution. In the context of the p- and hp-versions of the FEM, it is natural to request that the shape functions be hierarchical: increasing order $p$ should result in the addition of extra shape functions without modifying the existing ones. Although the choice of shape functions is up to a certain point arbitrary, they have to satisfy basic logic requirements resulting from the conformity considerations. For example, for a tetrahedral element, we must satisfy the following conditions.


• $H^1$-conforming elements. We have vertex, edge, face, and interior (‘bubble’) shape functions. A vertex shape function vanishes at the remaining vertices. An edge shape function vanishes along the remaining edges (and, therefore, at all vertices as well). A face shape function vanishes over the remaining faces. The interior bubbles vanish over the whole element boundary. The number of shape functions associated with a particular topological entity is equal to the dimension of the corresponding trace space; for example, we have one shape function per vertex, $p_e - 1$ shape functions per edge of order $p_e$, $(p_f - 2)(p_f - 1)/2$ shape functions per face of order $p_f$, and $(p - 3)(p - 2)(p - 1)/6$ interior bubbles.

• H(curl)-conforming elements. We have edge, face, and interior (‘vector bubble’) shape functions. Tangential traces of an edge shape function vanish along the remaining edges. Tangential traces of a face bubble vanish over the remaining faces, and tangential traces of interior bubbles vanish over the whole boundary of the element. The number of shape functions is again equal to the dimension of the corresponding trace space, and it depends upon the element kind.

• H(div)-conforming elements. We have only face and interior bubble shape functions. Normal traces of a face shape function must vanish over the remaining faces. Normal traces of interior bubbles over the whole element boundary are zero.
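
The counts quoted for the $H^1$-conforming tetrahedron can be cross-checked against the dimension of the full polynomial space. The sketch below assumes, for simplicity, that all faces share one order $p_f$ and all edges share one order $p_e$; the function name is hypothetical.

```python
from math import comb

def h1_tet_shape_function_count(p, p_f=None, p_e=None):
    """Number of H1 shape functions on a tetrahedron of order p.

    Simplifying assumption: every face has order p_f and every edge has order p_e
    (both default to p, the uniform-order case).
    """
    p_f = p if p_f is None else p_f
    p_e = p if p_e is None else p_e
    vertex = 4                                       # one per vertex
    edge = 6 * (p_e - 1)                             # p_e - 1 per edge
    face = 4 * ((p_f - 2) * (p_f - 1) // 2)          # (p_f - 2)(p_f - 1)/2 per face
    interior = (p - 3) * (p - 2) * (p - 1) // 6      # interior bubbles
    return vertex + edge + face + interior

count_p4 = h1_tet_shape_function_count(4)
```

For uniform order the total agrees with $\dim \mathcal{P}^p = \binom{p+3}{3}$; for example, $p = 4$ gives $4 + 18 + 12 + 1 = 35$. Lowering the face or edge orders through the minimum rule removes the corresponding face and edge functions only.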


Additional restrictions result from the assumption on functions being hierarchical and, for tetrahedral elements, the rotational invariance assumption. The hierarchical assumption implies that vertex shape functions must be linear. An edge shape function that is of order $p_e$ along the edge should be extended to the rest of the element using a polynomial of order $p_e$, and so on. The rotational invariance assumption implies that $H^1$-conforming shape functions should be defined in terms of barycentric coordinates $\lambda_i$, whereas the H(curl)-conforming shape functions should be defined in terms of products $\lambda_i \nabla \lambda_j$. Notice that the invariance of H(curl)-conforming shape functions must be understood consistently with the definition of the H(curl)-conforming parametric element. If $B : \mathbb{R}^3 \to \mathbb{R}^3$ denotes the affine map that maps vertex $e$ into vertex $e + 1$ (modulo 4), and $\phi_e$ denotes a shape function corresponding to edge $e$, we request that


$$ \phi_{e+1}(x) = \phi_e(B^{-1} x) \, \nabla B^{-1}(x) $$
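
The classical lowest-order example of shape functions built from products $\lambda_i \nabla \lambda_j$ is the Whitney edge function $\lambda_i \nabla \lambda_j - \lambda_j \nabla \lambda_i$. The sketch below, written for the master tetrahedron with hypothetical helper names, checks the conformity requirement stated earlier: the tangential component of each edge function, scaled by the edge vector, equals one on its own edge and vanishes on the remaining five edges.

```python
from fractions import Fraction as Fr
from itertools import combinations

VERTS = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
GRADS = [(-1, -1, -1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]   # constant gradients of lambda_i

def lam(i, x):
    """Barycentric coordinate lambda_i on the master tetrahedron."""
    return 1 - sum(x) if i == 0 else x[i - 1]

def whitney(i, j, x):
    """Lowest-order edge function lambda_i grad(lambda_j) - lambda_j grad(lambda_i)."""
    return tuple(lam(i, x) * gj - lam(j, x) * gi
                 for gi, gj in zip(GRADS[i], GRADS[j]))

def tangential_moment(i, j, a, b, s=Fr(1, 3)):
    """Edge function (i, j) dotted with the edge vector v_b - v_a at a point of edge (a, b)."""
    x = tuple((1 - s) * va + s * vb for va, vb in zip(VERTS[a], VERTS[b]))
    t = tuple(vb - va for va, vb in zip(VERTS[a], VERTS[b]))
    return sum(w * tk for w, tk in zip(whitney(i, j, x), t))

edges = list(combinations(range(4), 2))
table = {(e, f): tangential_moment(*e, *f) for e in edges for f in edges}
```

The resulting `table` is the 6 by 6 identity pattern. The values do not depend on the sample point `s` along the edge, reflecting the constant tangential component of the lowest-order functions.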


The idea of $H^1$-conforming shape functions was developed in the pioneering work on the p-method by Szabo and his students; see Szabo and Babuska (1991) and the literature therein; compare also Chapter 5, this Volume, by Düster, Szabo, and Rank.

Shape functions implied by Nedelec’s degrees of free- 
dom are neither unique nor hierarchical. The higher-order 
H(curl)-conforming shape functions were rediscovered 
in the engineering community a decade later, starting 
with Lee, Sun and Cendes (1991). Hierarchical H (curl)- 
conforming shape functions were first introduced in Webb 
and Forghani (1993) and further developed in Wang and 
Webb (1997) and Webb (1999). The last contribution con- 
tains a detailed discussion on rotational invariance, the 
possibility of separating the shape functions into gradients 
of scalar shape functions and (‘rotational’) shape functions 
with nonzero curl, as well as the possibility of enforcing 
partial L?-orthogonality in between the shape functions. In 
Webb (1999), the author points out clearly the fact that 
the Nedelec’s construction of incomplete (‘mixed’) order 
tetrahedra is not unique; see also the discussion above and 
Demkowicz (2000). For recent work on the optimal selection of hierarchical shape functions, see Ainsworth and Coyle (2001, 2003, 2003). The construction of optimal nonhierarchical H(curl)-conforming shape functions of arbitrary order remains a subject of intensive research as well; see, for example, Graglia, Wilton and Peterson (1997) and Salazar-Palma et al. (1998) and the literature therein.


Projection-based interpolation 
The idea of projection-based interpolation stems from three 
assumptions. 


• Locality. The element interpolant of a function should be defined entirely in terms of the restriction of the function to the element only.

• Global continuity. The union of element interpolants should be globally conforming.

• Optimality. The interpolation error should behave asymptotically, both in $h$ and $p$, in the same way as the actual approximation error.


We make the following regularity assumptions. 


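$$
\begin{aligned}
u &\in W := H^{3/2+r}, \quad r > 0 \\
E &\in Q := H^{1/2+r}(\mathrm{curl}) = \{ E \in H^{1/2+r} : \nabla \times E \in H^{1/2+r} \}, \quad r > 0 \\
H &\in V := H^{r}(\mathrm{div}) = \{ H \in H^{r} : \nabla \cdot H \in H^{r} \}, \quad r > 0
\end{aligned}
$$


with the corresponding norms denoted by $\|u\|_{3/2+r,T}$, $\|E\|_{\mathrm{curl},1/2+r,T}$, $\|F\|_{\mathrm{div},r,T}$. Here $H^s$ denotes the fractional Sobolev spaces; see, for example, Adams (1978).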

Let us start with the $H^1$-conforming interpolation first. Locality and global continuity imply that the interpolant $u^p = \Pi u$ must match the interpolated function $u$ at vertices:

$$ u^p(v) = u(v) \quad \text{for each vertex } v $$

With the vertex values fixed, locality and global continuity imply that the restriction of the interpolant to an edge should be calculated using the restriction of function $u$ to that edge only. Optimality, in turn, implies that we should use a projection in some ‘edge norm’,

$$ \| u - u^p \|_e \to \min, \quad \text{for each edge } e $$

By the same argument, we should then use face projections to determine the face interpolants and, finally, project over the element to complete the definition. The choice of element norm is dictated by the problem being solved: the $H^1$-norm for elliptic problems, the H(curl)-norm for the Maxwell problems, and so on. It then follows from the optimality condition that the face and edge norms are implied by the Trace Theorem; see Lions and Magenes (1972) and Buffa and Ciarlet (2001). The $H^1$-conforming interpolant $\Pi u \in X(K)$ of function $u \in H^{3/2+r}(T)$, $r > 0$, is formally defined as follows.


$$
\begin{aligned}
& u^p(v) = u(v) && \text{for each vertex } v \\
& \| u^p - u \|_{0,e} \to \min && \text{for each edge } e \\
& | u^p - u |_{1/2,f} \to \min && \text{for each face } f \\
& | u^p - u |_{1,K} \to \min && \text{for element } K
\end{aligned}
$$
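
The vertex-then-projection structure is easiest to see in one space dimension, where only vertices and the element interior are present. The following is a sketch under explicit assumptions, not the 3D procedure defined above: the element is $(-1, 1)$, the element norm is taken to be the $H^1$ seminorm, the bubbles are integrated Legendre polynomials (a common hierarchical choice, assumed here), and integrals are approximated with a composite Simpson rule. All function names are hypothetical.

```python
def legendre(k, x):
    """Legendre polynomial L_k(x) on [-1, 1] by the three-term recurrence."""
    a, b = 1.0, x
    if k == 0:
        return a
    for n in range(1, k):
        a, b = b, ((2 * n + 1) * x * b - n * a) / (n + 1)
    return b

def bubble(k, x):
    """Integral of L_k from -1 to x; vanishes at x = -1 and x = 1 for k >= 1."""
    return (legendre(k + 1, x) - legendre(k - 1, x)) / (2 * k + 1)

def simpson(f, a=-1.0, b=1.0, n=400):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3

def pb_interpolate(u, du, p):
    """Match vertex values, then project in the H1 seminorm onto bubbles of order <= p."""
    ul, ur = u(-1.0), u(1.0)
    # Bubble derivatives are the L_k, orthogonal with (L_k, L_k) = 2/(2k+1), and the
    # derivative of the linear vertex interpolant is orthogonal to L_k for k >= 1,
    # so the minimization decouples into independent coefficients.
    c = [(2 * k + 1) / 2 * simpson(lambda x: du(x) * legendre(k, x)) for k in range(1, p)]

    def up(x):
        lin = ul * (1 - x) / 2 + ur * (1 + x) / 2
        return lin + sum(ck * bubble(k, x) for k, ck in enumerate(c, start=1))
    return up

u = lambda x: x ** 3 + 2 * x
du = lambda x: 3 * x ** 2 + 2
u3 = pb_interpolate(u, du, 3)   # cubic element: reproduces u up to quadrature error
u2 = pb_interpolate(u, du, 2)   # quadratic element: best fit of the remainder
```

With $p = 3$ the interpolant reproduces the cubic, while with $p = 2$ the single quadratic bubble receives a zero coefficient (the derivative of the remainder is orthogonal to $L_1$), so the interpolant reduces to the linear function $3x$. In both cases the vertex values are matched exactly, which is what guarantees global continuity.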


H(curl)-conformity involves only the continuity of the tangential component. Consequently, there is no interpolation at vertices, and the interpolation process starts from edges. Given a function $E \in H^{1/2+r}(\mathrm{curl})$, $r > 0$, we define the projection-based interpolant $\Pi^{curl} E := E^p$ by requesting the conditions,


$$
\begin{aligned}
& \| E^p_t - E_t \|_{-1,e} \to \min, && \text{for each edge } e \\
& \| (\nabla \times E^p) \cdot n_f - (\nabla \times E) \cdot n_f \|_{-1/2,f} \to \min \\
& (E^p_t - E_t, \nabla_f \phi)_{-1/2,f} = 0 \quad \text{for every face bubble } \phi, && \text{for each face } f \\
& \| \nabla \times E^p - \nabla \times E \|_{0,K} \to \min \\
& (E^p - E, \nabla \phi)_{0,K} = 0, \quad \text{for every element bubble } \phi
\end{aligned}
$$

Here the norms and inner products correspond to spaces $H^{-1}(e)$ and $H^{-1/2}(f) = (H^{1/2}_{00}(f))'$.

For completeness, we also record the projection-based interpolation for H(div)-conforming elements. Given a vector-valued function $H \in H^r(\mathrm{div})$, $r > 0$, we define the projection-based interpolant $\Pi^{div} H := H^p$ by requesting the conditions,


$$
\begin{aligned}
& \| H^p \cdot n_f - H \cdot n_f \|_{-1/2,f} \to \min, && \text{for each face } f \\
& \| \nabla \cdot H^p - \nabla \cdot H \|_{0,K} \to \min \\
& (H^p - H, \nabla \times \phi)_{0,K} = 0 \quad \text{for every element bubble } \phi
\end{aligned}
$$


Notice that the bubble shape functions are defined consistently with the de Rham diagram. The $H^1$-bubbles are used to define the H(curl)-conforming interpolant and the H(curl)-bubbles are used to define the H(div)-conforming interpolant.

The interpolation is to be performed on the master ele- 
ment. In order to interpolate on the physical element, we 
use the element map to transfer the interpolated function 
to the master element first, interpolate on the master ele- 
ment, and then transfer back the interpolant to the physical 
element [4]. 

Under some technical conjectures concerning the exis- 
tence of polynomial extensions, we have the following 
result for the tetrahedral element T of the second kind; 
see Demkowicz and Babuška (2003) and Demkowicz and 
Buffa (2004) [5]. 


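Theorem 1. For each $\epsilon > 0$, there exist constants $C(\epsilon, r)$, dependent upon $\epsilon$ and regularity $r$ but independent of polynomial order $p$, such that the following estimates hold.


$$
\begin{aligned}
\| u - \Pi u \|_{1,T} &\le C \left( \frac{h}{p} \right)^{1/2 + r - \epsilon} \| u \|_{3/2+r,T} \\
\| E - \Pi^{curl} E \|_{\mathrm{curl},0,T} &\le C \left( \frac{h}{p} \right)^{1/2 + r - \epsilon} \| E \|_{\mathrm{curl},1/2+r,T} \\
\| F - \Pi^{div} F \|_{\mathrm{div},0,T} &\le C \left( \frac{h}{p} \right)^{r - \epsilon} \| F \|_{\mathrm{div},r,T}
\end{aligned}
$$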

Here h denotes the element size and p is the maximum 
order of polynomials reproduced by the element space of 
shape functions. The elements may be of variable order. 
A similar result is expected to hold for the other elements 
discussed in the article. 

With the projection-based interpolation operators in 
place, we have the following fundamental result. 


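Theorem 2. The following de Rham diagram commutes.


$$
\begin{array}{ccccccc}
W & \xrightarrow{\ \nabla\ } & Q & \xrightarrow{\ \nabla\times\ } & V & \xrightarrow{\ \nabla\cdot\ } & Y \\
\downarrow \Pi & & \downarrow \Pi^{curl} & & \downarrow \Pi^{div} & & \downarrow P \\
W_{hp} & \xrightarrow{\ \nabla\ } & Q_{hp} & \xrightarrow{\ \nabla\times\ } & V_{hp} & \xrightarrow{\ \nabla\cdot\ } & Y_{hp}
\end{array}
$$


Here $W_{hp}$, $Q_{hp}$, $V_{hp}$, and $Y_{hp}$ denote the spaces of element shape functions corresponding to any of the discussed parametric elements, and $P$ denotes the $L^2$-projection. Notice that, contrary to the classical approach, the interpolation procedure does not depend upon the element being considered.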

The commuting diagram property is crucial in proving a number of fundamental mathematical results, starting with a proof of the discrete compactness property introduced in Kikuchi (1989). The property mimics a corresponding compactness property on the continuous level. Given a sequence of discrete fields $E_{hp}$ that are uniformly bounded in H(curl) and discrete divergence-free, that is,


$$ \| E_{hp} \|_{\mathrm{curl},0,\Omega} \le C, \qquad (E_{hp}, \nabla \phi_{h,p+1})_{0,\Omega} = 0 \quad \text{for every } \phi_{h,p+1} $$


we can extract a subsequence that converges strongly in $L^2$ to a limit $E$. The property has been proved for h-extensions in Boffi (2000), Demkowicz, Monk and Schwab (2000), and Monk and Demkowicz (2000), and recently (under an additional conjecture on polynomials, 2D only) for p- and hp-extensions in Boffi, Demkowicz and Costabel (2003). The discrete compactness property implies convergence of Maxwell eigenvalues with optimal rates and, in turn, asymptotic stability and optimal convergence for the time-harmonic Maxwell equations; see Demkowicz and Vardapetyan (1998), Monk and Demkowicz (2000), and Boffi (2001).

The importance of the commuting diagram property in 
the context of Nédélec's original interpolation was first 
emphasized in Bossavit (1989). Nédélec's interpolation was 
intended for h-extensions only, and it does not make the 
diagram commute for variable order elements. It also does 
not yield an interpolant that is optimal in p. The projection-based 
interpolation for variable order elements, with the 
corresponding proof of the commuting property, was introduced 
in Demkowicz et al. (2000). For a more general definition 
of commuting quasi-local interpolation operators, see 
Schöberl (2001). 


5 ADDITIONAL COMMENTS 


Exterior boundary value problems 
Only ideal conductors are impenetrable to electromagnetic 
waves, and the solution of most practical problems 
involves modeling of waves in unbounded domains. The 
formulation of a boundary value problem then also includes 
the Silver-Müller condition, expressing the fact that waves 
can propagate only toward infinity (a generalization of the 
Sommerfeld radiation condition for the Helmholtz equation). 
Finite elements cannot be used to discretize such a prob- 
lem directly. The most natural approach is then to truncate 
the infinite domain with a truncating surface, and comple- 
ment the FE discretization within the truncated domain with 
a special Absorbing Boundary Condition (ABC). The most 
popular ABC in the context of EM waves is Bérenger's 
Perfectly Matched Layer (PML) approach; see Bérenger 
(1994). Finite elements can also be naturally coupled with 
Infinite Elements; see Cecot, Demkowicz and Rachowicz 
(2003). The most mathematically sound approach is to cou- 
ple the variational formulation within the truncated domain 
with a weak form of a boundary integral equation (BIE) 
set up on the truncating surface. A consistent discretization 
of the BIE requires the use of RWG (Rao-Wilton-Glisson) 
elements; see Rao, Wilton and Glisson (1982). 
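
For the scalar Helmholtz case mentioned above, the radiation condition can be checked numerically. The sketch below assumes three space dimensions and the $e^{-i\omega t}$ time convention (an assumption; the opposite convention swaps the roles of the two waves), in which the Sommerfeld condition requires $r\,(\partial u/\partial r - iku) \to 0$ as $r \to \infty$. The function name and the sample wavenumber are illustrative.

```python
import numpy as np


def sommerfeld_residual(r, k, direction):
    """Return r * (du/dr - i k u) for the spherical wave u = exp(direction*i*k*r) / r.

    direction = +1 is the outgoing wave under the assumed exp(-i*omega*t)
    convention, direction = -1 the incoming one. The radial derivative is
    evaluated analytically: du/dr = (direction*i*k - 1/r) * u.
    """
    u = np.exp(direction * 1j * k * r) / r
    du_dr = (direction * 1j * k - 1.0 / r) * u
    return r * (du_dr - 1j * k * u)


if __name__ == "__main__":
    k = 2.0  # assumed wavenumber
    for r in (1e1, 1e2, 1e3):
        out = abs(sommerfeld_residual(r, k, +1))
        inc = abs(sommerfeld_residual(r, k, -1))
        print(f"r={r:8.1f}  outgoing={out:.3e}  incoming={inc:.3e}")
```

The residual of the outgoing wave decays like 1/r, while that of the incoming wave tends to 2k, so only the outgoing wave is admitted by the condition.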

Waveguides form an important class of problems defined 
in unbounded domains. The solution of a waveguide prob- 
lem leads to a two-dimensional eigenvalue problem defined 
in either a bounded or an unbounded domain. Waveguide 
problems also lead naturally to a mixed formulation and the 
de Rham diagram; see Vardapetyan and Demkowicz (2002) 
and Vardapetyan, Demkowicz and Neikirk (2003). 


Iterative and multigrid solvers 

The solution of problems of significant size requires the use of 
iterative and multigrid solvers. The construction and analysis 
of such solvers for Maxwell equations differ significantly 
from those for elliptic problems, and they must again be based 
on the Helmholtz decomposition and the de Rham diagram; see, 
for example, Hiptmair (1998), Arnold, Falk and Winther 
(2000), and Gopalakrishnan and Pasciak (2003). 


A posteriori error estimation and adaptivity 

A truly effective implementation of an FE code must 
integrate a posteriori error estimation, adaptivity, and multigrid 
solvers. For examples of such implementations in the context 
of low-order elements, see Beck et al. (1999) and Haase, 
Kuhn and Langer (2001); compare also Salazar-Palma et al. 
(1998). 

As I mentioned in the abstract, this presentation is biased 
very much towards hp elements, which enable exponential 
convergence and have constituted my own research for 
the past decade. Mathematical foundations for analyzing 
hp convergence for Maxwell equations start with the 
fundamental contributions of Costabel and Dauge (2000) on 
the regularity of solutions to Maxwell equations. In a 
fundamental result, Costabel (1990) demonstrated the failure 
of standard penalty-based $H^1$-conforming elements to 
converge in the case of problems with singular solutions that 
are not in $H^1$. In their most recent work, though, 
Costabel and Dauge (2001) modified the penalty term, using 
a weighted regularization, and both proved and demonstrated 
the possibility of exponential convergence of the 
modified method. The first person to analyze p-extensions 
using Nédélec's elements was Monk (1994); see also Wang, 
Monk and Szabo (1996). To the best of my knowledge, the first 
hp codes, both in 2D (hybrid meshes consisting of both 
quadrilaterals and triangles) and in 3D (hexahedra), were 
put together in Rachowicz and Demkowicz (2000); see 
also Zdunek and Rachowicz (2002). The codes enabled 
both h and p adaptivity, with the possibility of anisotropic 
refinements by means of the constrained approximation 
for one-irregular meshes. Subsequent implementations in 
2D were presented in Ainsworth and Coyle (2001) and in 
Ledger et al. (2003). Experimentally obtained exponential 
convergence rates were reported in Rachowicz, Demkowicz 
and Vardapetyan (1999) and in Ainsworth and Coyle 
(2001). An automatic hp-adaptivity for Maxwell equations 
based on the minimization of the projection-based 
interpolation error has recently been presented in Demkowicz 
(2003). 


Transient problems and discontinuous Galerkin FE 
discretizations 

A separate development is taking place in the field of Dis- 
continuous Galerkin (DG) methods for Maxwell equations; 
see, for example, Perugia, Schötzau and Monk (2002). The 
DG approach is especially well suited for the solution of 
transient Maxwell equations; see Hesthaven and Warburton 
(2001). 


6 RELATED CHAPTERS 

(See also Chapter 5, Chapter 9, Chapter 13 of this Vol- 
ume; Chapter 22 of Volume 2) 
NOTES 


[1] A polynomial of order p is homogeneous if it can 
be represented as a sum of monomials of order p. 
Equivalently, $u(\xi x_1, \ldots, \xi x_n) = \xi^p u(x_1, \ldots, x_n)$. 
[2] Note that general quadrilaterals or hexahedra with 
straight edges are not affine elements. 

[3] The space should include the solution to the boundary 
value problem. 

[4] Alternatively, one could perform the projection-based 
interpolation directly on the physical element. The two 
interpolants, in general, are different. 

[5] The theorem holds under the assumption of existence of 
polynomial preserving extension operators for 'energy 
spaces' $H^1$, H(curl), H(div). Existence of such 
operators has been established for 2D spaces and the 3D $H^1$ 
space, but it remains so far only conjectured for the spaces 
H(curl), H(div) in 3D. 
jet temperature 3:445-446 

parallel iteration 2:711-712 

perfectly matched layers 2:696, 2:710 

structural 2:683, 2:684-689 

tensors 2:343--344 

time-harmonic waves 2:697--698 

unbounded domains 2:702—703 


acquired cardiovascular disease 3:528 
active... 


arterial wall mechanics 2:608 
cardiac muscle mechanics 2:621-622 
set strategy 2:202, 2:203, 2:216 


adaptive. .. 


boundary element methods 2:386—390 
computation 1:6 
averaged Navier-Stokes equations 3:193 
convection—diffusion reactions 1:696—699, 1:700 
drag of buff bodies 3:189-191 
efficiency 3:197-198 
error estimation 1:87, 1:97, 1:690-691 
heat equations 1:676, 1:689-692, 1:696-699, 1:714-719 
literature 1:676-678 
nonstiff initial value problems 1:680-683 
parabolic differential equations 1:675-702 
reaction-diffusion equations 1:696—699, 1:700 
reliability 3:197—-198 
software 1:702 
square cylinder drag 3:191-192, 3:193-194, 3:197-198 
stationary benchmark problems 3:201-205 
stiff initial value problems 1:680 
strong stability factors 1:675-679, 1:683, 1:686-689, 
1:691-693 
surface-mounted cube drag 3:192-195, 3:196, 3:197-198 
turbulence 3:199-201 
cross approximations (ACA) 2:731 
direct numerical simulations 3:184-185, 3:191-197 
error control 1:677-—680 
adaptive... (continued) 
finite element methods 
convergence 1:98, 1:103 
forming processes modeling 2:485-491 


antenna integration 3:451 
classical 3:325—326 

complex geometry 3:346-348 
computational flow mechanics 


Akzo-Nobel system 1:680, 1:682, 1:685, 1:687 
ALE see arbitrary Lagrangian—Eulerian 
algebraic. .. 

complements 1:730—731 


meshfree methods 1:280-291 
operators 1:88 

plates 1:207 

Riemann solvers 3:97, 3:98-100 


fracture mechanics 2:386—390 

mesh generation/adaptivity 1:510-516 
hierarchical modeling 2:425 
large eddy simulations 3:184—-185, 3:191-197 
meshes 

coarsening 1:104 

error estimation 1:87, 1:97 

linearized elasticity 2:37-39 

local refinements 1:73, 1:98, 1:99-100 

spatial representation 1:529-530 
partitions 1:171-172 
space-time Galerkin FEM 1:676, 1:689-690 
support stencil reconstructions 1:462—463 
time steps 1:696-697 
wavelet techniques 1:3, 1:157-195 


aeroacoustics 3:444—448 

code interfaces 3:428 

Euler codes 3:413-416 

flutter 3:448-—449, 3:450 
fundamental studies 3:436-—437 
historical overviews 3:408-413 
large eddy simulations 3:423—426 
mesh generation 3:426-428 
Navier-Stokes code 3:416-418 
numerical codes 3:413-418 
parafoils 3:449-451 

research 3:436-437 

Reynolds averaged turbulence 3:418—423 
shape optimization 3:437-443 
validation 3:428 


expansions 1:143-146 
operators 3:30~32 
polynomial expansions 1:143-145 
solution algorithms 3:175-179 
space discretization 3:166 
algorithms 
Bi-CG 1:559-560 
Bi-CGSTAB 1:559-560, 2:707 
bisection 1:101-102 
Bowyer—Watson 1:501 
buckling stability 2:156 
closest-point-projection equations 2:262-—263 
consistency tangents 2:248 
Constraint Energy Momentum 2:186-187 
contact mechanics 2:197, 2:198, 2:214-221 


shells 1:218 
arbitrary crack growth 2:382—394 
arbitrary Lagrangian—Eulerian (ALE) methods 1:4, 1:413-433 
advection 3:76-77 
aeroelasticity 3:464-—466 
fluid dynamics 1:421, 1:422-426 
kinematics 1:416-417 
mesh-updating 1:420-422 
nonlinear solid mechanics 1:426, 1:428-433 


ship hydrodynamics 3:581-583, 3:585-587, 3:592-593, 3:594-607 


solid mechanics 1:426, 1:428-—433 
arbitrary shape approaches 2:382-394 
arch structures 2:558-561 
Argyris elements 1:77-78, 1:85 
Arnold-Winther elements 1:269 


boundary integral equations 1:175-181 
complexity analysis 1:187-195 
eddy viscosity 3:49-50 
evolution equations 1:169-175 
numerical simulations 1:157~-195 
operators 1:175-181 
residual approximations 1:187-195 
adaptivity 1:1 
buckling computation 2:152-153 
computational fluid dynamics 3:2, 3:183~205 
convection-diffusion equations 1:40-43 
convergence 1:103 
discontinuous Galerkin methods 3:101—102, 3:114—-115, 3:116 
error estimates 1:511-516 
linearized elasticity 2:24~40 
Maxwell equations 1:735 
operators 1:160, 1:165, 1:175-181 
quality meshing 1:502-510 
regular meshes 1:502-504 
RKDG 3:114-115, 3:116 
shock capturing 3:101-102 
symmetric BEM/FEM 1:382-389 
addition hierarchical matrices 1:611 
Additive Schwarz Method (ASM) 1:390-391, 1:623-625, 1:627-629, 1:632 
additive variants 1:589-590 
adhesion 2:205 
adhesive wear 2:206 
adjacent equilibrium states 2:142 
adjoint methods 2:742-743, 3:381-386, 3:390 
admissibility 1:601-602, 1:606, 1:608-609, 2:582 
advancing-fronts 1:497-498, 1:500-501, 1:508 
advection 
aerothermal design 3:433-434 
AETHER 3:416-418 
affine... 
interpolation-equivalent elements 1:80-81 
invariance 1:657-658 
mapping 1:58, 1:62-63 
afterbody design 3:451-452 
AGARD see Aerospace Research and Development 
aggregate interlock 2:536-537 
aggregate scale 2:520-521, 2:524 
AIAA see American Institute of Aerodynamics and Astronauts 


electromagnetism 3:451 
finite difference solvers 3:409-411 
fluid flow discretization 3:359--365 
historical overviews 3:325-329 
industrial 3:407-454 
infrared signatures 3:451—452 
mathematical models 3:330-334, 3:335 
nonlinear aeroelasticity 3:459-477 
planforms 3:392-396 
potential flow methods 3:334-348 
shape optimization 3:379-400 
case studies 3:390~399 
Euler equations 3:383-386 
Navier-Stokes equations 3:383-386 
shock capturing 3:329, 3:348-359, 3:363-364, 3:460 
structural optimization 3:392-396 
subsonic linearized potential flow 3:334-337 
time-stepping schemes 3:365—-379 
vortex methods 3:145-149 
aeroelasticity 3:459~477 
Aerospace Research and Development (AGARD) 3:377-379, 3:474, 
3:449-450 


Arnoldi method 1:556-557, 1:571-572 
Arrhenius laws 3:517 
arterial... 
continuum mechanics 1:413-414 
cutting-plane algorithms 2:246 
kinematic shakedown 2:318-319 
Newton—Raphson 2:18, 2:471 
QMR 1:559-560, 2:707 
radial return 2:737-738 
Red—Green-Blue refinement closure 1:102—103 
shakedown 2:312-321 
visualization 1:531-541 
aliasing errors 3:282, 1:149 
Almansi strain tensors 2:9-10, 1:136 
alternating direction implicit (ADI) methods 1:27-28 
alternating plasticity 2:304, 2:321-322 
Alternating Schwarz Method 1:618, 1:620 
alumina plates 2:379 
aluminum-boron 2:417-421 
aluminum-magnesium 2:648-649 
American Cup Bravo España 3:600, 3:601 
American Cup Rioja de España 3:596-601 
American Institute of Aerodynamics and Astronauts (AIAA) 3:327 
Ampere’s law 1:723-724 
anchor bolts 2:360-361 
aneurysms models 2:617-618 
angioplasty 2:617, 2:618 
angle of attack 3:460, 3:429-431, 3:436, 3:443 
anisotropic. . . 
bound meshes 1:513-514 
damage models 2:338-339 
elastodynamics 2:759-760 
elements 1:57, 1:63-66 
elliptic partial differential equations 1:14 
error estimates 1:81—82 
finite elements 2:20~21 
h-finite elements 1:57, 1:63-66 
hardening plasticity 2:553 
hp-adaptivity refinement 2:38-39 
meshes 1:513-514, 3:114-115, 3:116, 3:163-164 
nodal interpolation error estimates 1:63~66 
Sobolev norms 1:710, 1:711 
turbulence closure 3:311 
annihilating radial terms 2:705 
antenna integration 3:451 
anterior cruciate ligament (ACL) 2:626 
aorto-aorto bypass models 3:539-540 
approximations 
inertial manifolds 3:211—-212 
levels 3:269-270 


aneurysms 2:617~—618 
bifurcations 2:617, 2:618, 3:537-538 
blood flow 3:530 
clamping 2:617 
wall mechanics 2:606-618 
artificial boundaries 2:703-704, 2:708, 3:6-7, 3:159-160 
artificial diffusion 3:349~351, 3:583 
artificially cemented sand 2:555-558, 2:559 
ASM see Additive Schwarz Method 
associated plasticity 2:237-239 
assumptions on the norms 1:247—248 
Astley—Leis conjugated test functions 2:709 
asymmetrical mechanical part forming 1:518, 1:519 
asymptotic. .. 
consistency 1:207, 1:218-219 
decomposition preconditioners 1:619 
error estimates 1:370-~371 
exactness 1:86 
expansions 1:201-207, 1:211-218 
atomistic methods 2:383, 2:391-394 
attachment points 1:536 
attraction basins 1:653-654 
attributes, geometric modeling 1:490-492 
Aubin—Nitsche lemma 1:364—365 
augmented. . . 
closest-point-projections 2:262-—263 
consistency functions 2:261-262 
dual functions 2:261-262 
functionals 1:267 
Lagrangian formulations 
contact mechanics 2:202, 2:217-220 
discontinuous deformations 1:329 
elastoplastic deformations 2:260—263 
multibody contact forces 1:322, 1:329 
viscoplastic deformations 2:260—263 
austenitic steel 2:649, 2:650 
auto-catalytic reactions 1:698-699, 1:700 
autogenous shrinkage 2:514, 2:516-529 
autoignition 3:511 
automatic. .. 
aerodynamic shape optimization 3:437—443 
coarsening 1:104 
differentiation 3:442 


arbitrary Lagrangian-Eulerian methods 3:76-77 
diffusion equations 3:22-26, 3:32-40, 3:548 
diffusion operators 3:18 
equations 1:148-150 
hp-finite methods 3:62 
shallow water equations 3:250-251 
AERO simulation platforms 3:473-477 
aeroacoustics 3:444-448 
aerodynamics 3:2-3, 3:325-400, 3:407-454 
aeroacoustics 3:444-448 
aeroelasticity 3:459-477 
afterbody design 3:451-452 
air conditioning 3:452-453 
air conditioning 3:452-453 
air flow 2:584-586 
Air Force Office, Scientific Research 3:473-477 
airbrake efficiency 3:432-433 
aircraft see aerodynamics 
airfoils 
aeroelasticity 3:460 
complex geometry 3:346 
compressible flows 3:77-81 
NACA 6 series 3:326-327 
NACA 0012 3:78-81, 3:101-103, 3:122-123 
NACA 0015 3:77-78 
airframe noise 3:444 
automatic... (continued) 
mesh moving techniques 3:554 
remeshing 2:362-363 
automotive headlamp panels 2:496, 2:497 
average internal work 2:446-449 
average strain 2:410, 2:446 
average stress 2:410, 2:446 
averaged Navier-Stokes equations 3:193 
averaging error estimators 1:93-97, 2:32-33, 2:52-53 
aviation see aerodynamics 
AVS development environment 1:544~545 
axial thrusts 2:559-560, 2:561 
axially loaded bars 2:358~359 
axisymmetric. . . 
elasticity 2:15~16 
flanged components 1:431-432 
necking 2:651—652, 2:653-654 
piercing 2:497, 2:498 
soil layers 2:581-582 


B-bar methods 2:122-123, 2:476 
B-differential equation systems 2:220 
B-FOIST see Beyond-Foist 
B-spline wavelets 1:480-481 
Babuška paradox 1:220 
Babuška-Brezzi (BB) condition 
contact discretizations 2:207 
linearized elasticity 2:22 
partial differential equations 1:295 
Signorini-type interfaces 1:400-401 
space discretization 3:161-162 
symmetric BEM/FEM 1:381--382 
see also inf-sup condition 
backfill soil 2:558-561 
backscatter 3:285 
backward... 
Euler method 1:676, 1:677, 1:715-717, 3:369 
extrusion 2:500-502 
finite differencing 1:36-37, 1:50, 1:51 
rescaling 3:293 
time, central space finite differencing 1:51 
balance equations 
combustion 3:505, 3:520, 3:521 
elasticity 2:10-13, 2:229~231 
material responses 2:424 
turbulent flame combustion 3:520, 3:521 
viscoelastic fluid flows 3:482 
viscoplastic deformations 2:229-231 
balancing Neumann—Neumann preconditioners 1:640-641 
balloon angioplasty 2:617, 2:618 
bar scales 2:529-530, 2:531-533 
barrier functions 1:11, 1:15-17 
barrier method 2:202, 2:218 
bars 
linear elastic 1:128~130 
reliability methods 2:676—677 
tensile 2:345-346, 2:347 
uniaxial tension 2:344--345, 2:346~349, 2:351 
base vectors 2:8-9, 2:65, 2:91 
basis functions 3:108 
basis transform approximations 1:605-606 
BB see Babuška-Brezzi 
BCF see Brownian configuration fields 
BDM see Brezzi-Douglas-Marini 


beams 
Bernoulli cantilever 2:20—21 
double cantilever 2:367-368 
elastodynamics 2:767—768 
membrane locking 2:123-124 
shakedown 2:293 
three-point bending 2:364—365, 2:366-367, 2:369 
bearing capacities 2:550-551 
Bell elements 1:78 
BEM see boundary element method 
bending 
arterial walls 2:609-613, 2:614 
displacements 1:202, 1:206~207 
dominated action 2:68-70 
energy 1:209, 1:215 
four-point 2:39-40 
generators 1:204—205, 1:206-~207 
moments 2:559--560, 2:561 
operators 1:214 
three-point 2:364-365, 2:366-367, 2:369 
Berea sandstone soil consolidation 2:583-584 
Bernoulli cantilever beams 2:20--21 
Bernstein estimates 1:164, 1:167 
Bernstein polynomials 2:214 
Besov space 1:188, 1:192 
Bessel’s differential equation 2:704 
best N-term approximation 1:188—190 
Betti-Somigliana representation 1:341, 1:344 
Bettis~Burnett unconjugated test function 2:709 
Beyond-Foist (B-FOIST) 3:559 
Bézier polynomials 2:213-214 
BGT operators 2:706 
Bi-CG algorithms 1:559-560 
Bi-CGSTAB algorithms 1:559-560, 2:707 
Bi-Lanczos method 1:573-574 
BIE see boundary integral equations 
Bierhduselburg tunnel 2:528-529 
bifurcation 1:6 
blood flow 3:537-538 
buckling 2:142, 2:145-152, 2:156-164 
finite element models 2:617, 2:618 
parameterized systems 1:669-673 
turbulence closure 3:314 
biharmonic operators 1:34 
biological tissue 2:605-629 
biomechanics 2:605-629, 3:527-540 
biorthogonal wavelets 1:162—163 
Biot stress tensors 2:10 
Biot—Savart integrals 3:131-132, 3:141 
birefringence patterns 3:487—488 
bisection algorithms 1:101~102 
blankholders 2:495 
blanking 2:504—-506 
blast waves 3:115, 3:116 
Blatz—Ko materials 2:14 
blending 
functions 3:305, 3:309, 1:124—125 
meshfree methods 1:303-306 
p-finite element method 1:125-126 
block... 
cluster trees 1:608 
deformability 1:324-329, 1:334-335 
Gauss-Seidel] smoothers 3:178-179 
Jacobi preconditioners 1:567 
partitioning 1:608—609 
preconditioners 1:566-567 
blood flow 3:3, 3:527-540 
lumped parameter models 3:530-531 
pressure 3:528~540 
three-dimensional equations 3:533-540 
velocity 3:528--540 
wave propagation 3:531-533 
body forces 2:726, 3:319-320 
body-fitted meshes 3:359 
Boeing 747 
moving boundaries meshing 1:517, 1:519 
potential flow methods 3:336~337 
wing redesign 3:390-392, 3:393, 3:396 
Bogner—Fox—Schmit elements 1:78-80 
bond slip 2:531 
bond stresses 2:530-531 
bonded particle assemblies 1:331 
bonding 2:439, 2:554 
Boolean values/operations 1:476—477, 1:483—484, 1:492 
boron 2:417-421 
bound limit theorems 2:549-551 
boundary conditions 
acoustic field equations 2:697 
advective-diffusive equations 3:33-34, 3:38-40 
arbitrary Lagrangian—Eulerian 1:423—425 
composite laminates 2:441—445, 2:447-449 
contact mechanics 2:196 
continuous blending meshfree methods 1:305-306 
Dirichlet-to-Neumann 2:696, 2:703-708, 2:711 
elliptic partial differential equations 1:15 
essential 1:293-298, 1:299, 1:305-306 
fluid flow discretization 3:361-362 
large eddy simulations 3:291-293, 3:296 
Maxwell equations 1:726 
multibody contacts 1:321-324 
Navier-Stokes equations 3:209-210, 3:220~221 
partial differential equations 1:293—-298, 1:299 
plates 1:207 
RKDG 3:110-115, 3:116, 3:117 
turbulence closure 3:305-306 
boundary element method (BEM) 1:4, 1:339--371, 1:375-409 
boundary integral equations 1:346-347 
constrained shapes 2:381 
domain 2:735-740 
dual reciprocity 2:758-760, 2:761—762, 2:763 
elasticity 2:723-731 
flows with solid boundaries 3:145-149 
fracture mechanics 2:381, 2:732-735 
geomechanics 2:545-546, 2:547, 2:564 
hierarchical matrices 1:610, 1:612, 1:614-615 
multigrid methods 1:593-595 
panel clustering 1:597-600, 1:607, 1:610, 1:612, 1:614-615 
ship hydrodynamics 3:580 
Sobolev index 1:366-371 
soil consolidation 2:564 
unbounded domains 2:702 
variational formulations 1:347-358 
viscoelastic dynamic analysis 2:751—758 
see also finite element method 
boundary element space discretization 1:599-600 
boundary energy 2:744 
boundary integral equations (BIE) 1:340-347 
aerodynamics 3:334~337 
elasticity 2:719-745 
Fourier transforms 1:704, 1:713-714 
Laplace transforms 1:704, 1:713~714 


matrix compression 1:175—181 
Maxwell equations 1:735 
operators 1:342~344 
plasticity 2:719-745 
time-dependent problems 1:6, 1:703-719 
time-stepping methods 1:704—705, 1:714—-719 
variational formulations 1:347-358 
weak solution 1:347-358 
boundary layers 1:205-206, 1:213-214, 1:216, 1:223~-224 
boundary representation schemes 1:477-485, 1:487-—489 
boundary stresses 2:727 
boundary value problems (BVP) 1:339-371 
arterial walls 2:611-612 
boundary integral equations 1:340-342 
Maxwell equations 1:734~735 
nonlinear parabolic equations 1:24-26 
bounded domains 1:510~516, 3:10-11 
bounded extension splitting 1:637-638 
Boussinesq assumption 3:421—423 
Bowyer-Watson algorithm 1:501 
Box/Top Hat filters 3:273-274 
brain 1:520-521 
branching crack topology 2:388 
branching discontinuities 1:302 
Bray—Moss-Libby analysis 3:521-523 
breaking waves 3:581 
Brent method 1:671 
Brezzi—Douglas—Marini (BDM) elements 1:259~260, 1:261 
brick-functions 1:395 
bridge scaling 1:303 
Brownian configuration fields (BCF) 3:491, 3:493--494, 3:495-496 
Brownian dynamic simulations 3:490-491 
Brownian motion 3:137~-139 
Broyden iteration 1:657 
bubble functions 
hierarchical p-refinement 3:18-20 
Maxwell equations 1:730 
residual error estimates 1:89 
space-time 3:30 
Stokes equations 1:147 
Bubnov—Galerkin weak form 1:292-293 
buckling 2:139-164 
continuation computation 2:150—164 
delamination 2:367-368 
perturbation theory 2:147—149 
shells 2:70 
stability 2:142-144, 2:145-149, 2:150-164 
buff body drag 3:189~-191 
buffeting 3:459 
bulk forming operations 2:496-502 
bulk moduli 2:409, 2:419 
bump maps 1:491 
Burger’s equation 3:100—102 
buried structures 2:558-561 
Burnett test function 2:709-710 
business jets 3:398—399, 3:427, 3:431-433, 3:442—-443 
BVP see boundary value problems 


cabin air conditioning 3:452—453 

CAGD see computer aided geometric design 

calcium-silicate-hydrates (CSH) 2:517, 2:520—521 

Calculation of Non-Newtonian Flow: Finite Elements & Stochastic 
Simulation Technique (CONFFESSIT) 3:490-491 

camera models 1:528 

cancellation properties 1:166 
cannon-blast simulations 3:115, 3:116 
canonical prolongations 1:587, 1:588-589 
canonical restrictions 1:587 
cantilever beams 2:20-21, 2:367-368 
capillaries 2:519-521, 2:524—525, 3:530 
capsule implosion simulation 3:115, 3:117 
car seats 1:517, 1:519 
Caradonna—Tung rotor 3:104 
carbonate soils 2:555-558, 2:559 
cardiovascular system 3:527-540 
carotid artery 2:613, 2:614, 3:537-538 
carpet plots 1:532-533 
Cartesian coordinates 3:64—65, 3:277-—278, 1:145-146 
cascadic wavelet transforms 1:161—162 
case tables 1:532, 1:533 
Cauchy... 
Green tensors 2:9-10 
integral 1:718-719 
principal value integrals 1:343 
Schwarz inequality 2:31-32, 2:33-36 
singular integral equation (CSIE) 1:344--345 
stress 2:613-615, 2:617 
stress tensors 1:418, 2:66, 2:456, 1:136 
causal Green’s function 3:29-30 
cavity flows 3:209, 3:223-225 
cavity shape sensitivity 2:743 
Cayley—Hamilton theorem 3:312 
CBS see characteristic based split methods 
CDC see Control Data Corporation 
CDM see continuum damage mechanics 
Céa’s lemma 1:75-76, 1:358-362 
CEBE see clustered-element-by-element 
CEI EnSight Gold application 1:546 
cells 
averages 1:171-172 
centered finite volumes 1:442, 1:467 
integration 1:293 
visualization 1:528-529, 1:530-531 
CEMA see Constraint Energy Momentum Algorithm 
cement paste scale 2:517—521, 2:522—524 
cemented sand 2:555—558, 2:559 
centered moving least squares 1:288—289 
central circular holes plates 2:326—-327 
central differencing 
block deformability 1:326 
convection—diffusion equations 1:37 
finite differencing 1:50 
method of characteristics 1:47-50 
potential flow equation 3:339~340, 3:342—-343 
time integration 2:183-184 
wave equation 1:30-31 
centrifugation 3:320 
CFD see computational fluid dynamics 
CFL condition 1:33, 3:247 
CFM see computational fracture mechanics 
CG see conjugate gradients 
channel flows 3:209-210, 3:212~213, 3:221—-223, 3:227~-228 
characteristic. . . 
aerodynamic schemes 3:358-359 
based split (CBS) method 2:590-—591, 3:583 
bifurcations 1:669-670 
Galerkin method 3:187, 3:188 
length parameters 3:592-593 
splitting 3:357 
variables 3:109-110 


charge density 1:724 
Chebyshev polynomials 1:143-144 
chemical reactions 1:680, 1:682 
Cholesky decompositions 1:554-555, 1:562, 1:565 
Chorin projection scheme 3:168-170 
Christoffel symbols 1:212 
civil aircraft 3:431-433, 3:434 
clamped. . . 
elliptic shells 1:212-215 
hemispherical shells 2:69-70 
plates 1:224, 2:122 
spherical shells 1:224-225, 1:228 
clamping 2:617 
classical 
aerodynamics 3:325-326 
boundary integral equations 1:175—-176 
laminate plate theory (CLPT) 2:433-434 
plasticity 2:552 
shakedown 2:296-299 
classifications 
convergence 1:126—127 
shells 1:215 
Sobolev index 1:366—367 
Clausius-Duhem inequality 2:12-13, 2:466 
Clausius-Planck inequality 2:13, 2:272 
clearance 2:505-—506 
Clément operator 1:59-60, 1:66-68 
clipping 1:540 
closed-die rail forging 2:497-500 
closed-form constitutive equations 3:490-—491 
closest-point-projection (CPPM) equations 2:228—229, 2:244-246, 
2:249-259, 2:262—263 
closures 
filtered Navier-Stokes equation 3:32-33 
mesh refining 1:102—103 
Smagorinsky eddy viscosity 3:43-44 
turbulence 3:301-322 
clouds 1:291-292 
CLPT see classical laminate plate theory 
clusters 
bifurcations 2:160 
clustered-element-by-element (CEBE) 3:566 
hierarchical matrices 1:608 
multipole expansions 3:142-144 
panel clustering 1:5-6, 1:597-615, 2:731 
trees 1:601-602, 1:604, 1:606, 1:608 
coarse grids 
corrections 1:578, 1:582, 1:584—585, 1:588—589 
dynamic multilevel methods 3:233 
matrices 1:588 
coarse scales 3:11-27 
coarsening 
adaptive mesh design 1:104 
adaptive wavelets 1:193 
flow field compression 1:170 
local mesh refinement 1:98, 1:104 
coated steel sheets 2:474-476 
coaxial supersonic jets 3:447 
code interfaces 3:428 
coefficients 
drag 3:101-103, 3:122-123, 3:146-147, 3:174-175 
force 3:373 
Fourier 3:243-244 
lift 3:84-85, 3:146-147 
positive 1:456-457, 1:463-464, 3:349-356 
predictions 1:189, 1:191-193 
pressure 3:101, 3:103, 3:435 
of variation (COV) 2:567—568, 2:662-663 
coercivity 3:33 
cohesive crack propagation 2:379 
cohesive-frictional soils 2:550-551 
cohesive-zone models 2:337, 2:349-355, 2:363-367 
coining process 1:431, 1:433 
cold jets 3:446-447 
collapse 2:150, 2:160, 2:163, 2:537~538 
collapsed Cartesian coordinates 3:64-65, 1:145-146 
collocation 
algebraic polynomial expansions 1:143~145 
boundary element method 2:723-727, 2:732-733 
boundary integral equations 1:346, 1:704, 1:712-713 
discretization 1:292, 1:600 
fracture mechanics 2:732-733 
partial differential equations 1:292 
space-time boundary integrals 1:712-713 
time integration 2:184 
color mapping 1:531~532, 1:533-534 
combustion 3:3, 3:499-523 
governing equations 3:504—507 
instable flames 3:500-~501, 3:503-504 
laminar flames 3:500-501 
mixture fractions 3:508-511 
nonpremixed flames 3:500~501, 3:502 
overall reaction terminology 3:507-508 
partially premixed flames 3:500—501, 3:503 
premixed flames 3:500-502 
reaction terms 3:507 
regimes 3:500—504 
stable flames 3:500-501, 3:503~504 
stoichiometry 3:507-508 
turbulent flames 3:500-501, 3:514-523 
compact-tension (CT) 2:361-362 
compaction 1:431-433 
compatibility conditions 3:118, 3:468-471 
complementary. .. 
Dirichlet energy principle 2:24 
energy functions 2:251 
operators 1:229 
partitioning 2:422 
completely discretized schemes 1:31-32 
complex... 
geometries 3:61-88, 3:346-348 
multidimensional domains 3:359-365 
shapes 1:475-494 
viscoelastic fluid flows 3:489-—490 
complexity 
adaptive wavelets 1:187-195 
panel clustering techniques 1:603—604 
composite laminates 2:431-458 
see also fiber-reinforced. .. 
composite matrices 1:256—257 
compound bifurcation points 2:159-164 
compressed foam-rubber tubes 2:53-54 
compressed matrices 1:175—-181 
compressible. .. 
Euler equations 3:101, 3:103-105, 3:383-386 
flows 3:2-3 
discontinuous Galerkin methods 3:74—76, 3:77~81, 3:122-123 
moving domains 3:76~77 
plasma 3:81-85 
shock capturing 3:551-553 
stabilization parameters 3:551-553 
vortex particle methods 3:149-150 
see also turbulence. . . 
hyperelastic materials 2:13-14 
Navier-Stokes equations 3:122~123 
partition solution procedures 2:582 
strain-energy functions 2:13~16 
turbulence closure 3:302 
compression 
failure 2:438 
flow fields 1:169-175 
geomechanics 2:555-556 
compressive loadings 2:352, 2:354-355, 2:535—536 
computability 
computational fluid dynamics 3:2, 3:183~205 
direct numerical simulations 3:184—185, 3:191~-197 
large eddy simulations 3:184-185, 3:191-197 
computational. .. 
aerodynamics 
Euler equations 3:327, 3:329 
Navier-Stokes equations 3:327, 3:329 
potential flow methods 3:334--348 
transonic flow 3:326-327 
aeroelasticity 3:459-477 
biomechanics 2:605—629 
complexity 3:87-88 
contact mechanics 2:195-221 


costs 3:389—390, 1:132 


discrete element code 1:316 
flow mechanics 
aeroacoustics 3:444-448 
aerodynamics 3:407-454 
code interfaces 3:428 
Euler codes 3:413-416 
flutter analysis 3:448-449, 3:450
fundamental studies 3:436-437
historical overviews 3:408-413
large eddy simulations 3:423-426
mesh generation 3:426-428
Navier-Stokes code 3:416-418
numerical codes 3:413-418
parafoils 3:449-451
research 3:436-437
Reynolds averaged turbulence 3:418-423
shape optimization 3:437-443
validation 3:428
Dassault Aviation 3:407-454
military aircraft 3:428-431
fluid dynamics (CFD)
adaptivity 3:2, 3:183-205
computability 3:2, 3:183-205
discontinuous Galerkin methods 3:91-123
mesh generation/adaptivity 1:497-521
nonlinear aeroelasticity 3:459-477
RKDG 3:97, 3:104-115, 3:116, 3:117
ship hydrodynamics 3:580
transonic flow 3:327
turbulence closure 3:301-322
forming processes modeling 2:461-507
fracture mechanics (CFM) 2:375-402
cracking process 2:376-377
geometrical representation 2:378-394
nongeometric representations 2:394-400
geomechanics 2:543-569
goal-oriented error estimators 1:98
grids 1:547-548
computational... (continued)
homogenization 2:413-414
interfacing visualization systems 1:546-548
nonlinear aeroelasticity 3:459-477
rheology 3:481, 3:487-488, 3:489
visualization 1:5, 1:525-548 
computer aided geometric design (CAGD) 1:667 
concrete 2:513-539 
autogenous shrinkage 2:514, 2:516-529
cooling tower analysis 2:529-538
embedded anchor bolts 2:360-361
tension stiffening 2:529-538
under fire 2:576, 2:598-599
condition numbers 1:365-366
Galerkin methods 1:365-366, 3:119-120
local convergence theory 1:655-656
Signorini-type interfaces methods 1:398-399 
thin-walled structures 2:125 
CONFFESSITI see Calculation of Non-Newtonian Flow: Finite Elements & Stochastic Simulation Technique
configuration field evolution 3:493 
conforming finite elements 1:73, 1:77-85 
conforming Galerkin method 1:177 
conjugate gradients (CG) 
acceleration 1:592 
direct linear algebraic solvers 1:557-558 
material responses 2:416-417
nonlinear parabolic equations 1:26
preconditioned 1:642-643, 1:147
conjugated test functions 2:709-710 
connectivity, visualization 1:541 
conservation 
of energy 1:30, 1:419-420, 1:426, 2:11-13 
equations 
advective-diffusive 3:33, 3:34 
arbitrary Lagrangian-Eulerian 1:419-420
exact potential flow 3:343-345 
fluid flow 3:330-331 
reacting flows 3:505, 3:514 
Godunov finite volume discretizations 1:443 
laws 
adaptive wavelets 1:169 
arbitrary Lagrangian-Eulerian 1:424-425
finite volumes 1:439-450, 1:464-470
geometric 1:424-425, 3:466-468
nonlinear 1:439-450, 1:468-470, 3:91, 3:96-115, 1:150
nonlinear aeroelasticity 3:466-468
scalar 1:439-450, 3:97-98, 3:100-108, 3:349-351
shock capturing 3:356 
spectral methods 1:148-150 
of linear momentum 2:11 
of mass 2:10 
time integration 2:185-187 
conservative...
derivatives 3:129 
difference equations 3:341-342 
discontinuous Galerkin methods 3:92, 3:119 
consistency 
advective-diffusive equations 3:34, 3:35, 3:36 
continuous moving least squares 1:286 
discontinuous Galerkin methods 3:119 
errors 1:106 
Godunov finite volume discretizations 1:443 
moving least squares 1:286, 1:289 
small strain elastoplasticity 2:738 


Sobolev index 1:367-368 
window functions 1:283 
consistent integration, plastic flow 2:274-275 
consistent tangent operators (CTO) 2:736-738
consolidation, geomechanics 2:543, 2:561-567
constant mass density 2:684-685
constant total enthalpy 3:358-359
constitutive...
behavior, composite laminates 2:449-451
equations
arbitrary Lagrangian-Eulerian 1:428-429
inelastic materials 2:638-640
inverse 2:640-641
material parameter identification 2:637-654
shell theory 2:95-98
viscoelastic direct boundary elements 2:752-753
viscoelastic fluid flows 3:481-482, 3:490-491, 3:493-494
integration 2:736-740 
laws 2:310, 2:204-206, 2:621, 2:624 
models 
arterial wall mechanics 2:607-609 
geomaterials 2:551-558, 2:559 
heart wall mechanics 2:619-622 
ligaments 2:626-629
material frame indifference 2:239-240, 2:241
single crystal plasticity 2:273
nongeometric representations 2:394-399
tensors 2:13-14 
constrained Delaunay kernel 1:508 
constrained shape methods 2:378-394 
constraint...
equations 2:200 
impositions 1:321-324 
solution methods 2:197 
Constraint Energy Momentum Algorithm (CEMA) 2:186-187 
constructive solid geometry (CSG) 1:483-485
contact...
conditions regularization 1:314-316
constraint impositions 1:321-324
detection 1:317-321, 2:215
discontinuity approximation 3:111, 3:112
discretizations 2:206-214
force evaluation 1:316, 1:317, 1:321-324
friction modeling 2:471-476
mechanics 2:195-221
algorithms 2:197, 2:198, 2:214-221
arbitrary Lagrangian-Eulerian 1:431, 1:433
contact discretizations 2:206-214
continuum description 2:198-206
detection 1:317-321, 2:215-216
plane orientation 1:322-323
resolution phase 1:318-321
segments 2:209-210 
continuation principle 2:155 
continuity 
bifurcations 1:670-671 
buckling 2:150-164 
equations 1:726, 3:302, 3:505 
meshfree methods 1:287, 1:288, 1:289 
parameterized nonlinear equations 1:664-666
projection-based interpolation 1:733
continuous...
blending 1:303-306
boundary integral operators 1:351
conservation laws 1:424-425
discretization 2:664-665
interpolation 3:484
moving least squares 1:285-286
pressure interpolation 1:262-264
yielding plasticity 2:552-553
continuum...
constitutive modeling 2:462-469
contact mechanics 2:198-206
damage mechanics (CDM) 2:355-362, 2:440-441, 2:452-458
failure modeling 2:335-336
mesh sensitivity 2:341-349
partition-of-unity 2:369
mechanics
arbitrary Lagrangian-Eulerian 1:413-433
Eulerian algorithms 1:413-414
Lagrangian algorithms 1:413-414
thin-walled structures 2:64-68
micromechanics 2:515-519
slip theory 2:267-268, 2:269-274
contour dynamics 3:131, 3:132-133 
contours of vorticity curl 3:131, 3:132-133 
contraction flows 3:487-488 
contraction principle 1:441, 1:447 
contravariant base vectors 2:65 
Control Data Corporation (CDC) 3:327 
control theory 3:381-383 
control volumes 1:458, 1:460-461 
convection 
arbitrary Lagrangian-Eulerian 1:429, 1:430-431
convection-diffusion
discontinuous Galerkin methods 3:120-123
equations 1:36-50, 1:696-700, 3:149
turbulence 3:196 
convective gradient projections 3:586 
convective velocity 1:417 
discontinuous Galerkin methods 3:91, 3:115, 3:120-123, 3:196 
shallow water equations 3:242 
Conventional Serial Staggered (CSS) time integrators 3:471-473 
convergence 
acceleration 3:339, 3:365-379 
adaptive finite element methods 1:98, 1:103 
advective-diffusive equations 3:36 
buckling computation 2:154-155
characteristics method 1:126-131
conservation laws 1:446-450, 1:467-468, 3:97, 3:98-104
contact mechanics 2:197-198, 2:219
Dirichlet-to-Neumann boundary condition 2:707
discontinuous Galerkin methods 3:96, 3:97
domain decomposition 1:627-630
eigenvalues 1:570
factors 1:652-653
Koiter shell theory 1:217
meshfree methods 1:287
multigrid methods 1:583-584
optimality orders 1:362-364
orders 1:652-653
p-finite element method 1:126-131
rate optimality 1:207
rates 1:185-187
RKDG 3:110
shakedown 2:316
shells 1:214-215
Sobolev index 1:367-368 
symmetric BEM/FEM 1:380-382 
theory 1:649-669 


converse piezoelectric effect 2:761 
convolution quadrature 2:754, 2:756-757
cooling tower analysis 2:529-538 
coordinates 
Cartesian 3:64-65, 3:277-278, 1:145-146 
mesh filtering 3:278 
plates 1:199-200, 1:202
polynomial expansions 3:64-65, 3:66 
shells 1:199-200 
transformations 1:535 
copper crystals 2:283-284, 2:286-287 
Coriolis terms 3:242, 3:251 
cork of separation 2:350-351 
corner singularities 3:157-158
corner-to-corner contacts resolution 1:318, 1:320-321, 1:322-323
corrected smooth particle hydrodynamics (CSPH) 1:291 
correctors 2:153-156, 3:414 
corrosive wear 2:206 
Cosserat continuum 2:355-356 
Cosserat surfaces 2:76 
cost functions 3:439-440
costs 
aerodynamic shape optimization 3:389-390 
structural dynamics 2:188-189 
Coulomb friction 1:315, 1:328, 1:433, 2:205-206 
coupled 
boundary element/finite elements 1:375-409, 2:21-23, 2:547
elastoplasticity 2:744-745
fast solvers 1:389-394
geomechanics 2:547
least squares 1:394-396
Signorini-type interfaces 1:376-377, 1:396-403
symmetric coupling 1:377-389
variational formulation 1:398-403, 1:405-408
damage-plasticity models 2:340-341
direct numerical simulations 3:184-185, 3:191-198
discretization error estimation 2:40-46
finite elements/boundary elements 1:375-409, 2:21-23, 2:547, 2:744-745
finite elements/discrete elements 1:325-326, 1:334-335
finite elements/meshfree methods 1:293, 1:303-306
fluid/fluid-mesh/structure time integrators 3:471-473
free-surface equations 3:580
gradient plasticity-damage models 2:359-360
large eddy simulation 3:184-185, 3:191-198
nonlinear aeroelasticity 3:459-466
overlapping domains 2:575-599
a posteriori error estimates 2:40-46
see also dual formulations 
course staggered meshes 3:223-225 
COV see coefficient of variation 
covariance 
base vectors 2:65, 2:91 
constitutive equations 2:646, 2:647 
derivatives 1:211-212 
random variables 2:659 
covering admissibility 1:601-602, 1:606
crack...
closure effect 2:483
concrete mechanics 2:535-538
discontinuous meshfree methods 1:300-303
growth
arbitrary 2:382-394
boundary element method 2:734-735
conceptual models 2:377
crack... (continued)
constrained shape methods 2:378-394
simulation 2:377
inelastic bodies 2:311
meshfree methods 1:280
mouth sliding displacements 2:397
opening modes 2:350-351
plates 2:39-40
process representation 2:376-377
propagation 2:352, 2:354-355, 2:366-369
resistance 2:396-399
shape sensitivity 2:743
cranium bone 1:520-521
Crank-Nicolson finite differencing 1:51
creep 2:526
crew rescue vehicles (CRW) 3:434-435
crew transfer vehicles (CTV) 3:434-435
critical...
conditions 2:145
loads 2:159-160
points 1:453, 1:536-537, 2:156-158
slip resistance 2:277
states 2:139, 2:144-149, 2:156-159, 2:161-164
cross...
constraint method 2:202
diffusion 3:309
flows 1:425-426, 3:487-488
stress tensors 3:274
Crouzeix-Raviart elements 1:264-266, 1:270-272, 1:465-466, 1:589
cruise missiles 3:429, 3:430
CRW see crew rescue vehicles
cryogenic wind-tunnels 3:431
crystal lattices 2:339-340
crystal plasticity 2:267-287
continuum slip theory 2:267, 2:269-274
elastic-plastic tangent moduli 2:279-280
free energy 2:271-272
heterogeneous polycrystallines 2:281, 2:282-283, 2:286-287
homogenization 2:269-271, 2:281-282, 2:286-287
numerical examples 2:283-287
polycrystallines 2:267-287
shearing 2:283-284
updating 2:274-280
variational formulations 2:280-283
crystal sheets 2:284-286
crystal strips 2:284
CSG see constructive solid geometry
CSH see calcium-silicate-hydrates
CSIE see Cauchy singular integral equation
CSPH see corrected smooth particle hydrodynamics
CSS see Conventional Serial Staggered
CT see compact-tension
CTO see consistent tangent operators
CTV see crew transfer vehicles
cube within water 3:604-607
cubic spline window functions 1:281, 1:283-284
culling 1:542
cumulative pore-size distributions 2:519
curl operator 1:725, 1:727-734
currents
configuration 2:64-66, 2:76-77, 2:81
Maxwell equations 1:726
curvature
mesh adaptivity 1:515-516
shell theory 2:93-95
tensors 1:211-212
thickness locking 2:121, 2:124-125
unit meshing 1:504-507
curved...
boundary conditions 3:114
boundary difference approximations 1:13
domains 1:109-111
segment discretization 1:505
CUSP schemes 3:358-359
cutting, visualization 1:540
cutting-plane algorithms 2:246
cyclic loadings 2:626-627, 2:648-649
CYCLONE 2:320
cylinders
compressible flow 3:122-123
drag 3:191-192, 3:193-194, 3:197-198
flow past 3:122-123, 3:201-205, 3:487-488, 3:494-496
laminar flow 3:201-205
plasma flow 3:84-85
cylindrical...
billet upsetting 2:479
heat sources 2:588-589
shells
bending 2:84-102
buckling 2:163
eigen-frequencies 1:224, 1:226-229
inflation 2:84-102
intersections 1:133
membrane locking 2:124, 2:125
window functions 1:282

D/BEM see domain boundary element method
Dahlquist barrier 2:178
damage
composite laminates 2:431-458
concrete under fire 2:598
energy release rates 2:456-458, 2:482
evolution 2:453
loading functions 2:336
mechanics 2:336-349, 2:440-441, 2:456-458
anisotropic 2:338-339
continuum models 2:355-362
coupled damage-plasticity 2:340-341
discrete failure models 2:362-370
isotropic 2:336-338
material instabilities 2:341-349
mesh sensitivity 2:341, 2:344-349
microplane 2:339-340
partition-of-unity 2:365-369
shakedown 2:311
surfaces 2:453, 2:456-457
tensors 2:453-454
thresholds solids 2:482
variables 2:452-453
Damköhler numbers 3:521
damping
buckling 2:163
deformation analysis 1:332-333
Jacobi iteration 1:581-583
Jacobi smoother 3:178
nonlinear systems 1:659-660
ratios 2:179
shallow water equations 3:241
viscosity 3:305-306
dams 2:592, 2:593, 2:594
Darcy’s law 2:582
Dassault Aviation 3:407-454
data...
extraction 1:538-541
fields 1:528-529, 1:530-531
flow 1:543
forms 1:528-531
interpolation 1:531
streaming 1:543
transfer 1:547-548
visualization 1:525-526, 1:528-531, 1:538-541, 1:543, 1:547-548
dataset attributes, visualization 1:528-529, 1:530-531
Daubechies, I. 1:162
DCDD see discontinuity capturing directional dissipation
DD see domain decomposition
DDA see discontinuous deformation analysis
DDM see direct differentiation method
de Rham diagrams 1:729, 1:732-734
decimation 1:541
decohesion relations 2:350
decomposition
Cholesky 1:554-555, 1:562, 1:565
dynamic multilevel methods 3:219-228
Germano’s 3:274-275
hierarchical two-level 1:385-386
large-scale 3:219-228
Leonard’s 3:274
material responses 2:425
nonoverlapping domains 1:91, 1:618-620, 1:633-644, 2:712
overlapping domains 1:617-620, 1:630-633
triple 3:293
see also domain...
deconvolution subgrid-scale modeling 3:287-288
deep drawing 2:284-286, 2:496, 2:497
defect correction 1:40-43, 3:20
deformation
arterial walls 2:608
blankholders 2:495
body collisions 1:314-335
clamped elliptic shells 1:213
deforming-spatial-domain stabilized spacetime (DSD/SST) 3:549, 3:555-557, 3:571-574
elastic bodies 2:7-9
fields method (DFM) 3:491, 3:492-493, 3:495-496
gradients 2:8-9, 2:66, 2:463, 2:477-478
medial modeling 1:486, 1:487, 1:489-490
mesh adaptivity 1:514-515
moving boundaries 1:519
plates 1:203-205
porous medium 2:584-586
shell theory 2:76-77, 2:81
thin-walled structures 2:65, 2:76-77, 2:81
degenerating solid elements 2:79-83
degradation, geomechanics 2:553
degrees of freedom (DOF)
material responses 2:416-417
mixed finite element methods 1:238, 1:258-261, 1:263-266
Stokes equations 1:263-266
thermal diffusion 1:258-261
delamination
buckling 2:368
cohesive-zone models 2:363
composite laminates 2:453
partition-of-unity concept 2:367-368
Delaunay methods 1:498, 1:501-502, 1:508
DEM see diffuse element method; discrete element methods
density
airfoils 3:101, 3:103
constant mass 2:684-685
free charge 1:724
mesh 2:417, 2:418
normalized normal 2:659-660
probability density function 2:567-568, 3:520-521
profiles 1:431, 1:432-433
Deny-Lions lemma 1:60-61, 1:67
dependent stress 2:608
derivatives
conservative 3:129
constitutive equations 2:650-651
covariance 1:211-212 
diffuse 1:289-290 
elliptic partial differential equations 1:14 
fractional time 2:752—753 
Fréchet 1:182
Lie 2:243 
local time 1:418 
material 1:418-419, 3:129 
normal derivative kernels 1:605 
pseudospectral 1:142 
shape functions 1:283-291
see also time... 
derived statistics 2:670-671, 2:675-676 
design 
aerodynamic shape optimization 3:379-381, 3:389-390 
costs 3:389-390 
stochastic finite elements 2:677, 2:678, 2:679-680 
destructuring, geomechanics 2:554-555
detachment points 1:536
detection 1:317-321, 1:535, 2:196, 2:214-216
deterministic eddies 3:321 
deterministic geotechnical analysis 2:544-545 
development environments, visualization systems 1:543, 1:544-545 
deviatoric stresses 3:593-594 
DEVSS see discrete elastic-viscous stress splitting 
DFM see deformation fields method 
DG see discontinuous Galerkin 
DGCL see discrete geometric conservation laws 
DIA see direct interaction approximation 
differencing 
aerodynamics 3:340-342 
boundary values 1:25-26 
convection-diffusion equations 1:44-45
curved boundary approximations 1:13 
elliptic partial differential equations 1:12-13 
equations 3:340-342 
five-point 1:9, 1:50 
meshless 1:45-47, 1:290, 1:291 
Murman-Cole 3:339-340, 3:341-343 
nine-point approximations 1:13-14, 1:50 
potential flow equation 3:339-340, 3:342-344
transonic small-disturbance equation 3:340-342 
two-point boundary problems 1:10-11 
vectors 2:80, 2:81 
wave equation 1:30-34
see also central differencing; finite difference
differential...
equations
Bessel’s 2:704
conservation 1:419
mixed finite element methods 1:238
differential... (continued)
parabolic 1:675-702, 3:341-342
see also ordinary...; partial...
geometry 1:493-494, 2:63-64 
operators 1:238, 3:30-32 
quadrature methods 2:184 
differentiation...
continuation methods 1:664-665 
error 2:741, 3:282 
shape perturbations 2:740-741 
diffuse derivatives 1:289-290 
diffuse element method (DEM) 1:280, 1:289-290 
diffusion 
advective-diffusion equation 3:25-26 
conservation laws 1:466-468 
dominated convection-diffusion-reactions 3:196
equation 1:18-28, 1:696-700, 3:25-26
flames 3:500-501, 3:502
incompressible viscous flows 3:170-171
reaction equations 1:696-699, 1:700
soil consolidation 2:562-563
streamline 1:449-450, 3:97, 3:187, 3:188-189
thermal 1:238-240, 1:252-254, 1:257-262, 3:503
turbulence closure 3:304, 3:309, 3:319-320
viscous flows 3:137-139
see also convection-diffusion
dilation 1:289, 2:557, 2:558 
dilute family methods 2:412 
dimensional reduction 2:70-107
direct constraint elimination 2:203
direct differentiation method (DDM) 2:667, 2:741-742
direct interaction approximation (DIA) 3:51 
direct methods 
acoustics 2:713 
buckling stability analysis 2:157 
eddy viscosity 3:51 
hierarchical matrices 1:614-615
linear algebraic solvers 1:553-560 
nonlinear parabolic equations 1:27 
panel clustering 1:599 
shell theory 2:76-79, 2:83 
direct numerical simulations (DNS) 
adaptive 3:184-185, 3:191-197
applications 3:293-294
computability 3:184-185, 3:191-197
eddy viscosity 3:49, 3:51-52 
forced homogeneous turbulence 3:236 


large eddy simulations (DNS/LES) 3:184-185, 3:191-198


Navier-Stokes equations, turbulence 3:218-219 
numerical error 3:281-282 

reacting flow combustion 3:506 

renormalized scales 3:261 

resolution requirements 3:279-281 

shallow water equations 3:247 

time advancement schemes 3:283 


turbulence 3:2, 3:236, 3:269-270, 3:279-283, 3:293-294 


turbulent flames 3:514-515, 3:518
direct piezoelectric effect 2:761
director...
field interpolation 2:112
spaces 1:207 
vectors 2:118 
Dirichlet boundary conditions 
continuous blending meshfree methods 1:305-306 
error estimates 2:36-37


far field scales 3:9 
shallow water equations 3:245-246, 3:248-249, 3:250, 3:251 
thin-walled structures 2:67 
variational multiscale method 3:11-18 
Dirichlet principle 1:348 
Dirichlet problems 
boundary integral equations 1:707, 1:714-715 
hierarchical error estimators 2:52 
implicit residual error estimators 2:30-31 
panel clustering 1:599 
preconditioners 1:638 
Dirichlet-to-Neumann (DtN) boundary conditions 
acoustics 2:696, 2:703-708, 2:711
multiscale methods 3:8-11 
stabilized methods 3:8-11 
discontinuity capturing directional dissipation (DCDD) 3:550-551 
discontinuity failure modeling 2:337 
discontinuous...
deformation analysis (DDA) 1:326-329, 1:330-331, 1:332-333
finite elements 3:91-123
Galerkin (DG) methods 3:1-2 
advective-diffusive equations 3:36-37 
complex viscoelastic fluid flows 3:491, 3:495 
compressible flows 3:74-76, 3:77-81 
computational fluid dynamics 3:91-123 
conservation laws 3:91, 3:96-115 
convection 3:91, 3:115, 3:120-123 
diffusion 3:120-123 
enhancement 3:93-94
hp-finite methods 3:62, 3:71-75, 3:77-81
linear hyperbolic equations 3:91, 3:92-96
Maxwell equations 1:735
neutron transport equation 3:91, 3:92-96
of order zero 1:677, 1:683-684
Runge-Kutta 3:97, 3:104-117
second-order ellipticity 3:91, 3:115-120
viscoelastic fluid flows 3:483, 3:484, 3:495 
see also Petrov-Galerkin 
interpolation 1:264-266, 3:484 
meshfree methods 1:300-303 
modeling 1:311-335, 2:337, 2:369 
pressure interpolation 1:264-266
discontinuum, damage models 2:369 
discrete... 
bounded extensions 1:638 
contact-friction 2:473-476 
crack models 2:362 
curve discretization 1:505 
elastic shakedown 2:301-304 
elastic-viscous stress splitting (DEVSS) 3:483-487, 3:495 
element methods (DEM) 1:3-4, 1:311-335 
arbitrary shape fracture mechanics 2:390-394 
basic framework 1:314-316 
block deformability modeling 1:324-329, 1:334-335
boundary conditions 1:321-324 
contact constraint impositions 1:321-324 
contact detection 1:317-321 
discontinuous deformations 1:326-329 
energy balance contact 1:331-333 
fracturing 1:329-331, 2:390-394 
fragmentation 1:329-331 
geomechanics 2:545-546, 2:548 
interacting bodies 1:317-321 
nonsmooth contact conditions 1:314-316 
temporal discretizations 1:331-333
time integration 1:331-333
estimates 1:82
failure modeling 2:335, 2:362-370
finite volume methods 1:443-444
function representation 1:318, 1:320-321 
geometric conservation laws (DGCL) 1:425, 3:466-468 
interfaces 1:424 
iterative solutions 1:376 
Kirchhoff elements 2:110 
maximum principle 1:10-11, 1:445-446 
Morse theory 1:493 
moving least squares 1:286-287 
Navier-Stokes equations 3:219-228 
optimality conditions 2:301 
Poincaré inequality 1:82 
projection smoothers 3:179 
Sobolev inequality 1:82 
test filters 3:279 
unit meshing surfaces 1:506 
discretization 1:3-4 
boundary integral equation method 1:715 
collocation boundary element method 2:723-724 
contact mechanics 2:206-214 
continuous blending meshfree methods 1:304 
curved segment 1:505 
discrete element methods 1:331-333 
dynamic equations 2:492-493
dynamic multilevel methods 3:231-240 
enhanced 3:560-561, 3:566-567, 3:568-570 
error control 2:37-39, 2:40-46
error estimation
convection-diffusion equations 1:38-40, 1:46-47, 1:49-50
coupled methods 2:40-46
elliptic partial differential equations 1:15-17
model adaptivity 2:44-46
a posteriori 2:40-46
-scheme 1:22-23
wave equation 1:32-33 
finite difference 1:3, 3:360-361 
finite volume 1:465—466, 3:360-361 
fluid dynamics 3:560-561, 3:566-567
fluid flows 3:359-365
Galerkin 1:600, 1:606, 1:607, 1:612, 3:170
General Galerkin 2:700-701, 3:184-185, 3:187-189, 3:191-198
geomechanics 2:544-545
Helmholtz equation 2:696, 2:698-702
homogenization 2:281-282, 2:286-287
-interface-capturing technique 3:560-561
material responses 2:416-417
mesh refinement 3:560-561
multigrid methods 1:588-589
Newton’s method 1:658 
nonlinear aeroelasticity 3:462-464 
panel clustering 1:599-600 
partial differential equations 1:291-300
plates 1:201, 1:220-221
shells 1:201, 1:220-221
stochastic finite elements 2:663-667
thin-walled structures 2:112-113
turbulence closure 3:319
unit meshing 1:504-505
up-dating 3:568-570
see also finite element...; spatial...; time
dispersion 
error 2:696, 2:700, 2:701-702 
numbers 1:33-34 
time integration 2:179 


displacement 


boundary conditions 2:311, 2:442-443, 2:447, 2:449 
boundary integral equations 1:345, 2:721-722 
composite laminates 2:442-443, 2:447, 2:449
compressed foam-rubber tubes 2:53-54 
discontinuity 2:732-734 
fields 
concrete mechanics 2:525-526 
fracture mechanics 2:399-400 
hydroelastic-sloshing 2:689-691
layerwise laminate theory 2:435-436
structural-acoustics 2:684-687
gradients 1:244-245, 2:343-344
integral representation 2:720-721 
magnitude 2:85, 2:87-91 
mapping 1:533-534, 2:247, 2:490-491 
parameterization 2:116-120 
plates 1:200-207, 1:209-210, 1:215, 1:229 
potential 2:685-686, 2:690 
representative volume element 2:442-443
return-mapping 2:247 
shakedown 2:311 
shells 1:200-201, 1:215 
statically determinate structures 2:662-663
symmetric gradients 1:244-245 
weighting functions 2:172-173 
dissipation 2:465 
contact mechanics 2:206 
crystal plasticity 2:272-273 
eddy viscosity 3:52-53 
forced homogeneous turbulence 3:237-239
function regularization 2:320 
heterogeneous microstructures 2:282-283 
hyperelastic 2:465 
inequality 2:465 
time integration 2:179, 2:186-187, 3:52-53
turbulence closure 3:304, 3:308, 3:318, 3:319 
wave equations 1:707 
distortion-free refinement 2:38 
disturbed state concept (DSC) 2:554-555
divergence-free Fourier-Legendre polynomials 3:215-217 
DML see dynamic multilevel methods 
DNA molecules 1:517, 1:519 
DNS see direct numerical simulations 
DNS/LES see direct numerical simulation/large eddy simulations 
DOF see degrees of freedom 
Doi-Edwards model 3:492-493, 3:495
domain...
acoustics 2:702-703, 2:712-713
adaptive wavelets 1:163-165
arbitrary Lagrangian-Eulerian method 1:414, 1:416, 1:417-418
boundary element method (D/BEM) 2:735-740 
boundary integral equations 1:704, 1:713-714 
bounded 1:510-516, 3:10-11 
complex multidimensional 3:359-365 
compressible flows 3:76-77
curved 1:109-111
decomposition (DD) 1:6
acoustics 2:712-713
adaptive wavelets 1:163-165
finite elements 1:618-619
historical overviews 1:619-621
incomplete LU factorization 1:567 
material responses 2:425 
nonoverlapping 1:91, 1:618-620, 1:633-644, 2:712 


domain... (continued) dumbbells 3:492~494 eigenfrequencies 1:224-229, 2:766, 2:767 elastodynamics 
overlapping 1:617—618, 1:619—-620, 1:630-633 Dunford—Cauchy representation 1:615 eigenstrain concepts 2:412 dual reciprocity BEM 2:758-760 
preconditioning 1:617-644 DYNAS3D 2:212 eigenvalues nonsingular BEM 2:762-768 
of dependence 1:29-30 dynamic. .. composite laminates 2:456-458 space-time boundary integrals 1:708 
incompressible flows 3:69-70, 3:157—158 adaptivity 1:170-174 iteration 1:571-574 structural dynamics 2:170-171 
integrals 2:726 arbitrary Lagrangian—Eulerian mechanics 1:428 Koiter shell theory 1:217-218 elastomeric bead compression 2:480-481 
Lipschitz 1:167 contact 2:220-221 linear algebraic solvers 1:551-575 elastoplasticity 
overlapping 1:617-618, 1:619-620, 1:630-633, 2:575-599 equilibria stability 1:672-673 Maxwell equations 1:726-727 contact mechanics 2:205, 2:219 
p-finite element method 1:131 isothermal solutions 2:589~592 multigrid methods 1:593 cylindrical billet upsetting 2:479 
Plates 1:199—200 meshes 3:464-466 Orthogonal Iteration Method 1:569 damaging solids 2:481—483 
preconditioners 1:566-567 multilevel methods (DML) 3:207-264 Power Method 1:567—568, 1:569 deformations 2:227-264, 2:267, 2:270-271 
shakedown 2:296-297, 2:298-299 piezoelectricity 2:758-759, 2:761-762, 2:763 QR method 1:567, 1:568-571 augmented Lagrangian 2:260~-263 
shells 1:199-200 shakedown 2:310 Rayleigh quotients 1:567-568 closest-point-projection 2:251-255, 2:258-259, 2:262-263 
thin 1:199-229 subgrid-scale modeling 3:288—290 Signorini-type interfaces 1:398-399 infinitesimal models 2:232-239 f 
time 1:703—719, 2:713, 2:751—758 substructuring 2:686~-687 tensor fields 1:537 integration 2:244—250 
viscoelasticity 2:754, 2:755-756 eigenvectors 1:537, 1:552-553 return-mapping 2:244--250, 2:275—276 
viscous flows 3:157-158 EINST 3:566—567 dynamic contact 2:220 
dominant transport 3:164~166 nem epee elastic. .. geomechanics 2:552—553, 2:555—556, 2:566-567 
double cantilever beams 2:367-368 LST 3: au arterial walls 2:606-607 p-finite element method 1:133-135 
Doyle Ericksen formula 221 pci at Seca aa axisymmetric soil layers 2:581-582 symmetric Galerkin BEM 2:727-729, 21738740, 2:744 
drag EARSM see explicit algebraic Reynolds stress model bars 1:128—130 tangents 2:235-237 
aerodynamic shape optimization 3:380 eddies 3:259-263 a 3 ` Bene Mea 5 
x M: ody deformations 2:7~9 elastoviscoplasticity 2:483—484 
buff bodies 3:189--191 eddy viscosity damage 2:344—345, 2:457 electio 
coefficients 3:101~103, 3:122-123, 3:146~147, 3:174-175 electromagnetics 3:50-52 def ore 2:244—250. 2:275-276 ductors 1:726 
correction factor 3:486—487 multiscale method 3:40-41, 3:43-44 science ad a eee 
a domain 2:273 fieids 1:709, 1:724 
dissipation 3:195—196 Second Moment Closure 3:315 dumbbells 3:492 fluxes 1:724 
laminar flow around cylinders 3:202-203 shallow water equations 3:241 ses 5 Bes i aSa 1: 
square cylinders 3:191—192, 3:193-194, 3:197-198 Smagorinsky 3:40-41, 3:43-44 energy 1:200, 1:209-210 porntals Lg 
surface-mounted cubes 3:192-195, 3:196, 3:197-198 subgrid-scale modeling 31285-287 nads DIET : elpetrodynanotcs 170902 
viscoelastic fluid flows 3:495-496 turbulence closure 3:304—306, 3:308, 3:310-311, 3:315, ideally plastic materials 2295-296 lectrogalvamzed strel/sheets 26/4476 
dragbrace fittings 1:134-135 3:318-322 linear damage 2:344—345 electromagnetism 3:50-52, 3:451 
drained shearing 2:557, 2:558 variational multiscale method 3:47—49 ee poesia ee 
parte inb os lerncuis ee eis Wench yR us st Pi a <n essentially local extremum diminishing 
illi å a i116- predictors 2:467, 2: element... 
phage mena aes m sk auu 1:95 property upscaling 2:515—-516, 2:521-522 by element preconditioners 1:566 
drop-by-position preconditioners 1:565 crossover 3:70 a ae 3297 anre a 
by-size preconditioners 1:5 le ity 1: lefinition 2: e Galerian 
piei Erea ns sisialta pa Eas. 1:123 discrete models 2:301-304 arbitrary crack growth 2:384—387 
Drunker—Prager criterion 2:535-536 saturation 1:508 extremum principles 2:299-301 finite element methods 1:304, 1:306 
dry friction 2:205 singularities 3:158 kinematics 2:294—302, 2:305, 2:318-319 meshfree methods 1:280, 1:286-289, 1:291, 1:304, 1:306 
Dryja’s preconditioner 1:635—636 tracked interface locator technique (ETILT) 3:562-564 restrnined blocks 2:322-323 partial differential equations 1:291 
DSC see disturbed state concept EDICT see enhanced-discretization interface-capturing technique tangent moduli tensor 2:14 Green’s function 3:21-25, 3:26, 3:29-30 
DSD/SST see deforming-spatial-domain stabilized spacetime EDMRT see enhanced-discretization mesh refinement technique trial om 2:245 cea HaT 578) 754 nee ae Sagi 1:80-82 
DIN see Dirichlet-to-Neumann EDSTT 3:561, 3:566 viscoelastic correspondence principle 2:751, 2:753— clet numbers J: 
Du Fort-Frankel-Saul’ev finite differencing 1:51 EDSUM see enhanced-discretization successive up-date methods re stress splitting (EVSS) 3:483-487, 3:489- 490 ~~ ag 
dual formulations EEME see explicit elliptic momenmm equation elasticity technology 2:476— 
adaptive computation 1:693-694 effective... p p “a based damaged models 2:336--338 elker condition 1:252-—254, 1:255, 1:257, 1:258-259 
adaptive wavelets 1:167-168 bulk modulus 2:419 boundary integral equations 2:735-740 ellipsoid tensor fields 1:537-538 
augmented Lagrangian 2:261-262, 2:263 indices 3:203, 3:205 buckling 2:142—144, 2:145-149, 2:150-164 elliptic. . . 
boundary Risa 2:758—-763 material response properties 2:407—427 buried arch structures 2:559-560 boundary integral equations 1:704, 1:705, 1:711 
closest-point-projection equations 2:254-259, 2:262-263 media theory 2:514, 2:515-529 collocation 2:723-727 boundary values 
contact mechanics 2:218 shear modulus 2:418, 2:419 coupled BEM/FEM 1:376-377, 1:403-408 coupled BEM/FEM 1:375-408 
elastic shakedown 2:300 stiffness matrices 1:332 fast solution techniques 2:729-731 error estimates 1:95--97, 1:109-111 
error representation 1:97, 2:26-27, 2:48 stress principle 2:561 finite element methods 2:5-7, 2:24-46 finite element methods 1:95-97, 1:107-111 
goal-oriented error estimators 1:97 tensors 2:449-451, 2:454, 2:455 geomechanics 2:552 finite volume schemes 1:465—466 
norms 1:86 efficiency ligament mechanics 2:627-629 Galerkin boundary elements methods 1:347, 1:358-366 
plane linear elasticity 1:377 adaptive computation 1:691, 3:197-198 linear theory 2:7-40 h-finite elements 1:73 
reciprocity 1:704, 2:758-763 airbrake 3:432-433 meshfree methods 1:301-302 numerical integration 1:107-109
Signorini-type interfaces 1:398-403 explicit residual error estimates 1:88-89 mixed finite element methods 1:241-246, 1:268-269 Ritz-Galerkin methods 1:74-77
time-stepping 3:376-377, 3:378, 3:379 finite element control 1:86 nonlinear theory 2:7-16 condition 1:249-254 
see also coupled... vanishing 3:408-410 shakedown 2:294-304, 2:305, 2:318-319, 2:322-323 conservation laws 1:175, 1:181-187
ductile... velocity evaluations 3:141-145 shape sensitivity 2:740-743 continuum damage models 2:342-344, 2:355-362
failure 2:535-536 EFG see element free Galerkin structural mechanics 2:40-42 damage mechanics 2:342-344
fractures 2:349, 2:350 eigen-modes 1:200-201, 1:206-207, 1:224-227 symmetric Galerkin BEM 2:727-729 difference equations 3:341
single crystals 2:267 eigen-pair expansions 1:206-207 tensors 2:414-416, 2:449-451, 2:455-456, 2:522-524 hyperbolic problems 1:3 




i elliptic. . . (continued) P Š 
! mesh generation 1:421 m dynamics 3:560-561, 3:566--567 ; goal-orientated 1:97--98, 2:26-27, 2:33~37, 2:48-50 European crew transfer vehicles 3:434—435 
parabolic problems 1:3 int San technique (EDICT) 3:560—561 a h-finite element spaces 1:68-70 European transonic wind-tunnel (ETW) 3:431~-433 
partial differential equations 1:12-18 mesh refinement technique (EDMRT) 3:560-561 4 h-symmetric BEM/FEM 1:382-389 evolution equations 1:158, 1:169-175, 2:14 
projections 1:692-693 earl up-date method (EDSUM) 3:568-570 p implicit 1:89-91, 1:96 EVSS see elastic viscous stress splitting 
regularity 1:76-77 ee , a incompressible viscous flows 3:170-173 exact... 
relaxation 3:317, 3:318, 3:321 R ele oe aca A interpolation 1:80-82, 1:512, 1:61-68 blending 1:125-126 
shelis 1:212-214 Š 7 n cema nonoscillatory : E linearized elasticity 2:5-7, 2:24—-40, 2:42-44 formulations 2:76-79, 2:83 
i space-time boundary integrals 1:705, 1:711 sont be y 3:210, 3:211, 3:239-240 £ Maxwell equations 1:735 Maxwell equations 1:727--732 
variational probl 1:623—6; P mesh adaptation 1:511-516 otential flow equation 3:342—-345 
embedded matt ie a conservation laws 1:441 —442, 1:447—448 moving badalas meshing 1:519 i sai 3:16-17 
embedded discontinuities 2:351-355 ee ee neutron transport equation 3:94, 3:96 excavations 2:566-567 
embedded multiscale hierarchies 2:423~424 ee does balance equations 2:12-13 scale separation 3:230—231, 3:234 existence conditions 1:650-651, 2:640 
energetic ordering 2:415—416 at PY" ea ain pairs 1:441 symmetric advective-diffusive equations 3:39, 3:40 exothermic reactions 1:668-669 
energy 16k aa SAAL =AAZ, LAAT Eade, failure analysis 2:487-488 experimental techniques, concrete mechanics 2:516, 2:517-520 
arbitrary Lagrangian-Eulerian equations 1:419-420, 1:426 ioc T 1:447 TEE Helmholtz equations 2:696, 2:699-700, 2:701-702 explicit... 
balance 1:331-333 á niea indicators algebraic Reynolds stress model (EARSM) 3:421 -423, 3:430 
bending 1:209, 1:215 a energy 3:174—175 elliptic momentum equation (REME) 3:487, 3:489—490 
boundary 2:744 gael aeoiee 39-7 forming processes modeling 2:485-488 error estimates 1:87-91, 1:96, 2:28-29 
buckling 2:140-141, 2:143, 2:147, 2:162 es incompressible viscous flows 3:174~175 finite difference method 2:546, 2:547-548 
cascade 3:285 , he ree a see Schur complement 1:384—-387 integration 2:220 my ene 
channel flows 3:212 bl eree i PE inf~condition 1:274—276 solution methods 2:491—495, 2:546, 2:547-548 
complementary Dirichlet energy principle 2:24 fluid dynamics ERTE tos interpolation 1:80-82, 1:5 12, 1:61-68 strong stability preserving time integration 1:465 
complementary energy funetions 2:251 of motion 1:316, 2:161-162 oe. time integration 1:51, 2183-184, 2:188-189, 3:252-254, 
ion 1: 2419 È T A ON 3 z 
H consisten 2:445 449 EER of state 3:417-418, 3:439, 3:506-507 numerical 3:281 -283 š se ates 252 
| Pata Cy ee j see also individual entries attitionine 2:424--425 time stepping 1:683-686, 1:687, 3:366-368 
aee ia integration 2:186--187 equilibration estimators 1:91 Pi ae EI 1:603 total variational diminishing schemes 1:451 
ssipation 2230-231, 3:308 equilibri peragit È lee explosions 1:426, 1:427 
elasticity balan i 4 is reduction property 1:91-93, 1:103 p 
y balance equations 2:11-12, 2:13 approximations 3:314-3 P IT 247-2 exponential. . . 
elastoplastic deformations 2:230-231, 2:2 PP! 15 representation 2:25-27, 2:47-48, 3:190-191 
ian 43174198 :230-231, 2:234 boundary layers 3:291-292 roundoff 1:656-657 7 7 convergence 3:110 
fete E E E critical states 2:139, 2:144~149, 2:156-159, 2:161-164 tolerance 2:158 decay 1:19-20 
fea Pra Taar tae ae elastic shakedown 2:305 truncation 1:32~33, 1:41, 1:48, 3:139-140 mapping 2:248--250, 2:467-468 
aces equations 2:11, 2:611 Oe nce gee return-mapping 2:248-250 
free 2:271-272, 2:273, 2:456, 2:465 sate waveform advection 3:62-63 ded finite el thod (XFEM) 2:399-400 
functions 2:201 stability 1 ce see also a posteriori. , .; a priori... .; averaging. . .; discretization. . .; Faa z nite element method ( ) 2:399- 
harvesting 3:72 : residual. . . cats 
Hill's pear 28409. 2411-412 states 2:139-149, 2:156-159, 2:161-164 i Eshelby formalism 2:412 arterial walls 2:609—613, 2:614 
earns Sa T equivalence concepts 3:509—510 ps ESL see equivalent single-layer bounded 1:637-638 
kinetic 2:162 equivalent plastic strain rate 2:506 essential boundary conditions 1:293-298, 1:299, 1:305-306 exterior...
membrane 1:209, 1:215 equivalent single-layer (ESL) models 2:436-437 essentially local extremum diminishing (ELED) schemes 3:353 acoustics 2:702-703
Navier-Stokes equations 3:210 error... essentially nonoscillatory (ENO) schemes 1:453-455, 3:354-356 boundary values 1:734-735
norms TO a: 2613:97 ETILT see edge-tracked interface locator technique Dirichlet boundary conditions 3:9 
error estimates 1:85, 1:86-97, 2:25-26 7 FEE A 5 ETW see European transonic wind-tunnel displacement 1:345 
forming processes modeling 2:486-487 tay a tia EUGENIE 3:413--414, 3:439, 3:449 generalized Green’s formula 1:354, 1:357 
generalized 2:486-487 IEE EE Euler traction 1:345—346 
shallow water equations 3:256-257, 3:258--259 of approximation 1:188 aerodynamic computations 3:429, 3:432, 3:433 external loads 1:202-205 
preserving time integration 2:185 a ka elements methods 1:339-371 angles 2:119-120 extinction methods 1:538-541, 2:396-397 
inci ae oo sounds 3:143—144 ; : traction 1:538-539 
Principle of Minimum Complementary Pi i ` aa 7 codes 3:413-416, 3:439, 3:449 extraction 2s, 
reacting flow equations 3:505 3:514. TE OA Por nave canons iga continuum mechanics 1:413-414 extreme eigenvalues 1:398-399 
telease rates 2:337, 2:456-458 2:531~-532, 2:734 contact mechanics 2:219 equations extremum diminishing schemes 1:445, 3:348, 3349-351, 3:363 
shallow water equations 3:241—242, 3:256257, 3:258-259 control 4:86—87, 3:170-175 acoustic fields 2:697 extremum principles 2:299-301 
shear 1:209 i be serena discretization 2:5-7, 2:37-39, 2:40-46 compressible 3:101, 3:103-105, 3:383-386 extrusion 2:500-502 
i space-periodic flows 3:211 el todynamics 2:766, 2:767 computational aerodynamics 3:327, 3:329 
j spaces 1:350, 1:353, 1:119 element nodal interpolation operators 1:80-82 be gas dynamics 3:101, 3:103~104, 3:105, 3:113-117 
transfer 3:51 cotmegon k shock capturing 3:348-359 F-bar-patch methods 2:478-479, 2:500-502 
E F-differentiable mapping 1:650, 1:656 


advective-diffusive equations 3:35-36 Euler-Lagrange equations 1:267 


turbulence closure 3:30 A i 
l viicoplastic eee TETA anisotropic 1:81-82, 1:63-66 General Galerkin discretization 3:188-189 F/A-18 aircraft 3:468-469 
see also kinetic; potential; strain one messed foam-rubber tubes 2:53-54 homogenous function theorem 2:238 F7X business jets 3:427, 3:431-433, 3:442-443 
; engineering artifact modeling 1:475-494 conser vation laws 1-445450, 41467468 : implicit time-stepping schemes 3:369 FA see fully adjusted 
Engquist—Osher schemes 3:98, 3:99 energy norms 1:85, 1:86-97, 2:25-26 k motion 1:415-416 face modes 1:123 
f enhanced: á explicit 1:87-91, 1:96, 2:28-29 i operators 1:481—482, 1:483 facet flip 1:509-510 
A assumed strain 2:93 finit elasticity 2:52 Ee parameters 2:119-120 factors of safety 2:305, 2:551, 2:568 
continuum models 2:355-362 leat lant 186-8718998 26 197-98 i predictors 1:666 failure 
discontinuous Galerkin methods 3:93-94 i ig” Ag | solvers 3:411-412 composite laminates 2:431~458 
discretization fundamental 1:448 fi strain tensors 1:136 error indicators 2:487—488 
goal functionals 1:85, 1:97--98 | visualization algorithms 1:534 loads 2:139 
failure (continued)
modeling 2:335-370
cohesive-zone models 2:337, 2:349-355, 2:363-365
discrete 2:335, 2:362-370 
Falcon... 
50 aircraft 3:411, 3:427, 3:433-434
900 aircraft 3:453 
business jets 3:427, 3:431-433, 3:442-443 
F-16 Block-40 fighter aircraft 3:474-477 
F7X business jets 3:427, 3:431-433, 3:442-443 
falling cubes in water 3:604-607
falling spheres 3:486-487, 3:571-573
far-field... 
boundary conditions 3:362 
expansions 1:601, 1:604-605
partitioning 1:601-602
scales 3:8-9 
Faraday’s law 1:723-724 
fast... 
multiplication 1:600-601, 1:602-604, 1:607, 1:611
multipole method (FMM) 2:730-731, 3:141-144
solvers 1:389-394, 2:729-731
fatigue 2:311, 2:380, 2:206
Favre averaged values 3:522-523
FDM see finite difference methods 
FEAM see finite element alternating method 
FEM see finite element method; finite element methods 
FETI see finite element tearing and interconnecting 
fiber-reinforced composite laminates 2:431-458 
classical laminate plate theory 2:433-434 
continuum damage mechanics 2:440-441, 2:452-458
damage 2:431-458
failures 2:431-458 
layerwise laminate theory 2:435-437 
literature reviews 2:437-440 
macroscale 2:440-451
microscale 2:440-451
multiscale modeling 2:432-433, 2:440-451 
partition-of-unity 2:367-368 
plates 1:207, 1:211, 2:433-435 
progressive damage modeling 2:451-458 
representative unit cells 2:443-446 
representative volume element 2:432, 2:441-443, 2:445-451 
shear deformations 2:434-435 
FIC see finite calculus 
field...
elimination 2:580 
equations 2:200, 2:696-697, 3:462 
vortices 1:536 
FieldView application 1:546 
fill-ins 1:565
filter functions 3:42 
filtered Navier-Stokes equation 3:42-43 
filtered velocity 3:41 
filtering 
large eddy simulations 3:272-274 
length scales 3:278-279 
material responses 2:412 
meshes 3:277-279 
operators 3:272-274
spectral methods 1:150 
fine grids 3:233 
fine scales 
diffusivity 3:25-26 
Green’s function 3:18-20
variational multiscale method 3:11-27 


fine staggered meshes 3:223-225 
Finger tensors 3:492 
finite calculus (FIC) 3:581, 3:583-587 
finite deformations 
composite laminates 2:441-442, 2:447-448 
contact mechanics 2:196 
elastoplasticity 2:239-244, 2:248-250 
finite difference methods (FDM) 1:3, 1:7-52, 3:360-361
aerodynamics 3:409-411
backward 1:36-37, 1:50, 1:51
convection-diffusion equations 1:43-44
Crank-Nicolson 1:51 
diffusion equation 1:18-28 
discretization 1:3, 3:360-361 
Du Fort-Frankel-Saul’ev 1:51 
elliptic partial differential equations 1:12-18 
five-point approximations 1:50 
forward approximations 1:50, 1:51 
fourth-order hyperbolic equations 1:34-36 
geomechanics 2:545-546, 2:547-548, 2:562-563
heat conduction 1:18-28 
hyperbolic equations 1:28-36 
meshfree 1:290 
Navier-Stokes equations 3:213-214, 3:220, 3:228
parabolic equations 1:18-28
shallow water equations 3:245-246, 3:248-249, 3:250, 3:251
soil consolidation 2:562-563 
two-point boundary problems 1:9-12 
variable sensitivity 2:667 
wave equation 1:28-34 
finite dimensions 1:246-257, 2:17, 2:18-20 
finite elasticity 
contact mechanics 2:201 
error estimates 2:46-54 
finite element methods 2:5-7, 2:16-24, 2:46-54
ligament mechanics 2:627-629
nonlinear boundary values 2:16-24 
finite elements 
advective-diffusive equations 3:34, 3:39-40
alternating method (FEAM) 2:381 
anisotropic 2:20-21, 1:57, 1:63-66 
arbitrary Lagrangian—Eulerian methods 1:414, 1:421, 1:431 
arterial walls 2:616-618 
boundary elements symmetric coupling 1:377-389, 1:403-405 
composite laminates 2:431-458
concept 1:77-79 
conforming 1:73, 1:77-85 
discontinuous 3:91-123 
discontinuous Galerkin methods 3:91-123
discrete element coupling 1:325-326, 1:334-335
discretization 
aerodynamics 3:360-361 
buckling 2:140, 2:163 
concrete mechanics 2:534-535 
diffusion 3:170-171 
error estimation 2:5-7, 2:40-46
error-controlled 2:5-7, 2:37-39, 2:40-46
forming processes modeling 2:470-471
Helmholtz equation 2:699
hyperelasticity 2:16-24
Navier-Stokes equation 3:160-166, 3:173-174
penetration detection 2:215-216
ship hydrodynamics 3:587-589
spatial 3:160-166 
stochastic 2:663-667 


thin domains 1:220-221
thin-walled structures 2:108-113 
elastic shakedown 2:303-304 
embedded discontinuities 2:351-355 
equations 1:586-589 
error estimates 1:104-105
forming processes 2:469-471
geomechanics 2:549-551
geotechnical engineering 2:568-569
h-version 1:56-58
heart wall mechanics 2:622-624
knee joints 2:628-629
mesh representation 1:546-547
meshfree methods 1:293, 1:303-306 
methods (FEM) 1:3, 1:4, 1:73-114, 3:3 
adaptive 1:98, 1:103, 1:510-516, 2:386-390, 2:485-491
blending meshfree methods 1:293, 1:303-306 
blood flow 3:537-538 
boundary element coupling 1:375-409 
elastoplasticity 2:744-745 
symmetric coupling 1:377-389, 1:403-405
buried arch structures 2:559-561 
coarse scales 3:15-18 
complex geometric configurations 3:346 
curved domains 1:109-111 
Dirichlet-to-Neumann boundary condition 2:707 
discretization 2:5-7, 2:40-46
error estimation 1:82-98, 2:5-7, 2:24-54
finite elasticity 2:5-7, 2:16-24
fluid dynamics 3:545-574
fracture mechanics 2:378-381
generalized 2:399-400
geomechanics 2:545-547, 2:559-561, 2:564-565 
Hellinger—Reissner 2:21-23 
hierarchical matrices 1:610, 1:612-613, 1:615
Hu-Washizu 2:23-24 
hybrid 2:21-24 
interior estimates 1:111-112 
kinematic nongeometric fracture mechanics 2:399-400 
linearized elasticity 2:5-7, 2:16-40
local mesh refinement 1:98-104 
MATLAB 1:98, 1:113-114 
Maxwell equations 1:6, 1:723-736 
mesh refinement 1:98-104 
model adaptivity 2:5-7, 2:24-40, 2:44-46
panel clustering 1:597, 1:607, 1:610, 1:612-613, 1:615 
partial differential equations 1:292-294, 1:295 
pollution effects 1:111-112 
Ritz-Galerkin methods 1:74-77
soil consolidation 2:564-565 
superconvergence 1:112-113 
thin domains 1:219-229 
three-field 2:23 
time 2:176, 2:178, 2:184, 2:186 
unbounded domains 2:702 
see also coupled...; hp-version...; mixed...; p-version;
stochastic...
Navier-Stokes equations 3:583-584
nodal interpolation error estimates 1:61-63
non-Newtonian flow 3:490-491
nonconforming 1:105-107, 1:588-589
nonmoving meshes 3:565-566
numerical integration 1:107-109
parabolic differential equations 1:676, 1:689-690
plates 1:207, 2:59, 2:79-83
shakedown analysis, tubes 2:325-326
shape functions 2:365-369 
shell theory 2:59, 2:79-83 
space-time 2:171-173, 3:27-28 
spaces 
construction 1:73, 1:77-82 
discrete estimates 1:82 
discretization 3:160-161 
element nodal interpolation 1:80-82
interpolation error estimates 1:80-82 
multigrid methods 1:588 
partitions 1:79-80 
properties 1:77-82
triangulations 1:79-80 
tearing and interconnecting (FETI) method 1:618-619, 1:641-643, 
2:696, 2:712 
thin-walled structures 2:59, 2:104-128 
time-discontinuous space-time equations 2:171-173
turbulence 3:228 
vascular solid mechanics 2:616-618 
viscoelastic fluid flows 3:484-485, 3:490-491 
see also coupled...; h-version
finite hyperelasticity 2:16-24, 2:46-54 
finite layer solutions 2:563-564 
finite plasticity 2:267-287 
finite point methods (FPD) 1:291 
finite spheres 1:280, 1:291-292 
finite strains 
contact discretizations 2:210-214 
crystal plasticity 2:268 
domain boundary element method 2:740 
elasticity 1:431, 2:242, 2:464-467 
exponential return-mapping 2:248-250
forming processes modeling 2:477-478, 2:481-484
low-order elements 2:477-478 
viscoplastic deformations 2:242 
finite volume elements (FVE) 1:466 
finite volume (FV) methods 1:4, 1:439-470 
conservation laws 1:439-450, 1:464-470
discontinuous Galerkin methods 3:97 
discretization 3:360-361 
elliptic boundary values 1:465-466
Euler code 3:413-414 
flow field compression 1:169-170 
higher-order accuracy 1:450-464 
higher-order time integration 1:464-465
first... 
law of thermodynamics 2:12 
ply failure (FPF) 2:437-438 
type hexahedral elements 1:728-729 
type tetrahedral elements 1:729-731 
first-order...
hyperbolic equations 1:28
monotone schemes 3:419-420
ordinary differential equations 3:31-32 
reliability method (FORM) 2:678 
saddle points 1:183 
shear deformations 2:86, 2:105-106, 2:434-435
five-parameter shell models
degenerating solid elements 2:80
direct approach 2:77
shell theory 2:84, 2:87, 2:91-92, 2:94-95
five-point differencing 1:9, 1:50
fixed support stencil reconstructions 1:462
flames
acoustic interactions 3:503-504 
flamelet regime 3:517-518 
speeds 3:511-512 
stretch 3:512-513 
flat plate bending 2:84-102
flexible boundaries 1:323-324
flexural cylindrical shells 1:224, 1:226-228, 1:229 
flexural displacements 1:202 
flexural shells 1:215 
flight tests 3:475-476, 3:429, 3:430 
flo6 3:346 
flo22 3:327, 3:328 
floating wood 3:603, 3:604 
flow 
air 2:584-586 
bifurcations 1:672-673 
cavity flows 3:209, 3:223-225
channel 3:209-210, 3:212-213, 3:221-223, 3:227-228
creep 2:526 
cross 1:425-426, 3:487-488 
deforming porous medium 2:584-586
field compression 1:169-175 
free shear 3:426 
Geostrophic 3:253-254
Hermès space shuttle 3:433-434 
inviscid flows 3:131-137 
J2 2:228, 2:239, 2:246-248, 1:134, 1:138
mean 3:302-303, 3:304, 3:318, 3:319, 3:321 
non-Newtonian 3:481-496
nondivergent 3:309
past cylinders 3:122-123, 3:201-205, 3:487-488, 3:494-496
periodic 3:219-220, 3:221, 3:235-240, 3:377-379
plasma 3:81-85 
plastic 2:274-275 
potential flow methods 3:334-348, 3:580
Prandtl—Batchelor 3:133 
Quasi-Geostrophic 3:253-254 
reacting flow control 3:499-523
separation 3:320 
shear 3:294 
shock-free transonic 3:346 
simulations 3:570-574 
soil consolidation 2:564-565
solid boundaries 3:145-149
spatially-periodic 3:210-212, 3:555-557, 3:573
steady 3:358-359, 3:481, 3:482-488
subsonic 3:101, 3:102-104, 3:334-337, 3:425-426
supersonic 3:101, 3:102-104, 3:425-426
theory 2:228, 2:239, 2:246-247 
three-dimensional 3:78-81, 3:134-137
two-dimensional 3:77-78, 3:130-134 
unbounded inviscid flows 3:131-137 
viscous 3:130, 3:137-139, 3:330-331, 3:585-587
wall bounded 3:426
see also blood...; compressible...; computational...; fluid...;
incompressible...; transonic...; viscoelasticity...
fluctuation fields 2:281 
fluid... 
dynamics 
arbitrary Lagrangian—Eulerian 1:421, 1:422-426 
meshfree methods 1:280 
moving boundaries 3:3, 3:545-574 
turbulence closure 3:301-322 


flow 3:1-3 
aerodynamics 3:330-334, 3:335
complex multidimensional domains 3:359-365
discretization 3:359-365 
equations 3:587-588 
viscoelasticity 3:3, 3:481-496 
fluid-mesh/structure time integrators 3:471-473 
flux-driven test cases 2:583-584 
object interactions 
falling spheres 3:571-573 
spatially periodic flows 3:555-557, 3:573
subcomputation technique (FOIST) 3:558-559 
pressure loadings 2:684-685, 2:689
rigid-body interactions 1:424, 1:425-426 
ship interactions 3:589-590 
structure interactions 2:683-692
arbitrary Lagrangian-Eulerian 1:421, 1:423-424, 1:426, 1:427
hydroelastic-sloshing 2:683-684, 2:689-692
nonlinear aeroelasticity 3:460-477
structural-acoustics 2:683, 2:684-689
vibrations 2:683 
flutter 
aerodynamic shape optimization 3:379 
computational flow mechanics 3:448-449, 3:450 
dual time-stepping 3:376-377, 3:378, 3:379 
nonlinear aeroelasticity 3:459-460, 3:464, 3:469, 3:477 
fluxes 
corrected transport 3:329 
difference splitting 3:356 
functions 1:169-170 
Godunov finite volume discretizations 1:444-445
splitting 3:329, 3:356 
thermal diffusion 1:261-262 
vector splitting 1:469-470, 3:329
flying conditions 3:379 
FMM see fast multipole method 
FOIST 3:558-559
Fokker-Planck equation 3:492
foldpoints 1:669-672 
footing tests 2:557-558, 2:559 
force...
coefficients 3:373 
displacement law 1:316 
driven tests 2:583 
forced vibration 2:759 
forebody control 3:436 
forging 2:496-502 
FORM see first-order reliability method 
forming processes modeling 2:461-507
adaptive finite element methods 2:485-491
bulk forming operations 2:496-502
contact-friction 2:471-476 
continuum constitutive modeling 2:462-469 
element technology 2:476-481 
explicit solutions 2:491-495 
forging 2:496-502 
friction modeling 2:471-476 
implicit finite element solutions 2:469-471 
inelastic constitutive models 2:481-484 
meshes 2:488-489
metal cutting operations 2:502-506 
strip stretching 2:491 
thermomechanical coupling 2:484-485
thin sheets 2:495-496
transfer operators 2:489-491 


Fortin’s trick 1:270-272
forward finite differencing 1:50 
forward-backward MacCormack finite differencing 1:51 
forward-time, central space finite differencing 1:51 
foundation and error analysis 1:339-371 
four-point bending 2:39-40
Fourier... 
Bessel series 2:563-564
coefficients 3:243-244 
expansions 1:34-35 
Galerkin approximation 3:214-215 
Legendre polynomials 3:215-217 
representations 3:377-379 
series 1:716, 2:563-564 
spectral methods 1:141-142
transforms 
boundary integral equations 1:704, 1:711-712, 1:713-714 
pressure 3:6-7 
soil consolidation 2:563-564 
space-time boundary integrals 1:711-712
time-harmonic waves 2:698
fourth-order equations 1:34-36, 1:85 
FPD see finite point methods 
FPF see first ply failure 
fractal boundaries 1:654 
fractional time derivatives 2:752-753 
fractional-step methods 3:218-219, 3:588-589 
fracture 2:375-402
boundary element method 2:732-735 
concrete mechanics 2:532-533 
discontinuous deformations 1:330-331 
discrete crack models 2:362, 2:363 
discrete element methods 1:329-331 
energy 2:350-351 
see also computational fracture mechanics 
fragmentation 1:329-331 
Fréchet-derivative 1:182
Fréchet-differentiable mapping 1:650, 1:656
Fredholm integral equations 1:180, 1:594 
Fredholm theorem 1:10, 1:352, 1:354 
free... 
charge density 1:724 
energy 
continuum damage mechanics 2:456 
crystal plasticity 2:271-272, 2:273 
elasticity balance equations 2:13 
potentials 2:465 
fixed rods 2:757 
shear flow 3:426 
stream sensitivity 3:309 
surfaces 
arbitrary Lagrangian—Eulerian dynamics 1:423, 1:424 
boundary conditions 3:580, 3:581-582, 3:586-587, 3:589 
flow 3:574 
liquids 2:689 
wave equation 3:586-587, 3:589 
vibration analysis 2:759 
Frelat’s method 2:319-320 
frequency 
domains 1:704, 1:713-714 
eigenfrequencies 1:224-229, 2:766, 2:767 
errors 2:179 
multifrequency methods 2:710-711 
zero-frequency mode 2:686, 2:691 


friction 
arbitrary Lagrangian-Eulerian 1:431, 1:433
cohesive-frictional soils 2:550-551 
contact 
constraints 1:315, 1:328 
friction 2:471-476 
mechanics 2:201-202, 2:204-206, 2:219-220
Coulomb 1:315, 1:328, 1:433, 2:205-206 
dry 2:205 
dynamic contact 2:220 
forming processes 2:471-476 
frictionless contact 2:200-201 
skin 3:307, 3:315, 3:321-322 
slip corrector 2:473 
tangential slip 2:206 
Tresca 1:433 
velocity 3:307 
Frobenius hierarchical matrices 1:614 
Froude number 3:580 
frozen active sets 2:276-278 
fully... 
adjusted (FA) states 2:554 
discrete finite volume methods 1:443-444 
populated matrices 1:597-615 
saturated soils 2:589-592 
functional subgrid-scale modeling 3:284, 3:285-287, 3:290 
fundamental aerodynamic studies 3:436-437 
fundamental error estimates 1:448 
fundamental solutions 
boundary integral equations 1:707-709, 1:716-717
displacement integrals 2:720-721
elastodynamics 2:766
space-time boundary integrals 1:707-709
fusion 3:115, 3:117 
FV see finite volume 
FVE see finite volume elements 


G-NI see Galerkin with Numerical Integration 
Galerkin 
boundary element methods 1:347, 1:358-366 
Aubin-Nitsche lemma 1:364-365
Céa’s lemma 1:358-362
convergence optimality 1:362-364
elliptic boundary values 1:347, 1:358-366
ill-posedness 1:365-366
panel clustering 1:600 
stability 1:365-366 
boundary values 1:24-25 
Bubnov—Galerkin weak form 1:292-293 
discretization 1:600, 1:606-607, 1:612, 3:170 
general 2:700-701, 3:184-185, 3:187-189, 3:191-198
equations 1:358-362 
finite element methods 2:699, 3:583 
finite volume methods 3:413-414 
Fourier-Galerkin approximation 3:214-215
least-squares (GLS) 
advective-diffusive equations 3:34-35, 3:36, 3:37, 3:39, 3:40 
Helmholtz equations 2:700-701 
Hermès space shuttle 3:433-434 
Navier—Stokes code 3:417 
stabilized methods 3:22-23 
methods 
adaptive wavelets 1:177-180 
boundary integral equations 1:346 
conforming 1:177
Galerkin (continued)
coupled BEM/FEM 1:376, 1:379-382, 1:385 
Dirichlet-to-Neumann boundary condition 2:707 
Euler code 3:413-414 
Helmholtz equations 2:698-699, 2:700-701 
Navier-Stokes equations 3:214-217
space-time boundary integrals 1:709-712 
subgrid-scale models 3:30-32 
time discretization 3:170 
time-discontinuous 2:173, 2:176-177, 2:184 
see also discontinuous...
with Numerical Integration (G-NI) 1:143-144, 1:147, 1:149 
orthogonality 2:25, 2:47 
projections 1:207, 1:210 
Ritz-Galerkin 1:73, 1:74-77, 1:556, 1:557-558
Taylor-Galerkin 3:583
wavelet schemes 1:178-180
weak forms 1:291-293
weights 1:361
see also element free...; Petrov-Galerkin; Runge-Kutta...;
symmetric Galerkin...
Galilean invariance 3:275 
gamma finite difference method 1:521:52 
Garding inequalities 1:351-352, 1:369-370, 1:379 
gas dynamics/motion 1:426-427, 3:101, 3:103-105, 3:113-117 
Gauss law 1:724 
Gauss—Lobatto integration 1:143-144 
Gauss—Lobatto points 1:121 
Gauss-Seidel implicit time-stepping 3:370-372 
Gauss-Seidel smoothers 3:178-179 
Gaussian...
distribution 2:659-660 
elimination 1:563-564 
filters 3:273-274 
window functions 1:281 
General Galerkin discretization 2:700-701, 3:184-185, 3:187-189,
3:191-198 
General Quasi-Linear Model 3:313 
generalized... 
energy norms 2:486-487 
explicit total variational diminishing schemes 1:451 
finite difference 1:290 
finite element method (GFEM) 2:399-400
Galerkin method 2:700-701, 3:184-185, 3:187-189, 3:191-198
Green’s formula 1:354-355, 1:357
Hermite polynomials 2:674-676
implicit total variational diminishing schemes 1:451-452
minimal residuals (GMRES) 1:558-559, 2:702, 3:565-566, 3:570
real Schur form 1:570 
representation formula 1:355-358 
self-consistent method 2:412 
serial staggered (GSS) procedure 3:473 
slope limiters 3:107-108 
geomaterials 2:551-558, 2:559 
geomechanics 2:543-569 
consolidation 2:543, 2:561-567
constitutive models 2:551-558, 2:559 
deterministic analysis 2:544-545 
limit analysis 2:549-551 
numerical analysis methods 2:545-549 
soil-structure 2:558-561 
stochastic techniques 2:567-569
geometric...
conservation laws 1:424-425, 3:466-468 
consistent integration 2:274-275 


continuum constitutive forming process modeling 2:463-464
deformations 1:519 
error estimation 1:519 
exact formulations 2:76-79, 2:83 
extraction 1:538-539 
modeling 1:5 
aerodynamic shape optimization 3:438-439 
attributes 1:490-492 
blood flow 3:538-540
boundary representation 1:477-485, 1:487-489
complex shapes 1:475-494 
constructive solid geometry 1:483-485 
engineering artifacts 1:475-494 
medial modeling 1:485-490 
surface patches 1:477-481 
system architecture 1:475-476 
turbulent flame combustion 3:519, 3:520 
voxel representation 1:2-3, 1:10-11, 1:476-477
nonlinearity 2:310-311, 1:135-136
representation 2:378-394
shell theory 2:76-79, 2:83, 2:93-95 
three-dimensional shells 2:83-102 
visualization 1:526 
geophysics 3:251-259 
Geostrophic flows 3:253-254 
Germano identity 3:274-275, 3:288-289
GFEM see generalized finite element method 
glass-fiber-reinforced polypropylene 2:361-362 
global... 
contact mechanics 2:197, 2:198, 2:216-220
continuity projection-based interpolation 1:733
interpolation error estimates 1:68-70
a posteriori error estimates 2:27-33
searches 2:196, 2:214-216 
GLS see Galerkin least-squares 
glyphing 1:530, 1:533, 1:538, 1:540-541 
GMRES see generalized minimal residuals 
goal functionals 1:85, 1:97-98
goal-orientated error estimates 1:97-98, 2:26-27, 2:33-37, 2:48-50 
Godunov... 
finite volume discretizations 1:442-445 
fluxes 3:99 
like convection phase stress-updates 1:430-431 
gradients 
aerodynamic shape optimization 3:440-442
based optimization methods 2:644 
conjugate iterations 1:642-643, 1:147 
convective projections 3:586 
damage mechanics 2:355-362
definition 3:388 
deformation 2:8-9, 2:66, 2:463, 2:477-478 
discontinuities 1:302-303 
displacement 1:244-245, 2:343-344 
enhanced damage models 2:356-359, 2:369 
iterations 1:185, 1:642-643, 1:147 
mesh deformations 3:441-442 
modified deformations 2:477-478
objective functions 2:644 
operator 1:727-732 
plasticity-damage models 2:359-360 
recovery estimators 1:93-95, 1:96-97 
reduced formulation 3:386-387 
smooth particle hydrodynamics 1:284-285
Sobolev 3:388, 3:389 
graft-artery bypass junctions 3:536, 3:537 


Gram matrices 1:287 
granular media 1:316, 1:323-324 
graphics 1:527-528 
gravitational potentials 1:8 
gravity 2:689-692, 3:247-248, 3:251 
green water 3:581 
Green-Gauss linear reconstructions 1:461-462
Green-Lagrangian strain tensors 2:66-67, 2:73-74, 2:91, 2:441-442
Green-Naghdi stress rate 2:494-495
Green’s formula 1:354-355, 1:357
Green’s function
coarse scales 3:14-15
convection-diffusion equations 1:44-45
difference schemes 1:44-45
Dirichlet boundary conditions 3:9-10 
element 3:21-25, 3:26, 3:29-30 
fine scales 3:18-20 
space-time formulations 3:29-30 
variational multiscale method 3:18 
vortex methods 3:130, 3:141 
grids 
computational 1:547-548 
filters 3:272-274
fine 3:233 
hierarchies 1:580 
particle-grid method 3:144-145 
sub-grid stress 3:32-33 
two-grid iterations 1:581-584 
see also coarse...; multigrid methods; subgrid-scale models
ground effects 3:432, 3:433 
ground transportation systems (GTS) 3:148, 3:149 
growth 2:615-616, 2:622
see also crack...
GSS see generalized serial staggered 
GTS see ground transportation systems 
GUI-based frameworks 1:544
Gurson model 2:341 


h-adaptivity 1:511-516, 3:101-102
h-convergence 1:126-131
h-finite elements 1:3, 1:56-58 
elliptic boundary values 1:73 
nonoverlapping domain decomposition 1:634-644 
spaces 
Deny-Lions lemma 1:60-61
global interpolation error estimates 1:68-70 
interpolation 1:55-70 
interpolation operators 1:58-60 
local error estimates 1:61-68 
nodal interpolation 1:58-59, 1:61-66
quasi interpolation 1:59-60, 1:66-68
symmetric coupled boundary elements 1:382-389
h-p clouds 1:291-292
Haar wavelets 1:160-162, 1:481 
half-edge data structures 1:481, 1:483 
hand fitting parameter identification 2:641-642 
hanging nodes 2:38 
Hankel functions 2:704-706
Hankel transforms 2:563-564 
hardening 
concrete mechanics 2:531, 2:536 
crystal plasticity 2:273-274 
curves 2:475 
geomechanics 2:552-553, 2:555-556
kinematic 2:308-309, 2:553
plasticity 2:553
shakedown 2:307-310 
hardware operation levels 1:476 
harmonic Ritz values 1:553 
Harten—Lax—van Leer flux 1:470 
Harten’s explicit total variational diminishing schemes 1:451 
Hashin-Shtrikman bounds 2:412-413, 2:418, 2:420
hat-functions 1:395 
haunched continuous plates 2:44-46
Hausdorff topologies 1:487 
head 1:520-521, 2:565 
headlamp panels 2:496, 2:497 
heart valves 2:624-625 
heart wall mechanics 2:618-625 
heat... 
conduction 1:18-28, 1:622
equations
adaptive computation 1:676, 1:689-692, 1:696-699, 1:714-719
error estimates 1:690-691 
logistics reaction-diffusion 1:697-698, 1:699 
space-time boundary integrals 1:706, 1:710-712, 1:713 
space-time Galerkin finite elements 1:676, 1:689-690 
strong stability estimates 1:691—692 
time-stepping 1:696-697, 1:714-719
flux vectors 2:12 
potential operators 1:718-719
release 3:499
transfer 
Second Moment Closure, turbulence 3:315 
thermo-hydromechanics 2:592-599 
turbulence closure 3:320, 3:321 
vortex methods 3:149 
Heaviside-type traction 2:762, 2:763 
heavy liquids 2:688—689 
hedgehogs 1:533 
helicity 3:130—131 
helicopters 3:104, 3:105, 3:376-379, 3:571, 3:572 
Hellinger—Reissner functional 
dual-mixed finite element method 2:21-23 
elasticity 1:242—243, 1:244, 1:256, 1:268-269 
thermal diffusion 1:239-240, 1:257-262 
Helmholtz equation 
accelerated multifrequency methods 2:710-711 
acoustics 2:697-702 
boundary integral equations 1:705, 1:716 
Dirichlet-to-Neumann formulation 3:8-9 
discretization 2:696, 2:698-702 
element Green’s function 3:26 
space-time boundary integrals 1:705 
time-harmonic waves 2:697-698 
wavenumbers 2:699-701, 3:26
Helmholtz free energy 2:13 
Helmholtz operator 3:18 
hemodynamic conditions 3:537—538 
Hencky elasticity 1:376-377, 1:403-405 
Hencky strain energy function 2:466-467 
Hencky strain tensors 2:9 
Hencky-von Mises stress-strain relation 1:377 
Hencky-von Mises type materials 1:376—377, 1:403-405 
Hermès space shuttle 3:412, 3:433-434 
Hermite elements 1:77-78 
Hermitian matrices 1:552 
Hermitian polynomials 2:213-214, 2:671-676
Hertz—Signorini—Moreau conditions 2:201 
heterogeneity 2:281-283, 2:286-287, 2:432-433
hexahedral...
elements 1:728-729, 2:38, 2:214 
shape functions 1:123-124 
to tetrahedral transformations 3:65 
hidden variables, Maxwell equations 1:727 
hierarchical...
error estimators 1:93, 2:31-32, 2:35-37, 2:52 
interpretations 2:421-425 
matrices 1:597, 1:607-615 
models 
adaptive 2:425 
elasticity 2:40-42 
mesh layout 1:223-224 
plates 1:201-202, 1:207-211, 1:223-224
shells 1:201-202, 1:218-219, 2:106-107 
structural mechanics 2:40-42 
p-refinement 3:18-20 
polynomial expansions 3:65-67 
shape functions 1:733-734, 1:120-124
two-level decompositions 1:385-386 
high... 
angle of attack 3:460, 3:429-431, 3:436 
cycle fatigue 2:311 
level operation levels 1:476 
lift effects 3:432, 3:433 
resolution switched schemes 3:351 
speed machining 2:502-504, 2:505 
speed trains 3:5-7, 3:570-571 
temperature hypersonic flows 3:433—434 
High Irradiance RESponse (HIRES) 1:685, 1:686
higher fundamental solutions 1:716-717 
higher-dimensional manifolds 1:666—667 
higher-dimensional parameter spaces 1:671-672
higher-order...
accurate finite volume methods 1:450-464 
continuum models 2:355-362 
elements 2:112-113 
elliptic partial differential equations 1:13-15 
explicit time-stepping schemes 3:367-368 
gradient models 2:355-362 
hierarchical models 1:210-211 
meshes 1:530 
models 1:210-211, 2:106-107, 2:355-362
nonoscillatory shock capturing 3:353-354 
polynomial approximations 2:700 
predictors 2:155 
Stokes elements 3:164 
time integration 1:464-465
Hilbert spaces 1:74, 1:349-350, 1:363 
Hill’s condition 2:409, 2:411-412 
HIRES see High Irradiance RESponse 
historical overviews 
aerodynamics 3:325—329, 3:408-413 
computational mechanics 1:1-2 
domain decomposition 1:619-621 
mesh generation 1:498-499 
plate theories 2:60 
thin-walled structures 2:59—60, 2:71-72 
hodograph transformation 3:381 
Hölder inequality 1:67
Hölder spaces 1:342-343, 1:356
homogeneous...
elastic-plastic deformations 2:270-271 
functions, Euler theorem of 2:238 
material properties 2:85
plastic deformation 2:269-271 
reactors 3:513-514 
turbulence 
isotropic 3:209-240 
Navier-Stokes equations 3:228-240 
spatial/time behaviors 3:227 
two-level decomposition 3:228-232
homogenization 
axial stress—axial strain curves 2:342 
composite laminates 2:432—433, 2:439—440 
concrete mechanics 2:515-516, 2:522-524
crystal plasticity 2:269-271, 2:281-282, 2:286-287
elastic-plastic crystals 2:281-282, 2:286-287 
micromechanics 2:407—427 
multiscale modeling 2:407-427 
homotopies 1:489-490, 1:661-662, 1:665 
Hood—Taylor elements 1:263—264, 1:266 
Hooke materials 2:14 
Hookean dumbbells 3:492-495
Hooke’s laws 1:207, 1:208, 2:15-16, 2:582 
hoop strains 2:87-89 
Hopf bifurcations 1:672—673 
horizontal motion, ships 3:602, 3:603 
hot jets 3:445-446 
hp convergence method 1:126-131 
hp-adaptivity 1:104, 1:723-736, 3:91, 3:101-102 
hp-clouds 1:291-292 
hp-finite element methods 3:1 
coarsening strategies 1:104 
coupled boundary element 1:389-394 
polynomial expansions on unstructured grids 3:63-67 
spectral elements 3:61—88 
Hsieh-Clough—Tocher macro elements 1:78 
Hu-Washizu finite element method 2:23-24
Hu-Washizu functional 1:240, 1:243-244, 1:245-246, 1:256
hull waters 3:581 
hulls 3:581, 3:589-590, 3:594-599 
hybrid methods 
constitutive equations 2:644-645 
convection—diffusion equations 1:47-50 
displacement boundary elements 2:763-766 
Eulerian/Lagrangian solvers 3:149 
finite element 2:21-24 
incomplete LU factorization 1:565-566 
Lainser tunnel 2:527-529 
Schwarz theory 1:626, 1:627 
stress finite element 2:24
tunnel linings 2:514, 2:516-529 
hydration 2:526 
hydrocodes 1:422 
hydrodynamics, ships 3:579—607 
hydroelastic-sloshing 2:683-684, 2:689-692 
hydrostatic boundaries 1:323-324 
hygral property upscaling 2:516 
hyperbolic...
advective-diffusive equations 3:39 
boundary integral equations 1:705, 1:706, 1:711, 1:712, 1:713 
conservation laws 1:466-470, 1:169-175 
difference equations 3:341 
finite difference equations 1:28-36 
introductory survey 1:3 
space-time boundary integrals 1:706, 1:711, 1:712, 1:713 
time-stepping methods 1:715 
hyperboloid flares 3:427—428 
hyperelasticity
constitutive relation 2:13-14 
contact mechanics 2:196-197, 2:200-201, 2:202 
elastic constitutive model 2:465—467 
geomechanics 2:552 
hyperelastoplasticity 1:429-430 
hypersingular...
boundary integral operators 1:349 
kernel integral equations 1:594 
operators 1:376, 1:176, 1:177 
hypersonic flows 3:433-434
hyperstreamlines 1:538, 1:539 
hyperviscosity 3:241 
hypoelasticity 2:552 
hypoelastoplasticity 1:428-430
hypoplasticity 2:553 


ICF code 3:115, 3:117 
idealization, geomechanics 2:544-545 
identification experiments 2:517-520
identifying material parameters 2:637-654
ignition times 3:511, 3:513 
ill-posedness 1:365-366, 2:640-641
image analysis 2:517-520 
image-order volume rendering 1:542 
impacts 1:280, 2:379 
impedance boundary conditions 1:726 
imperfect bonding 2:439 
imperfection sensitivity 2:149-150, 2:349 
implicit...
constitutive integration 2:736-740 
error estimators 1:89-91, 1:96 
finite element solution 2:469—471 
functions 1:538-539 
residual error estimators 1:387-389, 2:29-31, 2:34-35, 2:50-52
solution methods 2:546 
temporal discretizations 3:488-490 
thickness integration 2:83 
time integration 2:182-183, 2:188-189, 3:252
time-stepping schemes 3:368-372 
total variational diminishing schemes 1:451-452
implosion simulation 3:115, 3:117 
impressed surface currents 1:726 
in-plane material models 2:535-536 
in-plane strains 2:92 
incomplete Cholesky decompositions 1:562, 1:565 
incomplete LU factorization 1:562—567 
incompressible...
elasticity 1:244-245, 1:254-256
flows 3:2 
computability/adaptivity 3:186-187 
convection-diffusion 3:121-122
energy-harvesting 3:72 
inertialess 3:482-488 
isothermal viscoelastic fluid 3:482 
mean flow 3:304 
ship hydrodynamics 3:581-583 
spatial discretization 3:69 
stabilization parameters 3:549-550
Taylor vortex 3:71-72
time integration 3:67-69 
viscosity 3:2, 3:155-179 
error control 3:170-175
mathematical models 3:156-160
space discretization 3:160-166
time discretization 3:166-170
vortex methods 3:131-137 
wake 3:72-74 
see also turbulence...
hydroelastic-sloshing 2:683-684, 2:689-692 
meshfree methods 1:298—300 
Navier-Stokes equations 
computability/adaptivity 3:186-187
finite element methods 3:160-166, 3:548-549
multiscale method 3:40-55 
space-time formulation 3:44—45 
spatial discretization 3:160-166
time dependent domains 3:69-70 
incremental... 
boundary values 2:469-470 
collapse 2:304—307, 2:323, 2:325 
constitutive law 2:469 
crack growth simulation 2:377 
stability 2:280 
variational formulation 2:280 
indefinite finite elements 1:104-105
indices 
laminar flow around cylinders 3:203, 3:205 
orientation 1:607-608 
reliability 2:677 
Sobolev index 1:366-371
indirect methods 2:157 
industry 
aerodynamics 3:407-454 
p-finite element method 1:134—135, 1:136-137 
inelastic...
constitutive equations 2:638—640 
cracked bodies 2:311 
distortions 2:267 
materials 2:481-484, 2:638-640
inertial... 
confinement fusion simulation 3:115, 3:117 
equations 2:173-174
inertialess incompressible flows 3:482—488 
manifolds 3:211-212 
range of scales 3:211-212 
waves 3:251 
inexact... 
Additive Schwarz Method 1:623-625 
multiplicative Schwarz methods 1:625-626 
Newton’s methods 1:660-661 
subdomain solvers 1:636-638 
inextensional displacement p 1:215 
inf-sup condition
mixed finite element methods 1:248-252, 1:254, 1:257-259,
1:266-276
proof techniques 1:269-276 
saddle-point stability 1:248-249, 1:251-252, 1:254, 1:257 
Stokes equations 1:266-267
thermal diffusion 1:258-259 
viscoelastic fluid flows 3:484 
see also Babuška-Brezzi condition
infinite dimensional problems 1:159, 1:185-186 
infinite elements 2:708-710, 2:711-712 
infinitesimal deformations 2:232-239, 2:245-248, 2:442-443,
2:448-449
infinitesimal elastoplasticity 2:232-239, 2:245-248
inflation 2:609—613, 2:614 
inflow boundary conditions 3:291, 3:292-293 
information visualization 1:526
infrared (IR) signatures 3:451-452 
Inglebert’s method 2:319-320 
initial value problems (IVPs) 1:676—702, 2:268-269 
inlet designs 3:428-429, 3:433, 3:434 
integrals 
Cauchy 1:343, 1:344-345, 1:718-719 
conservation equations 1:419-420 
conservation law 1:442 
continuous boundary operators 1:351 
differentiation 2:741 
longitudinal scale 3:211 
multigrid methods 1:593, 1:594-595 
nonlinear multigrid iterations 1:593 
panel clustering 1:599 
strain interior points 2:735-736 
stress interior points 2:735—736 
time derivatives 1:418-419 
transforms 2:564 
volume 1:418—419 
see also boundary integral equations 
integration 
cells 1:293 
contact-friction model 2:473 
degenerating solid elements 2:83 
elastoplasticity 2:244-250, 2:736-740 
explicit 1:465, 2:220 
Gauss—Lobatto 1:143-144 
plastic flow 2:274-275 
semi-implicit time 3:245, 3:246, 3:252, 3:254-256 
semi-Lagrangian time 3:69 
stress 2:467-468, 2:493-495
through-the-thickness 2:94 
viscoelastic fluid flow 3:490—491, 3:492-493 
viscoplastic deformations 2:244-250
visualization algorithms 1:534-535 
see also numerical...; time...
integrity tensors 2:454—456 
Intelligent Light FieldView application 1:546
interior penalty (IP) method 3:118-119
interacting body characterization 1:317-321 
interfaces 
boundary conditions 2:697 
capturing 3:560-562 
computational systems 1:546-548 
elements 2:363-365 
fluid dynamics 3:3 
Signorini-type 1:376-377, 1:396-403
tracking 3:561-562 
visualization systems 1:546-548
interior... 
displacement 1:344 
estimates 1:111-112
points 2:735-736 
traction 1:344 
interlaminar failure 2:439 
intermediate mortar surfaces 2:210 
internal...
energy 2:11-12 
length scales 2:361-362 
modes 1:122-123 
variable dependency 2:310 
variable mapping 2:489-490 
work 2:446-449
interpolation
arbitrary Lagrangian-Eulerian 1:421-422
errors 1:80-82, 1:512, 1:61-68
estimates 1:80-82, 3:35, 1:61-68
h-finite element spaces 1:55-70
interpolated sub-time stepping technique 3:561
meshes 1:421-422, 1:516
mismatches 1:547 
nodal 1:80-82, 1:58-59, 1:61-66 
operators 1:80—82, 1:84, 1:58—60 
projections 1:732-734 
Stokes equations 1:264-266 
viscoelastic fluid flows 3:484 
see also approximation 
intersections 1:302, 2:114-117, 1:133 
intrinsic material functions 2:526-527 
invariance 
affine 1:657-658 
Galilean 3:275 
Lax-Wendroff scheme 3:415
Navier-Stokes equations 3:275 
positive streamwise 3:415, 3:420-421 
reflection 3:275 
rotational 1:733-734, 3:275 
time shift 3:275 
inverse... 
constitutive equations 2:640-641 
hierarchical matrices 1:611-612, 1:615 
Laplace transforms 1:707 
power method 2:159 
stiffness matrices 1:611-612 
inviscid...
equations 3:331-332
flows 3:131-137
flux terms 3:82-84 
pitching cycles 3:380 
IP see interior penalty
IR see infrared 
irregular boundary domains 3:157-158 
irregular data sets 1:529 
isocontours 1:532 
isolas 2:160 
isoparametric...
contact discretizations 2:207 
elements 1:731-732 
finite element method 1:109 
interpolations 2:209, 2:724 
mapping 1:58 
surface elements 2:212—213 
isothermal solutions 2:589-592 
bound meshes 1:513
compressible materials 2:14-16 
compression 2:555-556 
damage mechanics 2:336-338 
elasticity 2:205-206, 2:726 
finite elements 1:61-63
finite hyperelasticity 2:16-24
finite strain 2:242, 2:248-250 
function representation 2:244 
linear elasticity 2:205-206 
material properties 2:85 
Maxwell equations 1:723-736
plasticity 2:243-244, 2:249
simplices 1:57
turbulence 
closure 3:312, 3:313 
direct numerical simulations 3:294 
homogeneity 3:209-240
large eddy simulations 3:295-296 
Navier-Stokes equations 3:209-240 
yield criterion 2:235 
Isotropization of Production (IP) 3:313
iteration 
adaptive wavelets 1:185-187
arbitrary Lagrangian—Eulerian mechanics 1:431 
contact mechanics 2:197—198, 2:216 
coupled BEM/FEM 1:376 
defect correction 3:176 
direct linear algebraic solvers 1:555-556 
Dirichlet-to-Neumann boundary condition 2:707-708
eigenvalues 1:571-574 
fluid dynamics 3:565—566 
Iterated Standard Partitioned Procedure 2:583 
Maxwell equations 1:735 
multigrid methods 1:580-581, 1:590-592 
nonlinear parabolic equations 1:26 
nonlinear systems 1:650-661 
substructuring 1:633-635 
IVPs see initial value problems 


J2-flow theory 2:228, 2:239, 2:246-248, 1:134, 1:138
Jackson estimate 1:164, 1:167 
Jacobi...
Davidson method 1:574 
implicit time-stepping 3:370 
iteration 1:581-583 
matrices 1:593, 3:331-332 
orthogonal polynomials 1:143-146
preconditioner 1:390—391 
Jacobians 2:200 
Jameson-Schmidt-Turkel (JST) scheme 3:329, 3:351, 3:353, 3:357,
3:363
Jaumann stress rate 2:493—494 
jets 3:398-399, 3:427, 3:431-433, 3:436-437, 3:442-448
Johnson—Mercier elements 1:268, 1:269 
Jordan matrix forms 1:553
JST see Jameson-Schmidt-Turkel
jumps 
boundary elements 1:351 
buckling 2:160, 2:161-164
discontinuous Galerkin methods 3:92-96, 3:103 
operators 2:172 
residuals 1:87-88 
space-time boundary integrals 1:706 


k-ε model 3:303-307, 3:320, 3:321
k-l model 3:306-307
k-ω model 3:308-310, 3:320, 3:321
Karhunen—Loeve expansion 2:665—667, 2:671-676 
kernels
ellipticity 1:252-254
expansions 1:606-607
explicit constructions 1:717-718
far-field expansions 1:605
meshfree arbitrary shape approaches 2:383-384
kinematics
admissible displacements 1:209-210
arbitrary Lagrangian-Eulerian method 1:416-417
arterial walls 2:610-611
classical laminate plate theory 2:433-434
constitutive equations 2:650 
constraint 2:339-340 
elasticity 2:7-10, 2:294-302, 2:305, 2:318-319 
enhancement 2:453 
equations 2:91-93 
finite elements 2:303—304 
forming process modeling 2:463—464 
hardening 2:308-309, 2:553 
layer discretization 2:112-113 
nongeometric representations 2:394, 2:399-400 
shakedown 2:294-302, 2:305, 2:318-319 
shear deformations 2:434—435 
shells 1:219 
space enrichments 1:209-210
kinergy energy 2:162 
kinetic energy
aeroacoustics 3:446-447
decay inequality 3:45, 3:48-49 
elasticity balance equations 2:11-12
homogeneous turbulence 3:237—240 
shallow water equations 3:241-242, 3:254, 3:255-259
spectral amplitude 3:43-44, 3:47
turbulence closure 3:303 
kinetics 
classical laminate plate theory 2:433-434 
reacting flow combustion 3:507 
response 2:161 
Kirchhoff displacements 1:203-204, 1:229 
Kirchhoff stress 
contact mechanics 2:200 
copper single crystals 2:283-284 
Green—Naghdi stress rate 2:494-495 
Jaumann rate 2:493-494 
tensors 2:10, 2:14, 2:230-231 
Kirchhoff—Love models 
classical laminate plate theory 2:433-434 
discretization 2:109—110 
plates 1:204, 1:208-209, 1:229, 2:433-435 
shear deformations 2:434—435 
shells 1:211-218, 2:103-105 
Kitware ParaView application 1:546 
knee joints 2:628-629 
known-solution methods 2:381-382, 2:383
Koiter, Warner T. 
buckling 2:145-147, 2:150
clamped elliptic shells 1:214 
plates 1:201, 1:217-218 
shakedown theorem 2:297-298 
shells 1:201, 1:217-218, 2:103-105 
see also Kirchhoff—Love 
Kolmogorov... 
energy spectrum 3:43-44, 3:47 
law 3:211-212 
length scales 3:280-281 
Koren limiter 1:457 
Korn airfoils 3:346 
Korn’s inequalities 1:75 
Kruzkov entropies 1:441 
Krylov projection methods 1:556-557 
Krylov subspaces 
accelerated multifrequency methods 2:711 
Dirichlet-to-Neumann boundary condition 2:708 
eigenvalues 1:571-574 
linear algebraic solvers 1:552, 1:556-557, 1:560, 1:561 
Kuhn-Tucker complementary conditions 2:233
Kuhn-Tucker-Karush conditions 2:201, 2:218, 2:219
KVLCC2 hull model 3:595, 3:596, 3:597-599


L-shaped domains 1:131 
L-stability 2:178-179
L1-contraction principle 1:441, 1:447, 1:451
L2-condition number 1:365-366
L2 projections 1:692-693, 1:142
L2-norms 2:27-28, 3:261-262
laboratory tests 2:638-639
Ladyzhenskaya-Babuška-Brezzi (LBB) condition
adaptive wavelets 1:159, 1:181, 1:183 
meshfree methods 1:299-300 
Sobolev index 1:368-370 
viscoelastic fluid flows 3:484 
Lagrangian 
buckling stability 2:142-143
continuum mechanics 1:413-414 
convection—diffusion equations 3:149 
description of motion 1:415-416 
finite element spaces 1:77 
fluid flow 3:581, 3:591-592, 3:594-607
multipliers 
contact discretizations 2:206-207, 2:208 
contact mechanics 2:202—203, 2:216-217 
discrete elements 1:322, 1:329 
matrix form 2:208 
meshfree methods 1:293, 1:294-296 
nodal interpolation error estimates 1:65 
particle distribution 3:136, 3:140-141
phase stress-update 1:429-430 
Lainser tunnel 2:525-529 
LAM see Limited Area Model 
Lamé constants 1:204, 2:67, 2:99 
laminar...
flames 3:500-501, 3:512-514 
flow around cylinders 3:201-205 
level deformation 2:432 
laminates 
level deformation 2:432 
theories 2:433-437 
see also fiber-reinforced composite...
Lanczos method 1:557, 1:571-574 
Lanczos—Tau method 1:143 
Laplace domains 2:754, 2:755-756 
Laplace transforms 1:704, 1:707, 1:711-712, 1:713-714 
Laplacian smoothing 1:421 
Laplacian vertex placement 1:541 
large data visualization methods 1:542-543 
large eddy simulations (LES) 
adaptive 3:184-185, 3:191-197 
aerodynamics 3:423-426 
applications 3:295—296 
boundary conditions 3:291-293, 3:296
computability 3:184—185, 3:191—197 
direct numerical simulations 3:184—185, 3:191-198 
eddy viscosity 3:47—49 
filtering operator 3:272-274 
multiscale method 3:40, 3:41-43 
numerical error 3:282—283 
resolution requirements 3:281 
shallow water equations 3:241 
time advancement schemes 3:283 
turbulence 3:2, 3:269, 3:270-274, 3:279-285, 3:291-293, 3:295-296
turbulent flames 3:515-517, 3:520
see also direct numerical simulations 
large strains 2:566, 2:649-654, 3:309-310 
LArge Time INcrement (LATIN) 2:582-584 
large-scale decompositions 3:219-228 
large-scale separations 3:248-251
lateral boundary conditions 1:207 
lateral wave contact 3:604—605 
LATIN 2:582-584
lattices 2:269-271, 2:391-392, 2:339-340 
Launder, Reece and Rodi (LRR) turbulence closure 3:313 
laws 
Ampère’s 1:723-724
Arrhenius 3:517 
constitutive 2:310, 2:204-206, 2:621, 2:624
Darcy’s 2:582 
Faraday’s 1:723-724 
first law of thermodynamics 2:12 
force displacement 1:316 
Gauss 1:724 
Hooke’s 1:207, 1:208, 2:15-16, 2:582
Kolmogorov 3:211-212 
law-of-the-wake 3:307
law-of-the-wall 3:307
material 2:66, 2:74, 2:84, 2:97, 2:98—102 
second law of thermodynamics 2:12-13 
see also conservation 
Lax-Friedrichs finite difference 1:51 
Lax—Friedrichs flux 1:470, 3:95, 3:99 
Lax—Milgram lemma 1:348, 1:352, 1:353-354 
Lax-Wendroff...
explicit time-stepping 3:366-367
finite difference 1:51 
flux 3:414-415 
positive streamwise invariance 3:415 
theorem 1:447, 3:98 
layer-wise models 2:106-107, 2:435-437
LBB see Ladyzhenskaya-Babuška-Brezzi
LCO see limit cycle oscillations 
LDG see local discontinuous Galerkin 
leap-frog schemes 1:51, 2:183-184, 2:188-189, 3:252-254, 
3:256-259 
least squares 
adaptive wavelets 1:185 
arbitrary crack growth 2:384-386 
boundary integral equations 1:346, 1:347 
centered moving 1:288-289 
complex geometry 3:346-348 
continuous moving 1:285-286
coupled BEM/FEM 1:376, 1:394-396 
Helmholtz equations 2:701 
incompressible viscous flows 3:176 
linear reconstructions 1:461 
meshfree methods 1:280, 1:285-291
moving 1:280, 1:285-291, 2:384-386 
nonlinear systems 1:651, 1:657-658
parameter identification 2:643, 2:645-646 
stabilizations 3:187 
see also Galerkin... 
LED see local extremum diminishing 
LEFM see linear elastic fracture mechanics 
left-preconditioning 1:561 
Legendre polynomials 1:393, 2:704, 1:143-144 
Lemaître’s damage model 2:488, 2:503
length scales 
cohesive-zone models 2:350 
concrete mechanics 2:520-521 
determination 2:361-362 
Navier-Stokes equations 3:210-213, 3:218
parameters 3:592-593 
shells 1:216, 1:223-224
Leonard tensors 3:274 
Leonard’s decomposition 3:274 
LES see large eddy simulations 
Lesaint—Raviart method 3:483 
level of loading 2:514, 2:517—520, 2:527, 2:533-534, 2:537-538 
level-of-detail (LOD) 1:542 
Liapounov criterion 2:142-143 
Lie derivative 2:243 
lift... 
coefficients 3:84-85, 3:146-147
effects 3:432, 3:433
forces 3:203-204
over-drag 3:451 
lifting operators 1:251-252 
lifting schemes 1:165 
ligaments 2:625—629 
Lighthill turbulence tensors 3:6 
lighting models 1:528 
limit 
analysis 2:549-551 
cycle oscillations (LCO) 3:460 
factors 2:299-301 
model 1:208-209, 1:211-218
points 2:145-147, 2:149, 2:150, 2:156-160 
state boundary 2:568 
state functions 2:676—677, 2:678, 2:679 
Limited Area Model (LAM) 3:242-246
line relaxation methods 3:340-341
line-tracked interface up-date technique (LTIUT) 3:564-565
linear... 
acoustics 2:695-714 
advection equations 1:150 
algebraic solvers 1:5, 1:551-575 
direct methods 1:553-560 
LU-factorization 1:554-555, 1:562-567
preconditioning 1:555-556, 1:560-562 
combination subgrid-scale modeling 3:287-288 
complementary problems 2:219 
convection—diffusion-reactions 3:185, 3:190 
elastic... 
axisymmetric soil layers 2:581—582 
bars 1:128—130 
continua 2:669-671, 2:673-676, 2:677-678
fracture mechanics (LEFM) 2:375, 2:381, 2:384-385, 2:394-399
elasticity 2:7-40
elastodynamics 2:764-766
elliptic boundary values 1:74-77
finite elements 1:288
hyperbolic equations 3:91, 3:92-96
Maxwell equations 1:723-736
momentum 2:11, 2:67
multigrids 3:177-179
multistep (LMS) 2:175-176, 2:178
one-dimensional equations 3:532
operators 1:168-169, 1:191
prebuckling state 2:144-145, 2:147-149
programming (LP) 2:316-317, 2:549-551
reconstructions 1:457-462
regression continuous discretization 2:664-665
shell theory 2:102
stability 3:489-490
structural dynamics 2:181-184
symmetric hyperbolic equations 3:91, 3:94-96
vibratory response 2:683-692
linearization
buckling 2:141, 2:144-145
contact-friction 2:473-476
duality 1:693-694
forming processes modeling 2:471
incompressible viscous flows 3:176-179
local convergence theory 1:657-658
Navier-Stokes code 3:417
primary state 2:141, 2:144-145
Reynolds averaged turbulence 3:421
shakedown 2:312-313
linearized elasticity
adaptive mesh refinement 2:37-39
collocation 2:724
error estimates 2:27-37
error estimation 2:5-7, 2:24-40
finite element methods 2:5-7, 2:16-40
hybrid finite element methods 2:21-24
material responses 2:408, 2:427
mixed finite element methods 2:21-24
model adaptivity 2:5-7, 2:24-40
nonlinear boundary values 2:16-24
Schwarz domain decomposition 1:622-623
symmetric Galerkin BEM 2:729
Lions’ lemma, domain decomposition 1:629
Lipschitz...
domains 1:167
estimate in time 1:447
surfaces 1:710-711
liquefaction 2:554
liquid filled tubes 3:571-573
liquid motions 2:692
literature overviews
adaptive computation 1:676-678
composite laminates 2:437-440
multigrid methods 1:579
a posteriori error control 1:73, 1:87
thin-walled structures 2:61
LMS see linear multistep
load... 
buckling 2:140, 2:152-153 
concrete mechanics 2:533-534, 2:538 
control 2:126 
deformation space curves 2:152-153 
displacement curves 2:345, 2:537-538 
distributions 3:380 
domains 2:296—297, 2:298-299, 2:300-304 
factors 2:305-306, 3:475-476 
geomechanics 2:561 
intensity 2:140 
multifield soil dynamics 2:592, 2:593, 2:594 
parameter 2:152-153 
plates 1:202-205 
transverse 2:438 
vectors 1:328
loading-unloading conditions
concrete mechanics 2:514, 2:517-520, 2:527, 2:533-534, 2:537-538
damage mechanics 2:338
microgeometrical manufacturing 2:416
plastic 2:235-236, 2:340-341
shakedown 2:293-294, 2:296-297, 2:300-304, 2:321-326
tensile 2:437-438 
thermal 2:301 
thermomechanical 2:293-294, 2:321-326
local... 
conservativity 3:92 
contact mechanics 2:197 
convergence theory 1:649-669 
coordinates 3:64-65, 3:66 
curvature estimates 2:515-516 
discontinuous Galerkin (LDG) 3:117-118, 3:120-123
discrete maximum principle in space 1:445—446 
errors 1:699, 1:701 
extremum diminishing (LED) schemes 1:445, 3:348, 3:349-351, 
3:363 
Green’s functions 1:44-45 
interpolation error estimates 1:61-68 
mesh refinement 1:98-104 
parameterizations 1:664-666, 1:669—673 
residuals 3:93-94, 3:96, 3:103 
space-time discrete maximum principle 1:445-446 
stiffness matrices 1:113-114
time derivatives 1:418 
locality 
adaptive wavelets 1:166 
implicit error estimators 1:90-91 
projection-based interpolation 1:733 
locally upwinded spectral technique (LUST) 3:483 
locking 
discretization 2:111 
finite element formulation 2:120—125 
forming processes modeling 2:476 
meshfree methods 1:298-300
thin domains 1:221-223
LOD see level-of-detail 
log-layer solutions 3:305, 3:306, 3:307 
log-normal random variables 2:660-661 
logarithmic strain measures 2:464—465 
logical scriptable attributes 1:492 
logistics reaction-diffusion 1:697-698, 1:699
longitudinal integral scale 3:211 
loosely coupled fluid/fluid-mesh/structure time integrators 3:471-473 
Lorenz system 1:680-683 
low... 
level operation levels 1:476 
order elements 2:477-478
Reynolds number k-ε model 3:306
speed effects 3:432, 3:433 
lower bounds 1:97—98, 2:414 
LP see linear programming 
LRR see Launder, Reece and Rodi 
LSIC stabilization dynamics 3:550 
LSSD stabilization flows 3:173-174 
LTIUT see line-tracked interface up-date technique 
LU-factorization 1:554-555, 1:562-567
Lucy statue 1:520-521 
Lüders band propagation 2:346, 2:348-349
lumped parameter models 3:530-531 
LUST see locally upwinded spectral technique 
Lyapunov equation 1:615


Mach numbers 
aeroelasticity 3:460, 3:475-476 
fluid flow 3:334
shock-capturing 3:359 
subgrid-scale modeling 3:290 
macroelements 1:78, 1:266 
macromechanics 2:440—451, 2:452-458 
macroscale composite laminates 2:440-451 
macroscopic...
constitutive behavior 2:449-451 
continuum slip theory 2:267-268, 2:269-270 
damage 2:452-458 
free energy 2:271-272 
material models 2:535-537
material responses 2:407—427 
magnesium 2:648-649
magnetic fields 1:724 
magnetohydrodynamics (MHD) 3:81—84 
magnetostatics 1:724 
man-made structures 2:558—561 
Mandel—Cryer effect 2:562 
manifolds 1:662-664, 1:666-667, 3:211—212 
mapping 
affine 1:58, 1:62-63 
bump 1:491 
color 1:531-532, 1:533-534
displacement 1:533-534, 2:247, 2:490-491
elastic deformation 2:244-250, 2:275-276
exponential 2:248-250, 2:467-468
Fréchet-differentiable 1:650, 1:656
internal variable 2:489-490 
isoparametric 1:58 
p-finite element method 1:124-126 
particle motion 1:417—418 
pitchfork 1:669-670 
return 2:244-250, 2:275-276, 2:219, 2:467-468 
transfinite 2:275-276 
visualization algorithms 1:531—532, 1:533-534, 1:539 
marching cubes 1:476-477, 1:484-485, 1:532, 1:533
margins of safety 2:551 
MARK 1:98, 1:99, 1:100 
marking criterion 1:100 
Mars Lander 3:327, 3:328 
mass... 
balance 3:585 
conservation 1:419—420, 1:426, 2:10, 3:581-582, 3:585 
fractions 3:510-511 
matrices 1:571, 2:764, 2:766—767 
transfer 2:584-586, 2:592-599
master-slave concept 2:210-211, 2:212-213 
material... 
acceleration 1:419 
derivatives 1:418—419, 3:129 
domains 1:414, 1:416, 1:417-418 
elasticity tensors 2:455-456 
frame indifference 2:239-240, 2:241 
functions 2:526-527 
heat flux vector 2:12 
heterogeneity 2:432—433 
instabilities 2:341-349
laws 2:66, 2:74, 2:84, 2:97, 2:98-102 
layers 2:112-113 
matrices 2:75 
nonlinearities 2:72-76 
parameter identification 2:637-654 
response properties 2:407-427 
setup 2:85-86 
stiffness 2:398—399, 2:553 
tensors 2:66-67, 2:449-451, 2:455-456
mathematical models
fluid flow 3:330-334, 3:335 
incompressible viscous flows 3:156-160 
thin-walled structures 2:61-68 
turbulent flows 3:271-279
mathematical programming methods 2:202, 2:216, 2:219-220 
MATLAB 1:98, 1:113-114 
matrices 
compression 1:175-181
contact discretizations 2:208-209 
exponential function 1:615 
formulation discretization 1:600 
linear algebraic solvers 1:552 
matrix-matrix multiplication 1:611 
matrix-vector 
addition 1:611 
computations 3:567-568 
multiplication 1:600-601, 1:602-604, 1:607, 1:611
notation 2:15-16 
subtraction 1:611 
truncation 1:611 
mechanical properties 2:426 
monotonicity 1:16-17 
partitioning 2:580-582
saddle-point stability 1:248-257 
see also stiffness...
maximum angle condition 1:83 
maximum principles 
discrete 1:10-11, 1:445-446
elastic shakedown 2:300, 2:302
finite volume methods 1:445-446, 1:459, 1:463-464
parabolic equations 1:18-19 
Maxwell compatibility expression 2:343 
Maxwell equations 
de Rham diagrams 1:729, 1:732-734 
discontinuous Galerkin methods 3:94 
exact sequences 1:727-732
finite element methods 1:6, 1:723-736 
projection-based interpolation 1:732-734 
space-time boundary integrals 1:708-709
variational formulation 1:725-727
MCC see Modified Cam Clay 
MDEM see modified distinct element method 
MDS see multilevel diagonal scaling 
MEAM see modified embedded atom method 
mean 
drag coefficient 3:84—85, 3:192, 3:194, 3:195 
flow 3:302-303, 3:304, 3:318, 3:319, 3:321 
shear stresses 3:535, 3:536-537 
value (random variables) 2:659 
value (vectors) 2:646 
velocity 3:301, 3:307 
mechanical...
linear elastostatics 2:408 
power supply 2:12 
property upscaling 2:516 
stress 2:10 
mechanics 
carotid artery 2:613, 2:614 
composite laminates 2:439 
material responses 2:425-426 
thin-walled structures 2:63-68 
see also computational flow; computational fracture; contact; 
continuum; damage; geomechanics; micromechanics 
media theory 2:514, 2:515-529
medial axis 1:485-490 
medial modeling 1:485-490 
medical imaging data 3:538-540 
Melan-Koiter’s theorem 2:297-298 
Melan’s theorem 2:297-299
member (structural) scales 2:530-538
membranes 
deformation patterns 1:213 
displacements 1:202, 1:206-207 
dominated action 2:68-70 
elements 2:108-109
energy 1:209, 1:215 
generator 1:204—205 
locking 1:221-222, 2:121, 2:123-124, 2:125 
operators 1:213 
shells 1:215 
theory 2:102-103 
memory limitations/usage 2:198, 3:485-486 
mercury intrusion porosimetry (MIP) 2:517-519 
mesh...
acceleration 1:419 
adaptation 1:498 
adaptive finite element methods 1:510-516 
arbitrary Lagrangian–Eulerian 1:420, 1:422
error estimates 1:511-516
forming processes modeling 2:488-489
incompressible viscous flows 3:170-175 
unit volume 1:507-510 
aerodynamics 3:359-360, 3:396 
arbitrary crack growth 2:389-390 
bias 2:362-363 
deformation gradients 3:441-442
density responses 2:417, 2:418 
depending weights 1:87 
discretization 3:362-363
element construction 1:500 
evolution 2:506 
filtering 3:277-279 
generation 1:5, 1:497-502 
adaptive finite elements 1:510-516 
aerodynamics 3:426-428 
forming processes modeling 2:488-489
introductory survey 1:1 
layout 1:223-224 
medial modeling 1:490 
moving boundaries 1:517-521 
node updating 3:590-591 
refinement 1:87, 1:98-104, 2:488
regularization 1:420-422
representation 1:546-547
resolution rules 2:699-700 
sensitivity 2:341, 2:344-349 
smoothing 1:421-422, 1:541 
thin domains 1:220 
topological representation 1:546-547
updating 1:420-422, 3:553-555, 3:590-591
velocity 3:70 
meshfree methods 1:3-4, 1:279-306 
approximation 1:280-291
arbitrary shape approaches 2:383-386
blending finite element methods 1:303-306 
convection-diffusion equations 1:45-47 
differencing 1:45-47, 1:290, 1:291 
discontinuities 1:300-303
finite difference method (MFDM) 1:290, 1:291
higher-order damage models 2:359
incompressibility 1:298-300
local Petrov-Galerkin (MLPG) method 1:291 
moving least squares 1:280, 1:285-291 
partial differential equations 1:291-300
radial basis functions 1:300
smooth particle hydrodynamics 1:280, 1:281-285, 1:291
volumetric locking 1:298-300 
metals 
aluminum 2:379, 2:417-421, 2:648-649
cutting operations 2:502-506
microplane damage models 2:339-340 
stamping 1:518, 1:519 
see also steel 
meteorology 3:241-242 
method of... 
characteristics 1:47-50, 1:439-440
contours of vorticity curl 3:131, 3:132-133 
finite spheres 1:280 
lines 1:20-21, 1:35, 3:96, 3:167-168
methodical depths 1:2 
methodical widths 1:2 
metric definition 1:507 
metric structures 1:487 
metric tensors 1:211-212 
MFDM see meshfree finite difference method 
MHD see magnetohydrodynamics 
microcracking 2:437-440, 2:526, 2:553 
microgeometrical manufacturing 2:414-416 
micromechanics 
composite laminates 2:432-433, 2:437-451, 2:452-458
concrete mechanics 2:515-519 
homogenization 2:407-427 
multiscale modeling 2:407-427
normal contact stresses 2:204 
microplane damage models 2:339-340 
micropolar theory 2:76-79, 2:83
microscale composite laminates 2:440-451
microscopic constitutive behavior 2:449-451 
microscopic damage 2:432-433, 2:452-458 
microstructures 2:267-287 
midpoint rule 3:54-55
midpoint step functions 2:664 
midsurface curvature tensors 1:211-212 
mild steel axisymmetric necking 2:651-652, 2:653-654 
military aircraft 
aeroelasticity 3:460, 3:461, 3:474-477 
computational flow mechanics 3:428-431, 3:436
forebody control 3:436
transonic flow 3:326-327
MINI elements 1:262-263, 1:266, 1:270
minimal residuals, generalized 1:558-559, 2:702, 3:565-566, 3:570 
minimum angle conditions 1:79-80 
minimum norm residuals 1:556, 1:558 
MINRES 1:389, 1:393 
MIP see mercury intrusion porosimetry 
Mirage 2000 aircraft 3:429, 3:430 
mismatched mesh representation 1:547 
MITICT see mixed interface-tracking/interface-capturing technique 
mixed derivative elliptic partial differential equation 1:14 
mixed element-matrix-based/element-vector-based computation 
technique (MMVCT) 3:567-568 
mixed finite element methods 1:3, 1:237-276 
elastic shakedown 2:303-304 
elasticity 1:241-246, 1:268-269, 2:21-24 


inf-sup condition 1:248-252, 1:254, 1:257-259, 1:266-276
linearized elasticity 2:21-24 
local discontinuous Galerkin 3:117-118 
saddle-point stability 1:246-257 
stability 1:246-257
Stokes equations 1:240-241, 1:262-268
thermal diffusion 1:238-240, 1:257-262
viscoelastic fluid flows 3:481-496 
mixed function subgrid-scale modeling 3:286-287 
mixed interface-tracking/interface-capturing technique (MITICT) 
3:561-562
mixed modeling 3:519-520, 3:521 
mixed principles 2:300, 2:302, 2:305 
mixing enhancement 3:425-426, 3:436-437
mixture fractions 3:508-511 
MLPG see meshfree local Petrov-Galerkin 
MLS see moving least squares 
MMVCT see mixed element-matrix-based/element-vector-based 
computation technique 
modal analysis 1:209-210 
mode jumping 2:160, 2:161-164 
model...
adaptivity 2:5-7, 2:24-40, 2:44-46
attributes 1:490-492
decisions 2:83-102
error estimates 2:42-44
parameters 1:492 
models 
continuum failure 2:335-336, 2:341-349, 2:369
failure 2:335-370 
thin-walled structures 2:59-103 
turbulent flame combustion 3:519-521 
visualization algorithms 1:538-541 
see also forming processes; geometric modeling; multiscale methods 
modified...
Cam Clay (MCC) model 2:556-558
deformation gradients 2:477-478
distinct element method (MDEM) 1:334 
embedded atom method (MEAM) 2:391-394 
Hellinger–Reissner functional 1:240, 1:243, 1:244
incomplete LU factorization 1:563
Lax–Wendroff flux 3:414-415
nonreflecting Dirichlet-to-Neumann boundary condition 2:705-708
variational principles 1:293 
wave numbers 3:282 
molecular transport 3:505-506 
Molenkamp model 2:556 
moment... 
closures 3:301, 3:311-318, 3:320
of linear momentum conservation 2:11 
predictions 3:239 
momentum 
arbitrary Lagrangian–Eulerian 1:419-420, 1:426
contact mechanics 2:200 
equations 
conservation 1:419-420, 1:426
reacting flows 3:505 
structural dynamics 2:173-174 
turbulence closure 3:302 
viscoelastic fluid flow 3:491-492 
preserving time integration 2:185-186
ship hydrodynamics 3:581-582, 3:585
monitoring buckling 2:156 
monolithic methods 2:592-599 
Monot model 2:556
monotonicity
elliptic partial differential equations 1:15-17
finite volume methods 1:451, 1:452-453
Godunov finite volume discretizations 1:444 
monotone integrated large eddy simulations 3:283
monotone upstream-centered scheme for conservation laws
1:452-453, 1:456-457, 3:329
turbulence equations 3:419-420 
Monte Carlo method 2:646, 2:647, 2:649 
Mooney-Rivlin material 2:14 
Morawetz’s theorem 3:346 
Mori-Tanaka scheme 2:524 
Morse theory 1:493 
mortar method 1:643-644, 2:210, 1:152-154 
motion 
arbitrary Lagrangian–Eulerian 1:413-418
elastic body deformations 2:7-9
equations 1:316, 2:161-162
Eulerian 1:415-416 
Lagrangian 1:415-416 
ship hydrodynamics 3:602, 3:603 
moving... 
boundaries 1:498, 1:517-521, 3:3, 3:545-574 
domains 3:69-70, 3:76-77 
heat sources 1:696 
interfaces 3:545-574
least squares (MLS) 1:280, 1:285-291, 2:384-386
meshes 3:565-566
reaction fronts 1:698-699, 1:700
MSM see multiplicative Schwarz methods 
multi dimensional consolidation 2:562 
multi-step chemical reactions 3:510-511 
multibody systems 
block deformability 1:324-329, 1:334-335
boundary conditions 1:321-324
conservation time integration 2:186, 2:187 
contact constraints 1:321-324 
discontinuous deformations 1:326-329 
discrete element methods 1:311-329, 1:331-335 
structural dynamics equations 2:174, 2:186, 2:187 
time integration 1:331-333 
multicoloring 1:563-564 
multidimensions 
finite volume methods 1:455-464
hyperbolic conservation laws 3:108-110
shock capturing 3:363-364
multidirector models 2:106 
multidisciplinary field equations 3:462 
multidomains 3:412-413
multifield problems 2:575-599
monolithic methods 2:592-599
partition solution procedures 2:577-589
soil dynamics 2:589-592 
multifrequency solution methods 2:710-711 
multigrid methods 1:5 
additive variants 1:589-590 
boundary element method 1:593-595 
eigenvalues 1:593 
finite element equations 1:586-589 
general remarks 1:577-581 
implementation ease 1:578 
incompressible viscous flows 3:176, 3:177-179 
iterations 1:580-581, 1:590-592 
literature 1:579 
Maxwell equations 1:735 


nested iteration 1:590-592 
subspace iteration 1:590 
time-stepping schemes 3:372-375
turbulent flows 3:233-234
two-grid iterations 1:581-584


multilayer formulations 2:112-113 
multilevels 


closures 3:288 
component compaction 1:431, 1:432-433
diagonal scaling (MDS) 1:633
error estimators 1:91-93
homogenous isotropic turbulence 3:209-240
overlapping domain decomposition 1:632-633
shallow water equations 3:240-259
time integration 3:252-259
turbulence 3:2 
multiparameter subgrid-scale modeling 3:289-290 
multiphase flow 2:584-586, 2:592-599 
multiphysics 2:575-599, 3:412-413
multiple reciprocity 2:726 
multiple sample tests 2:418 
multiplicative 
decomposition 2:464 
panel clustering 1:600-601, 1:602-604, 1:607, 1:611
plasticity 2:240-244, 2:248-250
Schwarz methods (MSM) 1:620-621, 1:625-630 
multipoles 
expansions 2:705-707, 3:141-144 
moments 2:730-731 
panel clustering 1:600 
multiresolution 
constructions 1:163-164 
geometric modeling 1:480-481 
visualization 1:543 
multiscale methods 3:1 
advective-diffusive equations 3:34, 3:37 
composite laminates 2:432-433, 2:440-451
concrete mechanics 2:513-539
crystal plasticity 2:267-287
Dirichlet-to-Neumann formulation 3:8-11 
expansions 1:202-218 
homogenization methods 2:407-427 
incompressible Navier-Stokes equation 3:40-55 
material responses 2:421-425
micromechanics 2:407-427
modeling 2:407-427, 2:432-433, 2:440-451, 2:513-539
space-time formulations 3:27-32
stabilized methods 3:5-55
turbulence 3:40-55
multislip crystal plasticity 2:277 
multistage explicit time-stepping schemes 3:367-368 
multistep time integration 2:175-176 
multizone boundary element method 2:732 
Murman–Cole difference scheme 3:339-340, 3:341-343
muscular arterial walls 2:606-607 
myocardial tissue 2:619-625 


NACA see National Advisory Committee for Aerodynamics 
Naghdi model 1:201, 1:223-224, 2:86
Nanson’s formula, elastic bodies 2:8
NASA crew rescue vehicles 3:434-435
National Advisory Committee for Aerodynamics (NACA) 
6 series airfoils 3:326-327 
0012 airfoils 3:78-81, 3:101-103, 3:122-123 
0012 profiles 3:593, 3:594, 3:595
0012 wing inverse design 3:394-397
0012 wing transonic flow 1:519-520
0015 pitching airfoils 3:77-78 
National Science Foundation 3:473-477 
NATM see New Austrian Tunneling method 
natural draught cooling towers 2:533-538 
natural trace spaces 1:363 
Navier-Stokes equations 
aerodynamic code 3:412, 3:416-418, 3:433-434 
aerodynamic shape optimization 3:383-386 
blood flow 3:533, 3:534 
compressible 3:122-123
computational fluid dynamics 3:183 
dynamic multilevel methods 3:228-240 
finite element discretization 3:173-174 
finite element methods 3:548-549 
homogenous isotropic turbulence 3:209-240 
LSSD stabilization 3:173-174 
multilevel methods 3:209-240 
renormalization theory 3:259-260 
ship hydrodynamics 3:581-584 
shock capturing 3:348-359
spectral methods 1:146-148
supercomputing 3:412
turbulence 3:209-240, 3:271-272, 3:274-277, 3:301
viscous discretization 3:364-365
see also incompressible...; Stokes equations
near-field partitioning 1:601 
near-field scales 3:8-9
near-wall modeling 3:305-306, 3:308, 3:315-317, 3:320
nearly singular integrals 2:725 
necking 2:284 
Nedelec’s construction 1:730 
neighbor searches 1:318 
neo-Hooke materials 2:14 
nested finite element spaces 1:587 
nested iterations 1:590-592
nested solutions 3:172-173 
von Neumann analysis equation 3:345 
Neumann boundary conditions 1:15, 2:67, 3:220-221 
Neumann problems 1:707, 2:29-30, 2:34-35, 2:50-52 
Neumann–Neumann preconditioners 1:618, 1:639-641
neural networks 2:642-643 
neutral equilibrium state 2:139, 2:144 
neutron transport equation 3:91, 3:92-96
New Austrian Tunneling method (NATM) 2:525-529 
Newton potential 1:716 
Newton–Raphson algorithm 2:18, 2:471
Newtonian fluids 3:269-296, 3:304, 3:533 
Newton’s methods 
adaptive wavelets 1:186, 1:194 
crystal plasticity 2:278-279
incompressible viscous flows 3:176
local convergence theory 1:657, 1:658-659, 1:660-661
multigrid methods 1:593
shakedown 2:314-315
thin-walled structures 2:126-127 
NIF capsule implosion simulation 3:115, 3:117 
nine-point differencing 1:13-14, 1:50 
Nitsche method 1:297-299, 1:364-365, 2:203-204, 2:210
NLP see nonlinear programming 
NMR see nuclear magnetic resonance 
no-slip boundaries 3:209, 3:220-221, 3:305, 3:308, 3:316 
nodal interpolation 1:80-82, 1:58-59, 1:61-66 


nodal modes 1:122, 1:123 
node splitting 1:626-627, 2:380-381
node-decoupling 2:380-381 
node-regular refinement 2:37-38 
node-to-segment (NTS) 2:209, 2:211-212, 2:215 
noise 3:444-448
non smooth...
contact 1:314-316, 1:334
solutions 1:130
surfaces 2:213-214
non-Newtonian fluids 3:2, 3:481-496
non-self-adjoint problems 1:37-38 
nonassociative constitutive laws 2:310 
noncolinear crack propagation 2:378-379 
nonconforming finite elements 1:105-107, 1:588-589
nonconservative difference equations 3:341-342 
nondimensional moment predictions 3:239 
nondivergent flows 3:309 
nonflexural modes 1:227, 1:228 
nongeometric representations 2:394-400
nonisothermally fully saturated consolidation 2:586-589 
nonlinear...
AERO simulation platforms 3:473-477
boundary values 2:16-24
computational aeroelasticity 3:459-477
conservation laws 1:439-450, 1:468-470, 3:91, 3:96-115, 1:150
constitutive equations 3:310-311
continua 2:64-70, 2:679-680
difference equations 3:340-341
elasticity 1:376-377, 1:403-405, 2:7-16
elastoplasticity 1:133-135
equations 1:592-593
forming processes 2:485-491
Galerkin method 3:49
geometric applications 1:135-136
Hencky-von Mises stress-strain relation 1:377
hyperbolic problems 1:158
kinematic hardening 2:308-309
least-squares 3:176
loosely coupled fluid/fluid-mesh/structure time integrators 3:471-473
material behavior 2:408, 2:451-452
multigrid iteration 3:176
one-dimensional equations 3:532, 3:533
operators 1:191-193
parabolic equations 1:23-26
programming (NLP) 2:317-318, 2:550
reaction rates 3:499
softening 2:451-452
solid mechanics 1:426, 1:428-433
strong stability preserving time integration 1:465
structural dynamics 2:184-187
theory 2:7-16
three-field formulation 3:461-464
nonlinearity 1:6
adaptive wavelets 1:183-184
buckling 2:141
geometric 2:310-311, 1:135-136
local convergence theory 1:649-669
mathematical modeling 2:61-70
shells 2:68-70
soil consolidation 2:566
nonlocal damage models 2:356-357, 2:358-359 
nonmatching discrete interface compatibility 3:468-471 
nonmatching meshes 2:209-210
nonmoving meshes dynamics 3:565-566 


nonnested finite element spaces 1:588 
nonnormal random fields 2:667 
nonoscillatory schemes 1:453-455, 3:348, 3:349-356
nonoverlapping domain decomposition 1:91, 1:618-620, 1:633-644,
2:712
nonpenetration conditions 1:314, 1:328, 2:198-199, 2:201
nonpremixed flames 3:500-501, 3:502
nonreflecting Dirichlet-to-Neumann boundary conditions 2:703-708
nonshakedown 2:299 
nonsingular...
hybrid boundary element method 2:762-768
integrals 2:725
symmetric boundary element method 2:762-768
nonstationary incompressible viscous flows 3:175 
nonstationary Navier-Stokes equations 3:166-170 
nonstiff initial value problems 1:680-683
nonsymmetric finite elements 1:104-105
nonuniform large strains 2:649-654 
normal... 
contact stresses 2:204 
derivative kernels 1:605 
displacement 2:685-686, 2:690 
distance 2:198-199 
distribution 2:659-660 
frictional contact 2:472 
gap variation 2:199-200 
interfaces 1:424 
stresses 3:310-311 
normalized normal density 2:659-660 
normalized objective functions 2:425 
norms 
equivalences 1:166-168
of the error 3:121-122 
error estimates 1:87, 2:25-26, 2:27 
linear elliptic boundary values 1:74-75
plates and shells 1:229 
saddle-point stability 1:247-248 
NTS see node-to-segment 
nuclear magnetic resonance (NMR) 2:518 
numerical... 
codes 3:413-418 
diffusion fluxes 1:467 
discretization 2:416-417 
dispersion numbers 1:33-34 
error 3:281-283
filters 3:272-274
fluxes 
conservation laws 3:97, 3:99 
discontinuous Galerkin methods 3:92, 3:95, 3:119 
Euler code 3:414 
functions 1:442-443, 1:469-470
RKDG 3:109 
second-order elliptic problems 3:117 
implementations 
dynamic multilevel methods 3:231-240
gradient-enhanced damage models 2:357-358 
renormalized scales 3:260-263 
viscoelastic direct boundary elements 2:757-758 
integration 
collocation 2:724-725 
elastoplastic deformations 2:227-264
finite elements 1:107-109
symmetric Galerkin BEM 2:729 
viscoplastic deformations 2:227-264 
visualization algorithms 1:534-535
layers 2:112-113
methods 
arterial walls 2:611-612 
boundary integral equations 1:346-347 
constitutive equations 2:651 
dynamic multilevel methods 3:214-219
geomechanics 2:545-549 
shakedown 2:316-320 
transonic potential flow 3:337-339 
viscoelastic fluid flow 3:494 
simulations 
adaptive wavelets 1:157-195
aeroacoustics 3:446-448 
civil aircraft 3:431 
renormalized scales 3:260-263 
turbulent flows 3:235-240 
see also direct... 
tests 1:180 
traces 3:91, 3:92, 3:115 


object-order volume rendering 1:542 
objective functions 2:425, 2:644 
octrees 1:499-500, 1:477 
ODEs see ordinary differential equations 
Oldroyd-B fluids 3:482-488
Oleinick’s E-condition 1:444-445
one-dimensional 
bars 2:676-677 
convergence 1:583-584 
finite volume methods 1:450-455
hierarchic shape functions 1:120-121
statically determinate structures 2:661-662
wave propagation 3:531-533 
one-step methods 1:715-717, 2:182-183 
ONERA M6 wings 3:115, 3:116 
ONERA S1 Modane 3:429, 3:450-451 
open inventor toolkit 1:544 
OpenDX development environment 1:544-545
OpenGL toolkit 1:544 
operation counts 1:604 
Operational Loads Survey rotor 3:104, 3:105 
operational quadrature method 1:704, 1:715, 1:717-719 
operators 
adaptivity 1:160, 1:165, 1:175-181
advection-diffusion 3:18 
algebraic 3:30-32 
bending 1:214 
BGT 2:706 
biharmonic 1:34 
boundary integrals 1:342-344, 1:349, 1:351 
Clément 1:59-60, 1:66-68
complementing 1:229
consistent tangent 2:736-738
curl 1:725, 1:727-734
differential 1:238, 3:30-32 
element nodal interpolation 1:80-82 
equilibrium 2:294-295 
Euler 1:481-482, 1:483 
filtering 3:272-274 
five-point difference 1:9 
gradients 1:727-732
heat potential 1:718-719 
Helmholtz 3:18 
hypersingular 1:349, 1:376, 1:176, 1:177 
interpolation 1:80-82, 1:84, 1:58-60
jumps 2:172
lifting 1:251-252
linear 1:168-169, 1:191
membranes 1:213
nodal interpolation 1:80-82
nonlinear 1:191-193
p-exact reconstruction 1:462-463
residual error estimates 1:88 
scale separation 3:219 
Scott-Zhang 1:60, 1:66, 1:67 
sparse approximations 1:159, 1:175-181 
split methods 2:467, 2:590-591
Steklov–Poincaré 1:380-382, 1:384-386, 1:399
superdissipative 3:241
tangent stiffness 2:337-338, 2:339
trace 1:351 
wavelets 1:160, 1:165, 1:168-169, 1:175-181
optimization 
codes 3:412-413
Dassault code SOUPLE 3:437-443 
elastic shakedown 2:301, 2:306, 2:310 
multigrid methods 1:578 
parameter identification 2:643-645
projection-based interpolation 1:733 
theory 2:219, 3:437-443 
unit volume meshing 1:509-510 
ordinary differential equations (ODEs) 
adaptive computation 1:680, 1:699, 1:701 
first-order 3:31-32 
shallow water equations 3:243-244, 3:249-250 
space-time formulations 3:31-32 
structural dynamics 2:170-189 
time step control 1:699, 1:701 
ordinary least squares 2:643 
orientation tensors 3:491-492 
orientations 1:322-323, 1:600-608, 2:426
orthogonal iteration 1:569 
orthogonal polynomials 1:143-146 
orthogonality 
algebraic complements 1:730 
cohesive-zone models 2:352-353 
partitioning errors 2:424-425
subspatial errors 2:424-425


orthotropic microstructurally-based constitutive models 2:620-621 


oscillation control 1:453-455, 3:348-356 
oscillatory shear index 3:537-538 
oscillatory traction 2:364-365 
Oseen equations 3:121-122
Osher–Solomon flux 1:469
out-of-core methods 1:543 
out-of-plane material models 2:536-537 
outflow boundary conditions 3:291 
overall reaction terminology 3:507-508 
overall testing processes 2:417-421
overlap 
boundary integral equations 1:704-705 
incomplete LU factorization 1:563-564 
overlapping domains 2:575-599 
decomposition 1:617-620, 1:630-633
nonoverlapping decomposition 1:91, 1:618-620, 1:633-644, 2:712
overlay shakedown model 2:308-309 
overshoot 2:179-180, 2:181, 3:52-53 


p convergence 1:126-131 
p-exact reconstruction operators 1:462-463 


p-finite element method 1:119-137 
convergence characteristics 1:126-131 
discretization 1:3 
elastoplasticity 1:133-135 
geometric nonlinearity 1:135-136 
implementation 1:120-126 
industrial applications 1:134-135, 1:136-137 
mapping 1:124-126
nonlinear elastoplasticity 1:133-135
one dimensional hierarchic shape functions 1:120-121
performance characteristics 1:131-133
thin domains, shells 1:219-229 

panel clustering 1:5-6, 1:597-615
boundary element method 1:597-600, 2:731 
elasticity 2:731 
finite element method 1:597 
fully populated matrices 1:597-615 
hierarchical matrices 1:597, 1:607-615 
multiplication 1:600-601, 1:602-604, 1:607, 1:611

panel methods 3:336-337, 3:580

pantographs 3:5-6

parabolic...
differential equations 1:675-702, 3:341-342
equations 1:18-28
interpolation 1:125-126
problems 1:3, 1:705-706, 1:710-719
smoothing 1:688 
see also adaptive computation 

parafoils 3:449-451 

parallel...
edge-cracked borosilicate glass plates 2:39-40 
elimination 1:566 
iteration 2:711-712 
mesh adaptivity 1:516 
processing 3:476-477
RKDG 3:112-113, 3:114 
visualization 1:543 

parameterizations 
bifurcations 1:669-673 
continuation 1:665-666
local convergence theory 1:661-669
Maxwell equations 1:731-732 
nonlinear equations 1:661-669 

parameters 
bifurcations 1:670-672 
derivatives 2:650-651 
identification 2:647-654 
panel clustering 1:603-604 
sensitivity 2:667-668

parametric curves 1:504-505

parametric surfaces 1:505-506, 1:479

ParaView application 1:546 

Pareto fronts 3:393, 3:396 

partial differential equations (PDEs) 
discretization 1:15-17, 1:291-300 
elliptic 1:12-18 
geometric modeling 1:493 
meshfree methods 1:291-300 
weak solution 1:347-358

partial pivoting 1:554-555 

partially...
diagonal scaling preconditioner 1:392-393
premixed flames 3:500-501, 3:503
saturated cement pastes 2:517-520
saturated consolidation 2:584-586 


particles 
advection 1:534-535
deformability 1:311-335 
distribution 1:286-287, 1:289 
fracture mechanics 2:391-392 
grid method 3:144-145
integration form 1:293 
interactions 2:412 
motion 1:413-418 
redistribution 3:136, 3:140-141 
strength exchange (PSE) 3:137-139, 3:145-149
traces 1:535-536
velocity 1:417
particulates 1:311-335, 2:418-420, 2:425-426
see also discrete element methods 
partitioning 
errors 2:424-425
finite element spaces 1:79-80
hierarchical matrices 1:608-609
material responses 2:421-422, 2:425
multifield problems 2:577-589
panel clustering 1:601-602, 1:608-609
partitions of unity (PU)
discontinuous meshfree methods 1:301-303
discrete failure models 2:365-369 
finite element method 1:291 
implicit error estimators 1:90-91 
meshfree methods 1:280, 1:291-292, 1:301-303
moving least squares 1:291 
partial differential equations 1:291-292 


passenger aircraft 3:431-433, 3:434


passive mechanical behavior 2:607—608 
passive myocardial tissue 2:624 
path following methods 2:151-153 
path functions 2:12 
path parameters 2:152-153 
PBX preconditioner 1:633 
PCG see preconditioned conjugate gradients 
PDE see partial differential equations 
PDEs see partial differential equations 
PDFs see probability distribution functions 
PEC see perfect electric conductors 
Péclet numbers 3:24 
pellet impacts 2:379 
penalty method 
contact discretizations 2:207 
contact mechanics 2:202, 2:203, 2:217 
discontinuous deformations 1:322-323, 1:328-329
matrix form 2:209 
multibody contact forces 1:321-322 
partial differential equations 1:293, 1:296-297 
penetration...
checks 2:215-216
conditions 1:314, 1:328, 2:198-199, 2:201
functions 2:199 
Peraire-Jameson flux 3:415 
perfect electric conductors (PEC) 1:726 
perfect incremental collapse 2:305 
perfectly matched layers (PML) 
acoustics 2:696, 2:710, 3:50-52 
eddy viscosity 3:50-52 
electromagnetics 3:50-52 
Maxwell equations 1:735 
perfectly stirred reactors (PSR) 3:513-514, 3:517 
perforated plates 2:326-329
performance evaluation 3:486-487
performance functions 2:568, 2:569
periodic...
boundary conditions
composite laminates 2:448, 2:449
large eddy simulations 3:291
multibody contact forces 1:323-324
representative unit cells 2:443-445
shallow water equations 3:242-245, 3:248, 3:249-250, 3:251-259
flows 3:219-221, 3:235-240, 3:377-379 
forcing functions 1:27 
media theory 2:514, 2:515 
surface loads 2:592, 2:593, 2:594 
permeability property upscaling 2:516 
perturbation 
adaptive wavelets 1:158-159, 1:186-187
Lagrangian 1:293, 1:322, 2:203
postbuckling 2:147-149, 2:150
saddle-point stability 1:254-256
sensitivity 1:655-656
stochastic finite elements 2:668-671
Stokes equations 1:267
Perzyna viscoplastic relation 2:235, 2:242-243
Petrov-Galerkin (PG) 
direct linear algebraic solvers 1:556, 1:559 
elliptic boundary values 1:465—466 
ship hydrodynamics 3:583 
time finite element methods 2:186 
weak form 1:280, 1:291-292 
phase 
changes 2:584-586, 2:592-599
errors 3:62-63
stress-updating 1:429-431
phenomenological theory 3:226-227
photometric textures 1:491
Picard iterations 1:594
piecewise linear methods 1:661-662, 1:664
piezoelectricity 2:758-759, 2:761-762, 2:763 
pinball technique 2:215 


Piola–Kirchhoff stress tensors 2:10, 2:66, 2:441-442, 2:444, 1:136


Piola’s transform 1:260-261
pitchfork mapping 1:669-670
pitching... 
airfoils 3:77-78 
cycles 3:380 
moment 3:451 
pivoting 1:570 
planar... 
boundary representation schemes 1:481-482 
Couette flow 3:489-490, 3:491 
domains 1:499 
flow past cylinders 3:487-488, 3:494-496
polygons 1:481-482
plane linear elasticity 1:377, 1:405-408
plane-strain 2:15, 2:286-287, 2:504-506
plane-stress 1:204-206, 2:15
planetary model see periodic boundary conditions 
planforms 3:392-396 
Plank equation 3:492 
plant physiology 1:685, 1:686 
plasma flows 3:81-85 
plastic...
correctors 2:467
crystals 2:280-283
deformations 2:269-287
elastic analysis 2:727-729, 2:738-740
evolution equations 2:232-235, 2:241 
flow 2:274-275 
frictional tangential slip 2:206 
loading 2:235-236, 2:340-341 
metric 2:243 
multipliers 2:233, 2:315-316, 2:482, 2:738 
shakedown 2:324-325 
slip 2:273, 2:276-278, 2:206
strain 2:466, 2:506 
tangent moduli 2:279-280 
volumetric strain 2:466 
work rates 2:487 
zone evolution 2:369 
plasticity 
bound limit theorems 2:549-551 
boundary integral equations 2:735-740
composite laminates 2:451-452
contact mechanics 2:205
damage coupled models 2:340-341
error indicators 2:486-487
geomechanics 2:549-551, 2:552-553
gradient-enhanced damage model 2:359-360
shape sensitivity 2:740-743
see also crystal...; elastoplasticity; viscoplasticity
plates 1:3 
asymptotic expansions 1:201-207, 1:215 
bending methods 1:85 
composite laminates 2:433-437 
coordinates 1:199-200, 1:202
domains 1:199-200 
eigen frequencies 1:224 
finite elements 2:59, 2:79-83 
four-point bending 2:39-40 
fracture behavior 2:39-40 
with holes 2:326-329
Kirchhoff–Love models 1:204, 1:208-209, 1:229, 2:433-435
locking 1:221-222
multiscale expansions 1:202-211
shakedown 2:326-329
shear locking 1:222-223
shell theory 2:84-102
transverse shear locking 2:122 
see also thin-walled structures 
PLC see Portevin–Le Chatelier
ploughing 2:205
ply level deformations 2:432, 2:453-458
PML see perfectly matched layers
Poincaré–Friedrich’s inequality 1:75
point...
connections 1:500-501
creation 1:500-501
estimate method 2:568-569 
insertion 1:508 
moment closures 3:301 
placement 1:508, 1:510 
repositioning 1:510 
visualization techniques 1:528-529, 1:530-531
vortices 3:131
pointwise estimates 1:111
Poisson solvers 3:144-145
Poisson thickness locking 2:80, 2:97 
polarization 2:412 
pole-zero constitutive law 2:621, 2:624 


pollution 1:111-112, 2:696, 2:699-700, 2:701-702 
polycrystalline microstructures 2:267-287
fluctuation fields 2:281 
heterogeneous 2:281, 2:282-283, 2:286-287 
polygonal patches 1:478, 1:481, 1:482 
polyhedral domains 1:73, 1:77-85, 1:107-109, 2:300-304
polymer...
melts 3:487-488, 3:490
processing 3:481, 3:487-488
stress 3:491-492
polynomials 
algebraic 1:143-145 
approximations 2:700 
Bernstein 2:214 
Bézier 2:213-214
bubble functions 3:20 
chaos expansion 2:671-676 
Chebyshev 1:143-144 
expansions 3:63-67, 1:143-145 
Fourier–Legendre 3:215-217
Hermitian 2:213-214, 2:671-676
interpolation 1:604-605, 1:125-126
Jacobi orthogonal 1:143-146
Legendre 1:393, 2:704, 1:143-144
spaces 1:285-286, 1:220-221
spline 2:213 
pore... 
accessibility 2:518-519 
pressures 2:564-565
size distributions 2:519
water 2:561-562
porous media 2:566-567, 2:584-586, 2:592-599
Portevin–Le Chatelier (PLC) band propagation 2:346, 2:349
positional mismatches 1:547
positive...
coefficients 1:456-457, 1:463-464, 3:349-356 
streamwise invariants (PSI) 3:415, 3:420-421 
symmetric semidefinite initial value problems 1:680, 1:681 
post-first ply failure 2:438 
postbuckling 2:147-149, 2:150 
postprocessing 1:526-527
potential energy 
buckling 2:140-141, 2:147, 2:162 
material responses 2:422, 2:423 
Stokes equations 1:241 
potential flow methods 3:334-348, 3:580
potential functions 1:322-323 
powder compaction 1:431-433 
power method 1:567-568, 1:569, 2:159 
power plant noise 3:444 
Prandtl–Batchelor flows 3:133
Prandtl–Reuss equations 2:239
prebuckling 2:147-149 
preconditioning 
conjugate gradients (CG) 1:642-643, 1:147 
domain decomposition 1:6, 1:617-644 
element-by-element 1:566 
hp-coupled BEM/FEM 1:390-393
incomplete LU factorization 1:564-565, 1:566-567
least squares coupled BEM/FEM 1:395-396
linear algebraic solvers 1:555-556, 1:560-562
minimum residual method 1:390
multigrids 3:177
Neumann–Neumann method 1:618
nonoverlapping domain decomposition 1:618-619, 1:620, 1:633-644
overlapping domain decomposition 1:617-618, 1:619-620,
1:630-633
preconditioned conjugate gradients (PCG) 1:642-643, 1:147 
Schwarz theory 1:617 
time-stepping schemes 3:375-376 
predictors 
buckling 2:155 
corrector procedures 1:332-333, 2:153-156
modified Lax–Wendroff flux 3:414
prediction operators 1:165
simulations 3:293
preintegration 2:75, 2:98-102
premixed combustion diagrams 3:518 
premixed flames 3:500-502 
preprocessing, visualization 1:526 
prescribed constrained shape methods 2:378-381 
pressure 
acoustics 2:685-686, 2:687, 2:697 
blood flow 3:528-540
coefficients 3:101, 3:103, 3:435 
compressible fluid flow 3:122-123 
correction projections 3:68-69 
differences 3:204-205
distribution 
adaptive wavelets 1:174 
aerodynamic shape optimization 3:380-381 
Mars Lander 3:327, 3:328 
time-stepping schemes 3:373-375, 3:378
wing inverse design 3:396-397 
drag 3:443 
echo 3:316-317, 3:320-321 
gradient operators 3:583 
gradient projections 3:586 
head 2:565 
hydroelastic-sloshing equations 2:690-691 
interpolation 1:262-266, 3:484
reflection 3:316-317, 3:320-321
sensitive paint (PSP) 3:431-433
stabilization 3:163
structural-acoustics 2:685-686, 2:687
volume relations 2:621
presumed probability density function 3:520-521
primal... 
augmented Lagrangian 2:260 
closest-point-projection 2:250-254, 2:262 
elastic shakedown 2:300 
meshes 3:101, 3:102 
partitioning 2:421-422
Signorini-type interfaces 1:396-399 
primary states 2:141, 2:144-145 
primitives 1:483, 1:484 
principal curvatures 1:212 
principle...
of frame indifference 2:13 
of maximum dissipation 2:237-239, 2:457, 2:466-467 
of Minimum Complementary Potential Energy 2:422 
of virtual work 2:16, 2:469-470 
prismatic elements 1:731 
probability...
density functions 2:567-568, 3:520-521 
distribution functions (PDFs) 2:646 
of failure 2:568 
one homotopy methods 1:665 
probing 1:540, 1:547-548
progress variables 3:511 


progressive damage modeling 2:451-458 
progressive failure 2:437-438
Project SINUS, INRIA Sophia-Antipolis 3:473-477
projections 
based interpolation 1:732-734 
Chorin scheme 3:168-170 
closest-point-projection 2:228-229, 2:244-246, 2:249-259, 
2:262-263 
convective gradients 3:586 
elliptic 1:692-693 
filters 3:273 
Galerkin 1:207, 1:210 
interpolation 1:732-734
Krylov 1:556-557 
L2 1:692-693, 1:142
like operations 1:623, 1:631 
pressure 3:68-69, 3:586 
Ritz 2:32, 2:48-49 
schemes 3:168-170
smoothers 3:179 
velocity-correction 3:67-68 
prolongations 1:580-582, 1:587-589, 1:591, 3:219
proof techniques 1:269-276
propagation 
blood flow 3:531-533 
instabilities 2:346-349 
Lüders band 2:346, 2:348-349
Portevin-Le Chatelier band 2:346, 2:349
shear bands 2:348-349 
speeds 3:250-251
waves 2:757-758, 3:531-533 
see also crack... 
PSE see particle strength exchange 
pseudo arc lengths 1:666, 2:153, 2:154 
pseudo-overlap reordering, incomplete LU factorization 1:564 
pseudosoftening moduli 2:351 
pseudospectral collocation 3:485 
pseudospectral derivatives 1:142 
PSI see positive streamwise invariants 
PSP see pressure sensitive paint 
PSR see perfectly stirred reactors 
Ptolemais cooling tower 2:533-538 
PU see partitions of unity 
public domain software 1:114 
pulse wave propagation 3:533 
punch compaction 1:431-432, 1:433 
pure mixing 3:508, 3:509-510 


Q-factors 1:652-653
Q-orders 1:652-653 
Qg-element 2:18-20 
QMR algorithm 1:559-560, 2:707 
QR factorizations 1:555, 1:558 
QR method 1:567, 1:568-571
quadratic...
kinematics 1:219 
macroscopic free energy 2:272 
strain energy function 2:466-467 
quadrature...
errors 1:179-180, 1:603 
method 1:704, 1:715, 1:717-719 
points 2:113 
rules 3:108-109 


784 Subject Index 




quadrilateral elements 
finite element spaces 1:78, 1:81 
hanging nodes 2:38 
node-regular refinement 2:37-38
Stokes 3:162-163
Stokes equations methods 1:266 
thermal diffusion 1:260-261 
quadrilateral hierarchic shape functions 1:121-123 
quadtree-octree based methods 1:499-500 
quality meshing 1:502-510 
quasi... 
brittle fractures 2:350, 2:360-361, 2:363-364 
Eulerian conservation equations 1:419 
geostrophic flows 3:253-254 
interpolation 1:59-60, 1:66-68 
linear exact potential flow equation 3:342-343 
linear model 3:313 
sparsity 1:176-177 
static analysis 1:428, 2:160-164, 3:229-230
triangular matrices 1:553 
quaternions 2:120 
QZ method 1:571 


r-adaptive technique 1:422 
R-curves 2:396-399 
R-factors 1:652-653 
R-orders 1:652-653 
racing sailboats 3:596-601
radial... 
basis functions (RBF) 1:300, 1:716 
return algorithm (RRA) 2:737-738
set functions 2:709-710 
terms annihilation 2:705 
radius functions 1:485-490, 1:494
radius-to-thickness ratio 2:87, 2:89-90, 2:92 
Rafale inlet design 3:428-429 
Rafale store releasing 3:429, 3:430 
rake angles 2:503, 2:504 
random... 
convection 3:315 
eddying 3:318 
fields discretization 2:663-667
methods 3:293 
variables 2:658-661 
rank parameters 1:611 
Rankine sources 3:580 
Rankine yield criterion 2:535-536, 2:537
Rankine-Hugoniot jump condition 1:440, 1:441
RANS see Reynolds averaged Navier-Stokes; Reynolds averaged 
Numerical Simulations 
rapid terms 3:312-314 
rate dependent elastoplastic deformations 2:251-252, 2:254-255 
rate of plastic work 2:487 
Raviart-Thomas (RT) elements 1:259-261, 1:264-266, 1:271, 1:395
ray tracing 1:484, 1:541-542 
Rayleigh quotients 1:567-568
Rayleigh-Ritz method 1:348
RBF see radial basis functions 
RBSM see rigid bodies spring model 
reacting flow control 3:499-523 
reaction-diffusion equations 1:696-699, 1:700 
reactors, laminar flames 3:513-514 
real Schur form analysis 1:552 
real Schur decompositions 1:553, 1:570
reciprocity 1:704, 2:758-763


recompression hierarchical matrices 1:613-614 
reconstruction schemes 3:354 
recovery estimators 1:93-95, 1:96-97 
rectangular elements 1:78-79, 1:106-107
rectangular window functions 1:282 
recursive quadratic programming 2:218 
red-black ordering 1:563-564
Red-Green-Blue refinement 1:102-103
reduced...
basis techniques 2:317-318
gradient formulation 3:386-387 
integration 2:122 
reference configuration 2:64-66, 2:76, 2:81
referential domains 1:414, 1:416, 1:417-418
referential velocity 1:420 
REFINE/COARSEN 1:98, 1:99 
refined constitutive forming processes 2:481-484
refinement 
flow field compression 1:170
linearized elasticity 2:37-39 
rules 1:87, 1:100 
reflection invariance 3:275 
regional model adaptivity 2:44 
regular... 
data sets 1:529 
meshes 1:502-504 
triangulations 1:98, 1:100-103
reinforced concrete 2:529-538 
Reissner-Mindlin (RM) model
discretization 2:110-112
plates 2:209-210, 1:224, 2:45, 2:110-112 
shell theory 2:86, 2:105-106, 2:110-112
relative... 
errors 2:766, 2:767 
permeability 1:725 
permittivity 1:725 
relatively intact (RI) state 2:554 
relaxation 
elliptic 3:317, 3:318, 3:321
function 2:753 
potential flow equation 3:344-345 
shakedown 2:315 
transonic potential flow 3:339 
transonic small-disturbance 3:340-341
reliability 
adaptive computation 1:691, 3:197-198 
averaging error estimators 1:94-95 
error control 1:86 
geotechnical engineering 2:568-569 
indices 2:677 
methods 2:676-680
residual error estimates 1:88 
remeshing 1:330-331, 2:363 
remodeling 
arterial walls 2:615-616 
heart wall mechanics 2:622 
rendering 1:527-528
renormalization 3:259-263 
reordering unknowns 1:563-564 
representation
formulas 1:355-358, 1:706 
geometric 2:378-394 
theorem of isotropic functions 2:244 
representative unit cells (RUC) 2:443-446 
representative volume elements (RVE)
composite laminates 2:432, 2:441-443, 2:445-451
concrete mechanics 2:514-515, 2:520-521, 2:523
homogenization methods 2:408, 2:423 
reproducibility 1:286, 1:287-288 
reproducing kernel particle method (RKPM) 1:284, 1:286 
reservoirs 2:692, 3:602, 3:603 
residuals 
approximations 1:187-195
discontinuous Galerkin 3:93-94, 3:96, 3:103 
of the entropy inequality 1:447-448
error estimators 
explicit 1:87-89, 1:96, 2:28-29 
h-version BEM/FEM 1:382, 1:383-384
implicit 1:387-389, 2:29-31, 2:34-35, 2:50-52
linearized elasticity 2:28-31 
a posteriori 1:86 
free bubbles 3:20-21, 3:35 
generalized minimal 1:558-559, 2:702, 3:565-566, 3:570
Helmholtz equations 2:700-701
Navier-Stokes code 3:417 
refinement 3:201-202 
strains 2:611-613 
stress 2:318, 2:613-616, 2:622 
resistance 
crack growth 2:396-399 
ship hydrodynamics 3:579, 3:580 
slip 2:273-274, 2:277 
resolution 
contact 1:318-321, 1:322-323
direct numerical simulations 3:279-281 
large eddy simulations 3:281 
multiresolution 1:163-164, 1:480-481, 1:543
rules 2:699-700 
switched schemes 3:351
response 
functions 2:204 
nonlinear aeroelasticity 3:459-460 
statistics 2:669-671, 2:675-676 
restrained blocks 2:293-294, 2:321-323
restriction 
multigrid methods 1:580, 1:587 
operators 3:219 
results visualization 1:526 
retaining wall structures 2:558-561
retarded potentials 1:706, 1:707, 1:711, 1:712, 1:713 
return mapping 2:244-250, 2:219, 2:467-468 
Reuss fields 2:409, 2:411-412
reverse transforms 1:172-174 
Reynolds averaged Navier-Stokes (RANS) equations 
computational aerodynamics development 3:329 
Dassault Aviation solver 3:423-426
multigrid time-stepping 3:374 
ship hydrodynamics 3:580, 3:593 
turbulence closure 3:301-303, 3:305, 3:318-322 
turbulent flames 3:515-517, 3:520 
Reynolds averaged Numerical Simulations (RANS) 3:270 
Reynolds averaged turbulence modeling 3:318-322, 3:418-423
Reynolds averaged velocity 3:301 
Reynolds numbers 3:2, 3:184, 3:210-213 
Reynolds stress 
Second Moment Closure 3:315, 3:316 
ship hydrodynamics 3:593-594
tensors 3:274, 3:302, 3:303 
transport 3:320 
turbulence closure 3:302-303, 3:315, 3:316, 3:319-322 


Reynolds transport theorem 1:419 
rheoelectric analogic computers 3:409
rheology 3:481, 3:487-488, 3:489
RI see relatively intact 
rib scales 2:529-530 
ribbons 1:536 
Riccati equation 1:615 
Richardson extrapolation 1:11-12
Richardson iteration 1:555-556, 1:580 
Riemann solvers 3:97, 3:98-100
Riemannian metrics 1:509-510, 2:8-9 
Riesz basis 1:162-163, 1:166-167 
right-preconditioning 1:561 
rigid... 
bodies 
collisions 1:314-335
fluid interactions 1:424, 1:425-426 
nonlinear structural dynamics 2:185-186, 2:187 
search strategies 2:216 
spring model (RBSM) 1:333-334 
cube within water 3:604-607 
models 2:495-496 
motionless cavities 2:691 
ships 3:601, 3:602-605 
Rioja de España 3:596-601 
Ritz... 
Galerkin methods 1:73, 1:74-77, 1:556, 1:557-558 
projections 2:32, 2:48-49 
values 1:553, 1:561, 1:572 
vectors 1:553, 1:572 
Rk-matrices 1:609-610 
RKDG see Runge-Kutta discontinuous Galerkin 
RKPM see reproducing kernel particle method 
RM see Reissner-Mindlin
RMD approximation 1:284 
robustness 1:132-133
rock mechanics 2:543-569
rods 2:757 
Roe flux 1:469 
rolling ball fillets 1:133 
rotation 
invariance 1:733-734, 3:275 
magnitude 2:85, 2:87-91 
oscillations 1:425-426 
parameterization 2:77, 2:116-120
propellers 3:571, 3:572 
representative volume element 2:441 
rotated differencing 3:342-343 
rotated Stokes elements 3:162 
shallow water equations 3:251 
tensors 2:77, 2:117-120 
turbulence closure 3:314, 3:320 
vectors 2:118-119 
water mills 3:603 
Rothe Method 3:166-167 
rough inverse hierarchical matrices 1:615 
roundoff errors 1:656-657
row-orientation 1:600-606
RRA see radial return algorithm 
RST equations 3:311, 3:317-318 
RT see Raviart-Thomas
RUC see representative unit cells 
Runge-Kutta 
discontinuous Galerkin method (RKDG) 3:97, 3:104-117
explicit time-stepping 3:367-368, 3:372 
Runge-Kutta (continued)
time integration 2:176, 2:178 
visualization algorithms 1:534-535
RVE see representative volume elements 


S-A see Spalart-Allmaras model
S1 Modane parafoil model 3:450-451
S1 Modane store release testing 3:429 
saddle points 1:246-257, 1:182-183 
safety assessments 2:291-331 
safety factors 2:305, 2:551, 2:568 
St. Venant-Kirchhoff-type material law 2:66, 2:84
sampling 2:417-421 
sandstone soil consolidation 2:583-584
saturation 2:31 
assumption 1:91-93, 1:103 
cement pastes 2:517-520 
consolidation 2:584-589
edge 1:508 
geomechanics 2:553-554 
materials 2:553-554, 2:566-567 
soils 2:589-592
scalar 
advective-diffusive equations 3:32-37
conservation laws 1:439-450, 3:97-98, 3:100-108, 3:349-351
fields 1:530, 1:531-533, 2:741 
hyperbolic equations 1:28-36 
parameters 1:670-672
turbulence closure 3:303-311, 3:320
variable models 3:303-311 
scale effects 2:360-362 
scale separations 
discrete Navier-Stokes equations 3:219-228 
dynamic multilevel methods 3:207-264 
Navier-Stokes equations 3:213-214, 3:219-228 
renormalization theory 3:260-263
shallow water equations 3:248-259
turbulence 3:40, 3:45-46
scaled...
boundary element method 2:547 
moving least squares 1:288-289 
norms 3:121-122 
scales of observation 2:513-539 
scaling updates 2:278-279
scalp 1:520-521 
scanning-electron microscopy (SEM) 2:518-520
Schmid stress 2:272-273, 2:276-277, 2:279 
Schur complement 
adaptive wavelets 1:186 
eigenvalues 1:570 
error indicator 1:384-387
incompressible viscous flows 3:177 
preconditioning 1:618, 1:635-636
Schur matrix forms 1:553, 1:569, 1:570 
Schwarz theory 1:617-644 
scientific visualization 1:525-526 
SCIRun development environment 1:545 
Scott-Zhang operators 1:60, 1:66, 1:67
scripts 1:492 
SD-DF see streamline diffusion discontinuous Galerkin 
seabed geomechanics 2:555-558, 2:559 
search costs 3:389-390 
search procedures 1:317-321, 1:535, 2:196, 2:214-216


second... 
kind integral equations 1:594 
law of thermodynamics 2:12-13
moment closure (SMC) 3:311-318, 3:320 
moment transport 3:311-318 
type tetrahedral elements 1:728 
second-order...
elliptic boundary values 1:107-111 
ellipticity 3:91, 3:115-120 
hyperbolic equations 1:28, 1:30-31 
linear transmission 1:394-396 
monotone schemes 3:420 
predictors 2:155 
problems 1:83-85 
reliability method 2:678 
Sedov-type explosions 3:115, 3:116 
seepage 2:565 
seismic behavior 2:592, 2:593, 2:594 
self contact 2:216 
Self-Consistent method 2:412 
SEM see scanning-electron microscopy; spectral element method 
semi-implicit time integration 3:245-246, 3:252, 3:254-256 
semi-Lagrangian time integration 3:69 
semidiscrete...
finite elements 3:565-566
finite volume methods 1:443 
multiscale formulation 3:54-55
space hierarchies 1:218 
semidiscretization 3:462-464
seminorms 1:74-75 
semipolynomial spaces 1:204 
semisubmerged rotating water mills 3:603 
SEMMT 3:554 
sensitivity 
buckling 2:149-150 
elastic deformation 2:275-276 
imperfection 2:149-150, 2:349 
parameterized nonlinear equations 1:667-669 
spherical shells 1:224-226, 1:229
stochastic finite elements 2:667-668 
separations 
work of separation 2:350-351
flow 3:320 
separability-of-scales 2:514 
shallow water equations 3:248-251
see also scale... 
sequential quadratic programming 2:216, 2:218-219 
Serendipity elements 1:78, 1:81 
Serial Staggered Procedure 3:473 
seven-parameter shell model 2:84, 2:90, 2:97-98 
SGBEM see symmetric Galerkin boundary element method 
shadows 1:541
shakedown 
algorithms 2:312-321 
alternating plasticity 2:304 
applications 2:321-330 
classical 2:296-299 
elastic simulations 2:319 
extremum principles 2:299-301 
hardening 2:307-310
incremental collapse 2:304-307 
kinematics 2:294-302, 2:305, 2:318-319 
loading-unloading conditions 2:293-294, 2:296-297, 2:300-304,
2:321-326 
numerical procedures 2:316-320


perforated plates 2:326-329
plastic 2:304 
plates with holes 2:326-329 
safety assessments 2:291-331 
square plates 2:326-327
temperature 2:307-310
thermomechanical loadings 2:293-294, 2:321-323
tubes 2:323-326 
shallow...
foundations 2:555-558, 2:559 
shells 1:215-217 
water equations 3:240-259 
shape... 
functions 
continuous discretization 2:664 
degenerating solid elements 2:79-80 
derivative evaluation 1:283-291 
Maxwell equations 1:732-733 
mixed finite element methods 1:238 
modifications 1:293
one dimensional hierarchical 1:120-121
partition-of-unity 2:365-369
optimization 3:379-400, 3:437-443
regular elements 1:57-58, 1:68
sensitivity analysis 2:740-743 
sharp cutoff filters 3:273-274 
shear 
angles 2:503, 2:505 
bands 
cohesive-zone 2:351-355 
Lüders propagation 2:348-349
meshfree methods 
Portevin-Le Chatelier propagation 2:349
rake angles 2:503-504 
correction factors 1:209-210, 2:97 
crystal plasticity 2:283-284 
energy 1:209 
flows 3:294 
geomechanics 2:557, 2:558 
layer instability 3:123 
locking 1:222-223 
material responses 2:418 
moduli 2:409, 2:418, 2:419 


slip mesh update method (SSMUM) 3:555, 3:571, 3:572 


stiffness 2:339 
strain 1:206, 2:536-537 
stress 
arbitrary Lagrangian-Eulerian 1:433
blood flow 3:535, 3:536-537 
concrete mechanics 2:536-537
transport (SST) 3:308-309, 3:311, 3:320
turbulence closure 3:310, 3:311 
shedding frequencies 3:321 
sheet metal stamping 1:518, 1:519 
shells 1:3, 1:218-219 
asymptotic expansions 1:201-202, 1:211-218 
coordinates 1:199-200 
director definition 2:113-116 
domains 1:199-200 
eigen frequencies 1:224-229
finite elements 2:59, 2:79-83, 2:104-128
hierarchical models 1:223-224 
higher-order models 2:106-107 
intersections 1:133 
Kirchhoff-Love models 1:211-218, 2:103-105
layer-wise models 2:106-107
limiting models 1:211-218 
locking 1:221-223 
membrane theory 2:102-103
multiscale expansions 1:211-218
Reissner-Mindlin models 2:105-106
shifters 2:73-74, 2:94-95, 2:98-102
structural behavior 2:68-70
theory derivation 2:72-102
see also thin-walled structures 
shift strategies 1:570 
Shift Theorem 1:76-77 
shifter tensors 2:73-74, 2:94-95, 2:98-102 
ship hydrodynamics 3:579-607 
characteristic length parameters 3:592-593 
finite calculus 3:581, 3:583-587 
finite element discretization 3:587-589 
fluid dynamics 3:3 
fluid-ship interactions 3:589-590 
Lagrangian fluid flow 3:581, 3:591-592 
mesh node updating 3:590-591 
Navier-Stokes equations 3:581-583 
ship motion 3:589-590 
transom stern flow 3:591 
turbulence 3:593-594 
viscosity 3:592 
shock 
capturing 
compressible flows 3:551-553 
computational aerodynamics 3:329, 3:460 
discontinuous Galerkin methods 3:97, 3:98-104
Euler equations 3:348-359 
fluid flow discretization 3:363-364
Navier-Stokes equations 3:348-359
nonoscillatory 3:348, 3:349-356 
RKDG 3:111-112 
complex geometry 3:346-348 
direct numerical simulations 3:281, 3:294 
free flows 3:346 
large eddy simulations 3:281, 3:295 
point difference equations 3:341, 3:342 
RKDG 3:111-112 
waves 3:460 
Shortley-Weller approximation 1:13
shotcrete 2:514, 2:516-529
shrinkage strains 2:520-521 
shrouds 3:5-6 
side modes 1:122, 1:123 
side-grooved specimens 2:398 
sign conditions 3:107 
significant coefficient predictions 1:189, 1:191-193 
Signorini-type interfaces 1:376-377, 1:396-403
silicium carbide-carbon 2:345, 2:351 
Silver-Müller condition 1:735
simple mechanism of incremental collapse (SMIC) 2:304-307, 2:323,
2:325 
simulation 
crack growth 2:377 
flow 3:570-574
implosion 3:115, 3:117 
platforms 3:473-477
see also direct numerical...; large eddy...; numerical...
single...
column supports 2:44-46
crystal plasticity 2:274-280
single... (continued)
edged notched beams 2:352, 2:354-355, 2:363
field time-discontinuous Galerkin methods 2:176 
layer potentials 1:599 
phase nonreactive fluids 3:269-296 
step chemical reactions 3:510 
singular integrals 2:725 
singular value decompositions (SVD) 1:555 
singularity prevention 3:305 
six-parameter shell model 2:84, 2:87-88, 2:97 
size effects 2:360-362, 2:363 
skeletons 1:536 
skew-symmetric convection 1:149 
skin friction 3:307, 3:315, 3:321-322 
slave nodes 2:212-213 
sliding 2:197, 2:199-202, 2:206, 2:210-214, 2:220 
SLIP see symmetric limited positive 
slip 
concrete mechanics 2:531 
contact mechanics 2:201-202, 2:205, 2:206 
resistance 2:273-274 
system updating 2:278-280
updating 2:278-280 
Slobodetskii norms 1:362 
slope limiters 
discontinuous Galerkin 3:97, 3:98-100, 3:104
multidimensional finite volumes 1:459-460
RKDG 3:107-108, 3:109
sloshing modes 2:691 
slow terms 3:312, 3:313 
Smagorinsky model 
eddy viscosity 3:40-41, 3:43-44
large eddy simulations 3:424-425 
ship hydrodynamics 3:598-599
subgrid-scale modeling 3:285-286 
viscosity 3:241 
small... 
eddies 3:259-263
scale decompositions 3:219-228
scale separations 3:248-251, 3:260-263 
strains 2:208-210, 2:467-468, 2:647-649
SMC see second moment closure 
smeared cracks 2:394-399
smeared format 2:351 
SMIC see simple mechanism of incremental collapse 
smooth 
average fields 3:301 
filters 3:273-274 


particle hydrodynamics (SPH) 1:280, 1:281-285, 1:291 


solutions 1:129 
surfaces 2:213-214 
variational multiscale method 3:13-15 
smoothing effect 1:581-582 
smoothing iterations 1:580 
snap-back points 2:149 
snap-buckling 2:148-149, 2:159-164 
snap-through 2:70, 2:148-149, 2:159-164 
snapping 2:148-149, 2:159-164 
Sobolev 3:389
gradient 3:389 
index 1:366-371 
inner products 3:388 
scale 1:177 
spaces 
adaptive wavelets 1:166, 1:188 
boundary integral equations 1:710-712 


convergence optimality 1:362-363
finite element methods 1:104 
linear elliptic boundary values 1:74-75 
symmetric coupled BEM/FEM 1:378 
Variational Alternating Schwarz Method 1:620 
soft biological tissue 2:605-629
soft limiters 3:353 
softening 
concrete mechanics 2:531-532, 2:536 
geomechanics 2:553, 2:555-556 
nonlinear 2:451-452
software 
architecture 1:475-476 
cardiovascular surgery 3:538-539
finite element methods 1:98, 1:113-114 
parabolic differential equations 1:702 
shells 1:219 
soil 
behavior 2:577-589 
consolidation 2:561-567 
dynamics 2:589-592 
mechanics 2:543-569
skeletons 2:561-562, 2:564-565 
structure interactions 2:558-561
solid...
boundaries 3:145-149 
extension mesh moving techniques (SEMMT) 3:554 
textures 1:491-492 
wall boundary conditions 3:291-292
solids 
constitutive models 2:1-3 
deformability 1:327 
elastoplastic deformations 2:227-264 
instability 2:1-3 
introductory survey 2:1-3 
mechanics 1:426, 1:428-433, 1:497-521 
multifield problems 2:1-3 
multiscale modeling 2:1-3 
nonlinearity 2:1-3 
processing 2:1-3 
structures 2:1-1-38 
viscoplastic deformations 2:227-264 
within water 3:604-607
solution interpolation 1:516 
solution regularity 3:157-159 
solvability 1:246-247 
solvers 1:4-5, 3:120, 3:408-412 
Sommerfeld radiation condition 2:702, 2:708 
sound pressure 3:6-7 
SOUPLE 3:437-443 
space splitting 1:623-633 
space vehicles 3:412, 3:433-435 
space-periodic boundary conditions 3:209 
space-time 
averages computability 3:199-201 
boundary integral equations 1:703-713 
contact technique (STCT) 3:557-558 
discrete maximum principle 1:445-446 
finite elements 2:171-173, 3:27-28, 3:565-566


Galerkin finite elements 1:676, 1:684-686, 1:689-690


incompressible Navier-Stokes equation 3:44-45 
multiscale method 3:27-32 

stabilized 3:549, 3:555-557, 3:571-574
structural dynamics 2:171-173 

variational multiscale method 3:27-32 

wave equation 1:706-707, 1:711-713


Spalart-Allmaras model (S-A) 3:319, 3:320 
span-load distributions 3:380 
sparsity 


adaptive wavelets 1:158-159, 1:175-181
approximations 1:158-159 

matrices 1:555 

patterns 1:555 


spatial...


discretization 
averaging step functions 2:664 
Euler codes 3:413-414 
incompressible flows 3:69, 3:160-166 
incompressible Navier-Stokes equations 3:160-166 
RKDG 3:105-106 
shallow water equations 3:242-246
thin-walled structures 2:71-72 
viscoelastic fluid flows 3:484-485, 3:494
viscous flows 3:160-166 

domains 1:414, 1:416, 1:417-418 

location 1:540 


occupancy enumeration 1:476-477, 1:484-485, 1:532, 1:533


periodic boundary conditions 3:209 

periodic flows 3:210-212, 3:555-557, 3:573 
representation 1:528-530 

search strategies 2:215 

semidiscretization 2:71-72

tangent modulus 2:468-469

turbulent flows 3:225-228 


species equation 3:505, 3:514 
specific...


entropy 2:12 
strain-energy functions 2:13, 2:14-16 
time integration 3:252-259 


spectral...


approximation 3:220-221, 3:242-245, 3:248-259 
collocation flows 3:485
convergence 1:142 
eddy viscosity 3:50 
element method (SEM) 3:1, 3:81-85, 1:150-152
equivalence estimates 1:627-630 
hp elements 3:61-88 
methods 1:3, 1:141-154 
advection equations 1:148-150 
algebraic expansions 1:143-146
conservation laws 1:148-150 
Fourier methods 1:141-142 
Jacobi orthogonal polynomials 1:143-146 
mortar method 1:152-154 
Navier-Stokes equations 1:146-148
orthogonal polynomials 1:143-146
polynomial expansions 1:143-145 
spectral elements 1:150-152 
Stokes equations 1:146-148 
trigonometric expansions 1:141-142 
radius 1:332-333 
random fields 2:665-667
stochastic finite elements 2:671-676
viscosity 1:150 


SPH see smooth particle hydrodynamics 
spheres falling in liquid filled tubes 3:571-573
spherical...


Hankel functions 2:704-706
harmonics 2:704 
window functions 1:282 


spiral bevel gears 2:388-390 
spline
polynomials 2:213 
surfaces 1:479-480
wavelets 1:480-481 
window functions 1:281, 1:283-284 
split stress-updating 1:429-431 
splitter plates 3:425-426 
splitting 1:263-264, 3:168-170 
springs 2:668-669 
SQP see successive quadratic programming 
square cylinder drag 3:191-192, 3:193-194, 3:197-198 
square plates 2:326-327 
SRVE see statistically representative volume elements 
SSMUM see shear slip mesh update method 
SSP see strong stability preserving 
SSP-RK see strong stability preserving Runge-Kutta
SST see shear stress transport 
stability 
advective-diffusive equations 3:33-36 
buckling 2:142-144, 2:145-149, 2:150-164 
conservation laws 1:445-446, 1:467-468
damage mechanics 2:342-344 
discontinuous Galerkin methods 3:92-93, 3:95-97, 3:106-107,
3:119 
elastic-plastic crystals 2:280 
envelopes 3:489 
factors 
adaptive computation 1:675-679, 1:683, 1:686-689, 1:691-693 
laminar flow 3:204, 3:205 
parabolic differential equations 1:675-679, 1:683, 1:686-689,
1:691-693 
Galerkin boundary elements methods 1:365-366 
geomechanics 2:551 
geotechnical engineering 2:567-569 
heterogeneous microstructures 2:282-283
least squares constitutive equations 2:645-646 
mixed finite element methods 1:246-257 
nonlinear aeroelasticity 3:459-460
parameter identification 2:641 
polycrystallines 2:282-283
preservation 1:21-22 
shallow water equations 3:244-245, 3:246
Sobolev index 1:367-368
Stokes equations 1:266-268
thin-walled structures 2:126 
time integration 2:178-179, 2:188-189 
weights 1:677 
stabilization 
adaptive wavelets 1:159-161 
arbitrary Lagrangian-Eulerian 1:414
compressible flows 3:551-553
fluid dynamics 3:548-550, 3:551-553 
incompressible flows 3:549-550 
stabilized 
integrals 3:585 
Maxwell equations 1:727 
methods 3:1 
advective-diffusive equations 3:32-40 
concept 3:22-26 
Dirichlet-to-Neumann formulation 3:8-11 
Galerkin 3:185 
multiscale methods 3:5-55 
space-time formulations 3:30-32, 3:549, 3:555-557, 3:571-574
stable flames 3:500-501, 3:503-504
staggered meshes 3:223-225 
staggered partition solution procedures 2:580-582, 2:583
stagnation enthalpy 3:358 
standard...
staggered partitioning 2:580-582 
stress updating 2:276-277 
time step control 1:699, 1:701 
wavelet representation 1:168 
Stanton number isolines 3:434 
starting conditions 2:180-181 
state...
equations 3:417-418, 3:439, 3:506-507
functions 2:12 
variables 2:270-271 
static...
condensation 1:263, 2:96 
constraint 2:339-340 
determinate structures 2:661-663 
equilibrium states 2:140 
finite elements 2:126, 2:303-304 
instabilities 2:341-346
linear elasticity 1:622-623
pressure 2:685, 2:687, 2:690 
response 2:691 
stationary...
discrete shocks analysis 3:357-358 
heat conduction 1:622 
problems 1:158, 3:201-205
statistical...
closure 3:301 
linear elastic continua 2:670-671, 2:675-676
modeling 3:519, 3:520-521
moments 2:670-671 
noise modeling 3:444-445, 3:446-448


statistically representative volume elements (SRVE) 2:423 


STCT see space-time contact technique 
steady... 

advective-diffusive equations 3:32-36 

flows 3:358-359, 3:481, 3:482-488 

one-dimensional tools 3:514 

processes 1:428 

state 3:459 

viscoelastic fluid flow 3:481, 3:482-488
stealth aircraft 3:430-431
steel 

austenitic 2:649, 2:650 

axisymmetric necking 2:651-652, 2:653-654

concrete interaction 2:513-514, 2:529-538 

pellet impacts 2:379 

sheets 2:474-476 

see also metals 
steepest edge active set scheme 2:550 
Steger-Warming flux vector splitting 1:469-470 
Steklov-Poincaré operators 1:380-382, 1:384-386, 1:399
step function discretizations 2:664 
step relaxation 2:315 
stepping procedures 2:151-156 
stick 2:199-202, 2:204, 2:205, 2:220 
stiff initial value problems 1:680-686, 1:687
stiffness 

backfill soil 2:558-561 

degradation 2:553 

equation inversion 2:671-673

matrices 

cohesive-zone models 2:353 
concrete mechanics 2:530 


discontinuous deformations 1:328, 1:332 
eigenvalues 1:571 
elastoplasticity 2:745 
hierarchical 1:611-612
MATLAB 1:113-114 
panel clustering 1:611-612 
soil skeletons 2:565 
thin-walled structures 2:125 
reduction schemes 2:453, 2:454-456 


stochastic 


constitutive equations 2:644-645, 2:646-647, 3:490-491,
3:493-494 
finite element methods 2:657-680
Karhunen-Loeve expansion 2:665-667, 2:671-676
perturbation 2:668-671 
random fields representation 2:663-667 
random variables 2:658-661
reliability methods 2:676-680
spectral formulations 2:665-667, 2:671-676 
statically determinate structures 2:661-663 
variable sensitivity 2:667-668 
geomechanics 2:567-569
optimization 2:644-645
reacting flow stoichiometry 3:507-508 
response 2:673-675


stoichiometry, reacting flows 3:507-508
Stokes elements 1:182-183, 3:161-163, 3:164
Stokes equations 


inf-sup condition 1:269-276

mixed finite element methods 1:240-241, 1:262-268
saddle-point stability 1:249-251 

spectral methods 1:146-148 

see also Navier-Stokes equations 


stopping criterion 1:86 

store releasing, military aircraft 3:429, 3:430 
stored energy 2:230-231, 2:234

strain 


arterial walls 2:611 
bounds 2:311 
composite laminates 2:441-442, 2:446, 2:455, 2:457-458
concrete mechanics 2:520-524, 2:526-529, 2:536
constitutive equations 2:647-654 
contact discretizations 2:208-214
coupled damage-plasticity 2:340-341 
degenerating solid elements 2:81-83
direct measures 2:77-78 
elasticity 1:245-246, 2:9-10 
elastoplasticity 2:235-237 
energy 
clamped hemispherical shells 2:69-70 
compressible materials 2:13-16 
elasticity constitutive tensors 2:13 
hyperelastic 2:466-467
isotropic compressible materials 2:14-16
principle of frame indifference 2:13 
shell theory 2:87-89
through-the-thickness integration 2:94 
enhanced 1:245-246, 1:267-268, 2:93
fields 1:326-329, 2:399-400, 2:457-458 
geomechanics 2:553, 2:555-558, 2:559, 2:566 
hoop 2:87-89 
incompressible elasticity 1:244-246 
interior points 2:735-736
kinematics 2:9-10 
localization 2:521-522, 2:553 




logarithmic measures 2:464-465
material responses 2:410 
measures 2:66-67, 2:77-78, 2:464-465
plates 1:200, 1:206 
rate 2:506, 3:309-310 
representation 2:722-723 
return maps 2:467-468
shell theory 2:87-93, 2:95-98
softening 2:342, 2:344, 2:345-346, 2:347 
soil consolidation 2:566 
Stokes equations 1:267-268
tensors 
Almansi 2:9-10, 1:136
boundary integral equations 2:720 
clamped elliptic shells 1:213, 1:214 
concrete mechanics 2:522-524
continuum damage mechanics 2:455 
degenerating solid elements 2:82 
effective... 2:455
elastic body deformations 2:9-10
Euler 1:136 
forming process modeling 2:463-464 
Green-Lagrangian 2:66-67, 2:73-74, 2:91, 2:441-442
Hencky 2:9 
p-finite element method 1:136
shells 1:216, 2:91-93, 2:94-95 
statistical moments 2:670-671
visualization algorithms 1:537-538 
thin-walled structures 2:66-67 
three-dimensional continuum 2:74 
transverse shear locking 2:122-123
see also finite... 


Strang lemma 1:144 
streaklines 1:535-536 
stream ribbons 1:536 
stream surfaces 1:536 
streaming data 1:543 
streamline diffusion discontinuous Galerkin (SD-DF) 1:449-450, 3:97 
streamline-upwind/Petrov Galerkin (SUPG) method 


advective-diffusive equations 3:34-35, 3:37
compressible flows 3:551-553

ship hydrodynamics 3:583, 3:592 

stabilized methods 3:22-23 

viscoelastic fluid flows 3:482-484, 3:489-490, 3:495


streamlines 1:536 


curvatures 3:320-321 
diffusion 3:187, 3:188-189 
visualization algorithms 1:535-536, 1:538, 1:539 


stress 


arterial walls 2:611 

collocation 2:727 

composite laminates 2:432, 2:439-446, 2:454, 2:456
concentrations 2:432, 2:439 

concrete mechanics 2:522-531, 2:535-537 
elasticity 1:244-246, 2:10 

elastoplasticity 2:230-231, 2:235-237, 2:238-239 
fields 1:327-328 

incompressible elasticity 1:244-246 

integration 2:467-468, 2:493-495

intensity factors 2:734 

interior points 2:735-736 

low-order elements 2:477-478 

material responses 2:410 

plates 1:200, 1:205-206 

power 2:230-231 
principle 2:561
representation 2:722-723 
resultants 2:75, 2:78-79 
scaling 2:315 
shells 1:200, 2:75, 2:78-79, 2:95-98 
state 2:238-239 
strain 1:377, 2:336, 2:555-559 
Stress Check software 1:219 
tensors 
acoustic field equations 2:696 
Biot 2:10 
boundary integral equations 2:720 
Cauchy 1:418, 2:66, 2:456, 1:136 
concrete mechanics 2:522-524 
constitutive equations 2:650-651 
continuum damage mechanics 2:454 
cross 3:274 
damage mechanics 2:336 
effective 2:454 
elasticity balance equations 2:10 
elastodynamics 2:767 
Kirchhoff 2:10, 2:14, 2:230-231 
Piola-Kirchhoff 2:10, 2:66, 2:441-442, 2:444, 1:136
statistical moments 2:670-671 
thin-walled structures 2:66-67 
three-dimensional continuum 2:73-74 
turbulence closure 3:311 
visualization algorithms 1:537-538 
turbulence closure 3:310-311 
updating 1:428, 1:429-431, 2:274-280
stretched flames 3:512-513
stretching 1:202, 2:80, 2:491, 3:130-131
strings 1:9-10
strip stretching 2:491 
strong... 
boundary value problems 2:16-17 
discontinuity kinematics 2:351-355 
stability factors 1:675-679, 1:683, 1:686-689, 1:691-693 
stability preserving Runge-Kutta (SSP-RK) 3:104-105, 3:106-107
stability preserving (SSP) time integration 1:464-465
Strouhal number 3:84-85
structural...
acoustics 2:683, 2:684-689
dynamics 2:169-189 
acceleration equations 2:173 
elastodynamic equations 2:170-171 
equation formulation 2:170-174 
momentum equations 2:173-174 
multibody equations 2:174 
nonlinear 2:184-187
ordinary differential equations 2:170-189
space-time equations 2:171-173
time integration 2:175-189
transient analysis 2:169-189
elasticity 2:40-42
material responses 2:407-427
mechanics 2:40-42 
scales 2:530-538 
stiffness 2:70 
subgrid-scale modeling 3:284, 3:287, 3:290-291 
thin-walled structures 2:70-107 
structure 
arterial wall 2:606-607 
function subgrid-scale modeling 3:286 
heart wall 2:618-619
inelastic material constitutive equations 2:639 
ligaments 2:625-626 
meshes 1:456-457, 3:359-360
soil 2:554-555
subcycling methods 2:188
subdomain solvers 1:636-638
subfilter scales 3:284-290
subgrid-scale models
direct numerical simulations 3:184-185, 3:198-199
generalized Galerkin 3:184-185, 3:198-199
large eddy simulations 3:184-185, 3:198-199, 3:270-271, 3:272-274, 3:423-424
ship hydrodynamics method 3:583 
Smagorinsky eddy viscosity 3:43 
space-time formulations 3:27-32 
stabilized methods 3:22-26 
turbulence 3:46-47, 3:284-291 
subgrid stress 3:32-33 
subparametric mapping 1:58 
subsonic 
flows 3:101-104, 3:334-337, 3:425-426
jets 3:446-448
linearized potential flow 3:334-337
mixing enhancement 3:425-426
points 3:342-343
subspaces
bilinear forms 1:623-633
interaction lemma 1:629-630 
iteration 1:590 
sequences 1:207 
subspatial errors 2:424-425
substructuring 2:686-689 
subtraction 1:611 
successive quadratic programming (SQP) methods 2:218-219 
sufficient conditions 2:297-299, 2:306-307
supercomputing 3:411-412 
superconvergence 1:112-113, 2:32-33 
supercritical airfoils 3:460 
superdissipative operators 3:241 
superquadratics 1:320, 1:321 
supersonic 
flight 3:390 
flows 3:101, 3:102-104, 3:425-426
jets 3:446-448
mixing enhancement 3:425-426
points 3:342-343
zones 3:344-345
SUPG see streamline upwind Petrov-Galerkin
support stencil reconstructions 1:462-463
surface...
currents 1:726 
curvature 3:320 
domains 1:499 
fatigue wear 2:206 
meshing 
adaptivity 1:514-516 
advancing-front 1:501 
Delaunay-type 1:502 
quadtree-octree 1:500 
unit meshing 1:504-507
mounted cube drag 3:192-195, 3:196, 3:197-198
patches 1:477-481
triangulation 1:599 
SVD see singular value decompositions


swept wings 3:346-347 
swirl 3:320-321 
switch functions 3:553 
switch procedures 2:156, 3:351 
symmetric...
advective-diffusive equations 3:38-40
coupled FEM/BEM 1:376-389, 1:403-405
Galerkin boundary element method (SGBEM)
elasticity 2:727-729
elastoplasticity 2:727-729, 2:738-740, 2:744
fracture mechanics 2:733-734
plastic-elastic analysis 2:727-729, 2:738-740
Gauss-Seidel implicit time-stepping 3:370-372
initial value problems 1:680, 1:681
limited positive (SLIP) scheme 3:351-353, 3:363
linear elliptic boundary values 1:74-77
matrix reduced model 2:687-689, 2:691-692
Navier-Stokes equations 3:416-418
positive definite matrices 1:555
reduced models 2:686-689, 2:691-692
symmetry
plates 1:202
Second Moment Closure 3:315
System-Lax-Friedrichs flux 1:470


tangents 
matrices 2:219 
moduli tensors 2:14 
stiffness operators 2:337-338, 2:339
tangential...
contact stresses 2:204-206
frictional contact 2:472-473
sliding 2:199 
velocity 2:198 
tank ships 3:604-605 
Tartar’s theorem 1:447 
taxonomy 1:543-546, 2:376-377
Taylor microscale 3:211
Taylor vortex 3:71-72
Taylor-Galerkin techniques 3:583
temperature 
aeroacoustics 3:445-446, 3:447 
aircraft cabin air conditioning 3:453 
concrete under fire 2:598 
hypersonic flows 3:433-434 
shakedown 2:307-310 
temporal discretizations 1:331-333 
tensile 
bars 2:345-346, 2:347
failure 2:438-439
loadings 2:437-438
strength 2:532-533 
tension 
bars 2:344-345, 2:346-349, 2:351 
crystal strip necking 2:284 
stiffening 2:529-538 
tensors 
axes 1:538 
Cauchy-Green 2:9-10
constitutive 2:13-14 
curvature 1:211-212 
damage 2:453-454 
diffusivity 3:139-140 
effective 2:449-451, 2:454, 2:455 
elasticity 2:414-416, 2:449-451, 2:455-456, 2:522-524
fields 1:530, 1:537-538, 2:741
Finger 3:492 
gradient deformation 2:8-9 
integrity 2:454-456
Leonard 3:274
Lighthill turbulence 3:6
material 2:66-67, 2:449-451, 2:455-456
metric 1:211-212
midsurface curvature 1:211-212
orientation 3:491-492
products 
adaptive wavelets 1:166 
elements 1:78 
hp-version BEM/FEM 1:393 
spaces 1:121-124 
warped 1:145-146 
window functions 1:282 
Reynolds 3:274, 3:302, 3:303 
rotation 2:77, 2:117-120 
shifter 2:73-74, 2:94-95, 2:98-102 
tangent moduli 2:14 
viscous 3:505 
vorticity correlation 3:210-211 
see also strain...; stress...
Terzaghi’s theory 2:562, 2:563 
test functions 1:89, 1:119 
testing procedures 2:409-411, 2:417-421
tetrahedral elements
error estimates 1:84
interpolations 2:214
Maxwell equations 1:728, 1:729-731
mesh discretization 3:362-363
refinement 1:101-102, 2:38 
thermal diffusion 1:261-262 
texture maps 1:539 
textures, geometric modeling 1:491-492 
TGS Amira development environment 1:545 
theorem of expended power 2:230-231
theoretical filters 3:272-274
thermal...
diffusion 1:238-240, 1:252-254, 1:257-262
loadings 2:301
property upscaling 2:516
thermo-elastic consolidation 2:588-589
thermo-hydromechanics 2:592-599
thermochemistry 3:506-507 
thermodiffusive instabilities 3:503 
thermodynamics 
crystal plasticity 2:272-273 
elasticity balance equations 2:11-13 
equations 2:11-13, 2:230-231, 3:332-334 
first law of 2:12 
forces 2:456 
second law of 2:12-13 
thermomechanical coupling 2:484-485
thermomechanical loadings 2:293-294, 2:321-326 
thermoplastic strain localization 2:503, 2:504 
thickened wrinkled flame regime 3:518 
thickness 
integration 2:83, 2:98-102
locking 2:80, 2:97 
plates and shells 1:200-201 
thin domains 1:199-229 
finite element methods 1:219-229 
thin elements 1:228-229
thin plate deflection 1:34-36
thin sheets 2:495-496
thin wrinkled flame regime 3:517-518
thin-walled structures
delamination-buckling 2:368
dimensional reduction 2:70-107
director 2:113-116
finite element formulation 2:59, 2:104-128
mathematical modeling 2:61-68
mechanical foundations 2:63-68
models 2:59-103
see also plates; shells


three-dimensional 


aerodynamic shape optimization 3:383-386 
backward extrusion 2:500-502
blood flow 3:533-540
computer graphics 1:527-528
consolidation 2:562 
constitutive models 2:608-609 
contact discretizations 2:212-213 
continuum 2:72-76, 2:83 
design 3:383-386 
discretization 2:198, 2:212-213
elastic models 2:627-629 
error estimates 1:84-85 
expansions 1:212-213 
finite element models 2:623-624 
flow 3:78-81 
fluid dynamics 3:201-205 
forced homogeneous turbulence 3:235-240 
inviscid flows 3:134-137
moving boundaries 1:517-521
Stokes equations 1:266
thermal diffusion 1:261-262
three-field methods 2:23, 3:461-464 
three-point bending beams 2:364-365, 2:366-367, 2:369 
thresholding 
adaptive mesh-refining 1:100 
adaptive wavelets 1:172-174, 1:178 
damage 2:482 
visualization algorithms 1:539 
through crack topology 2:388 
through-the-thickness integration 2:94 
through-the-thickness stretching 2:80 
time 
advancement schemes 3:283 
dataset attributes 1:530 
dependent 
acoustic waves 2:713 
boundary integral equations 1:6, 1:703-719
compressible Euler equations 3:101, 3:103-104, 3:105
convection-diffusion equations 1:43-44
viscoelastic fluid flows 3:481, 3:488-490
derivatives
arbitrary Lagrangian-Eulerian 1:418-419
constitutive equations 2:650-651
moving volumes 1:418-419 
viscoelastic direct boundary elements 2:752-753 
volume integrals 1:418-419 
discontinuous Galerkin methods 2:173, 2:176-177, 2:184 
discontinuous space-time finite element equations 2:171-173 
discretization 
boundary integral equations 1:715 
dynamic multilevel methods 3:231-240
fourth-order hyperbolic equations 1:35-36
incompressible viscous flows 3:166-170, 3:175
nonstationary Navier-Stokes equations 3:166-170
nonstationary viscous flows 3:175
Reynolds averaged turbulence 3:421
shallow water equations 3:242-246
strong stability preserving Runge-Kutta 3:106-107
viscoelastic fluid flow 3:494
domains 1:703-719, 2:713, 2:751-758
finite element methods 2:176, 2:178, 2:184, 2:186
harmonic waves 1:707, 1:724-725, 2:697-698, 3:8-9
increments 1:316
integration
discontinuous deformations 1:332-333
dissipation 2:179, 2:186-187, 3:52-53
dynamic contact 2:220
explicit 1:51, 2:183-184, 2:188-189, 3:252-254, 3:256-259
finite volume schemes 1:464-465
gravity terms 3:247-248
incompressible flows 3:67-69
linear structural dynamics 2:181-184
multibody contact 1:331-333
nonlinear aeroelasticity 3:464-465, 3:471-473
nonlinear structural dynamics 2:184-187
shallow water equations 3:244-246, 3:252-259
structural dynamics 2:175-181, 2:188-189
linear 2:181-184
nonlinear 2:184-187
practical considerations 2:187-189
scale bounds 3:309-310
scale equations 3:308
scale separation 3:229-230
shape-functions 2:754, 2:755-756
shift invariance 3:275
splitting 3:67-69
stepping
aerodynamics 3:365-379
boundary integral equations 1:704-705, 1:714-719
control 1:699, 1:701
discontinuous deformations 1:332-333, 1:334
size 2:188-189
spectral schemes 3:377-379
stiff initial value problems 1:683-686, 1:687
turbulent flows 3:225-228
tissue 2:605-629
Toeplitz structures 1:555, 1:717
toolkits 1:543, 1:544
topology
arbitrary crack growth 2:386-388
extraction 1:539
material responses 2:425-426
shell structure 2:85-86
validity 1:489-490
vector fields 1:536-537
torsion 2:609-613, 2:614, 3:475-476
total dissipation 3:195-196
total variation bounded (TVB) Runge-Kutta methods 1:464
total variational diminishing schemes (TVD)
computational aerodynamics 3:329
conservation laws 1:452-453
critical point accuracy 1:453
Monotone upstream-centered scheme 1:452-453
multidimensional finite volumes 1:455-456
one-dimensional finite volumes 1:450-453
Runge-Kutta methods 1:464, 3:106-107
shock capturing 3:348
trace operators 1:351
traction
acoustic field equations 2:697
boundary conditions 2:442-443, 2:447, 2:449
boundary integral equations 1:345-346
composite laminates 2:444-445
crystal plasticity 2:286-287
notched 2:364-365
plates 1:207
shakedown 2:293
three-point bending beams 2:364-365
vectors 2:10, 2:726
transfer functions 1:532
transfer operators 2:489-491
transfinite mapping 1:421
transformation methods 2:425
Stokes equations 1:262-266
transforms 2:564
transient analysis
buckling 2:160-164
dynamic processes 1:428
elastodynamics 2:759, 2:767-768
Maxwell equations 1:735
piezoelectricity 2:762, 2:763
structural dynamics 2:169-189
transition continuum/discontinuum 1:329-331
transition elements 2:37-38
transmission 1:355-356, 1:182, 3:468-471
transom stern flow 3:591
transonic
business jets 3:398-399
flight 3:390
flows
adjoint method 3:382-383, 3:390
computational aerodynamics 3:326-327
moving boundaries meshing 1:519-520
nonlinear aeroelasticity 3:460
potential 3:337-339, 3:410-411
small-disturbance equation 3:339-342
transparency methods 1:301-302
transpiration condition 3:440-441
transport
dominant 3:164-166
element method 3:149-150
equations 3:310, 3:505-506
molecular 3:505-506
neutron 3:91, 3:92-96
Reynolds 1:419, 3:320
second moment 3:311-318
shear stress 3:308-309, 3:311, 3:320
transverse
cracking 2:438
loads 2:438
normal stiffness 2:126
normal strains 2:96
shear...
locking 2:121-123
strains 2:82, 2:97
stress 2:97
trees
codes 3:143-144
construction 1:498, 1:500
objects 1:483-485
structures 1:189-190
Trefftz’s condition 2:143-144
Tresca friction 1:433
trial...
displacements 2:172-173
and error methods 2:219, 2:642
functions 1:119
stress rate 2:236-237
triangles
algebraic expansions 1:145-146
splitting 1:263-264
to rectangle transformations 3:64-65
triangular elements
Hermitian 1:77-78
Lagrangian 1:77
macro elements 1:78
nonconforming 1:105-106
refinement 1:101, 2:38
triangulation
curved domains 1:109
Delaunay-type mesh generation 1:501
finite element spaces 1:79-80
geometric modeling 1:478
mesh refining 1:98, 1:100-103
numerical integration 1:107-109
panel clustering 1:599
triaxial compression 2:555-556
tribology 2:205
tridiagonal matrices 1:552
trigonometric expansion 1:141-142
trimmed surfaces 1:480, 1:481
triple decomposition 3:293
truncation 1:32-33, 1:41, 1:48, 1:611, 3:139-140
trunk space 1:121-124
tubes 1:536, 1:540-541, 2:323-326
tunnels
autogenous shrinkage 2:514, 2:516-529
concrete mechanics 2:514, 2:525-529
concrete under fire 2:598-599
high-speed trains 3:570-571
linings 2:514, 2:516-529
turbine flows 3:294
turbomachinery 3:377-379
turbulence
adaptive computation 3:199-201
aerodynamic flow mechanics 3:418-423, 3:446-447
closure
computational fluid dynamics 3:301-322
constants 3:304-305
Reynolds averaged 3:318-322
scalar variable 3:303-311
second moment transport 3:311-318
compressible flows 3:271-279, 3:280-281, 3:290-291
direct numerical simulations 3:2, 3:269-270, 3:279-283, 3:293-296
dynamic multilevel methods 3:207-264
flames 3:500-501, 3:514-523
homogeneous isotropic 3:209-240
incompressible flows 3:72-74, 3:85-86, 3:271-279, 3:284-290
industrial aerodynamics 3:418-423, 3:446-447
kinetic energy 3:446-447
large eddy simulations 3:2, 3:269, 3:270-274, 3:279-285, 3:293-296
monotone schemes 3:419-420
multiscale method 3:40-55
Newtonian fluids 3:269-296
pantograph shrouds 3:5-7
prediction methods 3:320-322
Reynolds averaged Numerical Simulations 3:270
shallow water equations 3:246-247
ship hydrodynamics 3:593-594
space-time averages 3:199-201
spatial behaviors 3:225-228
spectral/hp element method 3:72-74, 3:85-86
time behaviors 3:225-228
time integration 3:247-248
transport 3:315
variational multiscale method 3:44-49
turnkey applications 1:545-546
TVB see total variation bounded
TVBM property 3:107-108, 3:109
TVBM stability 3:108
TVD see total variational diminishing
TVNI schemes 1:451
two-dimensional
flows 3:77-78, 3:130-134
interpolations 2:213-214
meshes 1:512-513
moving boundaries 1:517-519
node-to-segment 2:211-212
turbulence 3:225-226

two-field finite element methods 2:21-22
two-field time-discontinuous Galerkin methods 2:176-177
two-grid iterations 1:581-584
two-layer boundary layer model 3:292
two-layer k-ε model 3:306-307
two-level decomposition 3:228-232
two-point boundary problems 1:9-12
two-point velocity 3:210-211, 3:212
two-sided Lanczos method 1:573-574
two-sided preconditioning 1:561
two-surface yield conditions 2:308-309
θ-schemes 1:21-23, 1:26, 3:489-490, 3:491


UCAV see unmanned combat air vehicles 
UCM see upper-convected Maxwell (UCM) 
UGN/RGN stabilization parameters 3:549-550
UI see user interfaces
unbounded...
domains 2:702-703 
half-space wave propagation 2:758 
inviscid flows 3:131-137 
uncertainty problem 3:284-285
unconjugated test functions 2:709-710
under-resolved simulations 3:86-87
uniaxial tension bars 2:344-345, 2:346-349, 2:351
unidirectional composite laminates 2:445 
uniform... 
bending 2:293 
hierarchical matrices 1:614 
small strain constitutive equations 2:647-649
test boundary loadings 2:416
uniqueness 1:30, 2:640-641
unit meshing 1:504-510
unit volume 1:507-510
unity partitions see partitions of unity 
universal unfoldings 1:669-670 
unknown ordering 1:563-564 
unmanned combat air vehicles (UCAV) 3:430-431 
unsaturated materials 2:553-554 
unsplit stress-updating 1:429-431 
unstable flames 3:500-501, 3:503-504
unstable minima 2:645-646
unsteady advective-diffusive equations 3:36-37
unsteady Reynolds averaged Navier-Stokes equations 3:321
unstructured grids 3:63-67
unstructured meshes 1:457-460, 1:463-464, 3:359-360, 3:396


updating 
crystal plasticity 2:274-280 
dropping 2:279 
elastic deformation maps 2:275-276 
iteration 1:657 
scaling 2:278-279 
shear 3:555, 3:571, 3:572 
slip 2:278-280 
stress 1:428, 1:429-431, 2:274-280 
thin-walled structures 2:127-128
upper...
bounds 1:97-98
convected Maxwell (UCM) 3:481, 3:486, 3:489-490
Hessenberg matrices 1:552, 1:561 
variation bounds 2:414 
upscaling 2:515-516, 2:521-522, 2:530-531, 2:533-538
upwinding
convection-diffusion equations 1:36-37
differencing 1:36-37, 1:50, 3:339-340, 3:342-343
finite difference 1:36-37, 1:50
numerical flux 3:92, 3:95
shock capturing 3:348-351
spectral techniques 3:483
triangle schemes 1:463-464
user interfaces (UI) 1:475-476
Uzawa algorithms
adaptive wavelets 1:185-186
contact mechanics 2:217-218 
Signorini-type interfaces 1:402-403 


V-cycles 1:585-586, 3:233-234
v²-f model 3:317-318
vacuum-mixed cement pastes 2:519 
validation 
computational flow mechanics 3:428 
hierarchical modeling 2:40-42 
inelastic material constitutive equations 2:639-640 
nonlinear aeroelasticity 3:474-477 
van Albada limiter 1:457 
Van der Pol’s equation 1:685 
van Leer flux vector splitting 1:470 
van Leer limiter 1:457 
vanishing efficiency 3:408-410 
vanishing viscosity 1:441 
vapor pressure distribution 2:598 
variable...
distribution tails 2:667
loading domains 2:296-297
preconditioning 1:560 
rank matrices 1:613-614 
sensitivity methods 2:667-668 
variational 
alternating Schwarz method 1:619-620 
equations 3:16-17 
formulations 
arbitrary Lagrangian-Eulerian 1:421
boundary elements methods 1:347-358
boundary integral equations 1:347-358
contact mechanics 2:199-200
elastic-plastic crystals 2:280-283
incompressible viscous flows 3:156-157
Maxwell equations 1:725-727 
microgeometrical manufacturing 2:414 
symmetric Galerkin BEM 2:727-729 
multiscale methods 3:11-18, 3:27-32, 3:38-39, 3:44-49 
principles 
elasticity 1:242-244 
mixed finite element methods 1:238 
partial differential equations 1:293 
Stokes equations 1:241 
thermal diffusion 1:239-240, 1:257-262 
space-time boundary integrals 1:709-711 
stress updating 2:276-277 
vascular solid mechanics 2:616-618
vaults 2:598-599 
vdv ordering 1:563 
vectors 
base 2:8-9, 2:65, 2:91 
differencing 2:80, 2:81 
director 2:118 
displacement plots 1:534 
eigenvectors 1:537, 1:552-553
fields 1:530, 1:533-537
flux splitting 1:469-470, 3:329
linear algebraic solvers 1:552
load 1:328
material heat flux 2:12
matrix-vector computations 1:600-604, 1:607, 1:611, 2:15-16, 3:567-568
mean value 2:646
Ritz 1:553, 1:572
rotation 2:118-119
Steger-Warming flux splitting 1:469-470
traction 2:10, 2:726
van Leer flux splitting 1:470 
vein blood flow 3:530 
velocity 
acoustic field equations 2:697 
arbitrary Lagrangian-Eulerian 1:416-417, 1:422-423
blood flow 3:528-540
correction projections 3:67-68
homogeneous turbulence 3:239-240
Navier-Stokes equations 3:210-213
potential 2:697
pressure interpolation 3:484
shallow water equations 3:241-242, 3:255-259
Stokes equations 1:264-266
turbulence closure 3:321-322
vortex methods 3:141-145
VEM see vortex element methods
Verfürth’s trick 1:270, 1:272-274
verification
concrete mechanics 2:516, 2:532-533
hierarchical modeling 2:40-42
inelastic material constitutive equations 2:639 
vertex... 
centered finite volume methods 1:442, 1:467 
modes 1:122, 1:123 
placement 1:541 
vertical pressure amplitude 2:592, 2:593 
vibration 
eigen-modes 1:200-201, 1:206-207 
elastodynamics 2:759 
vibratory response 2:683-692
VIC see vortex in cell 




Voigt fields 2:409, 2:411-412
virtual displacements 2:68
virtual variations 2:199-200
virtual work 
degenerating solid elements 2:82-83 
direct approach 2:79 
discretization 2:111 
relations 2:523-524
thin-walled structures 2:67-68, 2:74, 2:79, 2:82-83, 2:111
three-dimensional continuum 2:74 
viscoelasticity 
direct boundary elements 2:751-758 
dynamic analysis 2:751-758
fluid flow analysis 3:3
Brownian configuration fields 3:491, 3:493-494
deformation fields 3:491, 3:492-493
flow past cylinders 3:487-488, 3:494-496
integral methods 3:490-491, 3:492-493
mixed finite element methods 3:481-496
numerical methods 3:494 
steady flows 3:481, 3:482-488 
stochastic constitutive equations 3:490-491, 3:493-494 
time dependent flows 3:481, 3:488-490 
viscoplasticity 
deformations 
augmented Lagrangian 2:260-263 
closest-point-projections 2:252-255
exponential return-mapping 2:248-250
integration 2:227-264 
return-mapping 2:244-250 
dissipation 2:238-239 
viscosity 
aerodynamic shape optimization 3:390 
buckling analysis 2:163 
compressible plasma flows 3:84 
conservation laws 1:441, 1:466 
continuum damage models 2:355, 2:369 
damping 3:305-306
incompressible flows 3:155-179
modified Lax-Wendroff flux 3:415
ship hydrodynamics 3:580, 3:592
see also incompressible flows
viscous...
discretization 3:364-365
flows
aerodynamics 3:330-331
ship hydrodynamics 3:585-587
vortex methods 3:130, 3:137-139
overstress algorithms 2:278 
pitching cycles 3:380 
tensors 3:505 
visibility criterion 1:300-303 
visualization 1:5, 1:525-548 
algorithms 1:531-541
data forms 1:528-531 
graphics 1:527-528 
interfacing 1:546-548 
large data methods 1:542-543 
taxonomy 1:543-546
volume rendering 1:541-542 
VOF see volume of fluid 
volume 
domains 1:499 
of fluid (VOF) 3:583 
fractions 2:418-421, 2:425-426, 2:514-515, 2:520-521
integrals 1:418-419
potentials 1:704
rendering 1:541-542
residuals 1:87-88
volumetric...
behavior, geomechanics 2:557, 2:558
flow rates 3:533, 3:534, 3:539-540
locking 1:298-300
von Mises effective stress distribution 2:475-476
von Mises yield criterion 2:234-235
Voronoi cells 2:425, 2:440
vortex...
blobs 3:131, 3:132
bursting 3:468-469
in cell (VIC) method 3:130, 3:144-145
drag 3:443 
element methods (VEM) 3:131-134, 3:145-149 
filaments 3:134-135 
methods 3:2, 3:129-152 
bluff-body flows 3:131-137
convection-diffusion equations 3:149
efficient velocity evaluations 3:141-145
flows 3:145-149
incompressible flows 3:131-137
inviscid flows 3:131-137
particle redistribution 3:140-141
truncation errors 3:139-140 
viscous flows 3:137-139 
particles 3:134, 3:135-137, 3:149-150 
sheets 3:131, 3:133-134, 3:151 
vorticity 
correlation tensors 3:210-211
curl 3:131, 3:132-133
error indicators 3:174-175
military aircraft 3:429-431
visualization algorithms 1:536
voxel representation 1:476-477, 1:484-485, 1:532, 1:533
VTK toolkit 1:544 


W-cycles 1:585-586, 3:373 
wake 3:72-74, 3:598-599 
wall... 
boundary conditions 3:291-292, 3:296, 3:361 
bounded flows 3:426 
echo 3:316-317, 3:320-321 
functions 3:305-306, 3:307-308
impedance conditions 2:689
normal displacements 2:685, 2:687, 2:690, 2:691
shear stress 3:537-538
temperatures 3:453
warped tensor products 1:145-146
water
falling solids 3:604-607
flow 2:584-586
inside hulls 3:581
mills 3:603
pore 2:561-562
shallow water equations 3:240-259
wave...
contact 3:604-605
drag 3:443 
equation 
derivation 1:28-29 
difference approximation 1:30-34 
discontinuous Galerkin methods 3:94
domain of dependence 1:29-30
finite difference methods 1:28-34
Helmholtz equation 2:701-702
Maxwell equations 1:724-725
space-time boundary integrals 1:706-707, 1:711-713
time-harmonic waves 2:697-698
guides 1:735
patterns 3:580, 3:597-598
propagation 2:757-758, 3:531-533
resistance 3:579, 3:580 
ship contacts 3:601, 3:602-604 
transmission 3:6-7 
waveform advection 3:62 
wavelets 
adaptive techniques 1:157-195 
elasticity 2:731 
Galerkin schemes 1:178-180 
Haar 1:160-162, 1:481
linear operators 1:168-169
wavenumbers 
accelerated multifrequency methods 2:710-711 
eddy viscosity 3:47-49 
element Green’s function 3:26 
Helmholtz equation 2:699-701, 3:26 
time-harmonic waves 2:697-698 
weak form 
boundary...
elements methods 1:347-358
integral equations 1:347-358 
values 1:446-447, 2:16-17 
conservation laws 1:440-441 
contact mechanics 2:200-202 
continuity equation 1:726 
Euler equation solutions 3:413 
Godunov finite volume discretizations 1:444 
linear elliptic boundary values 1:74-75 
matrix compressions 1:177-178
partial differential equations 1:293-294
weakly singular kernel integral equations 1:594-595
wear 2:206
weighted...
averages 1:30-31
essentially nonoscillatory (WENO) schemes 1:455, 3:355-356
least squares 2:643
residuals 2:176, 2:177, 2:178
Sobolev spaces 1:63
weighting functions
exterior acoustics problems 2:708-709
moving least squares 1:290-291
smooth particle hydrodynamics 1:281-283, 1:284
structural dynamics 2:172-173
Weissenberg numbers 3:481, 3:485-488, 3:489-490, 3:494-496
well-posedness 
adaptive wavelets 1:168-169, 1:184-185 
discontinuous Galerkin methods 3:97 
finite difference methods 1:7-9 
wavelets and linear operators 1:168-169 
WENO see weighted essentially nonoscillatory 
Whitney space 1:730 
Wigley hulls 3:594-597 
wind loads 2:533-534, 2:538 
wind tunnels 3:293, 3:429-430 
window functions 1:281-283, 1:284 
winged-edge data structures 1:481, 1:483
wings 3:78-81, 3:392-397 
wire-basket-based Schur-complement preconditioning 1:636 
World War II 3:408-409 
wrapping lines with tubes 1:536, 1:540-541 


X-38 crew rescue vehicles 3:434-435 
XFEM see extended finite element method 


yield
conditions 2:227-228, 2:233-235, 2:308-309
functions 2:233, 2:310, 2:465
line patterns 1:422
shakedown 2:308-309
strains 2:481
surfaces 2:227-228, 2:234-235, 2:481
Young’s modulus 2:559-560


Zarka’s method 2:319-320
zero-dimensional tools 3:513-514
zero-frequency mode 2:686, 2:691
zeroth-order approximations 2:449-451
Zienkiewicz elements 1:78 




